<div align="center">
**PyTorch implementation of Conformer variants for Speech Recognition**
</div>
This repository contains PyTorch implementations of various Conformer architectures for speech recognition. Conformer models combine convolutional neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. The original Conformer significantly outperforms previous Transformer and CNN-based models, achieving state-of-the-art accuracies.
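As a rough illustration of how the two ingredients are combined, a single Conformer block sandwiches self-attention and a convolution module between two half-step feed-forward layers (the "macaron" structure). The sketch below is a simplified, self-contained version, not this repository's API: it uses absolute-position `nn.MultiheadAttention` in place of the paper's relative positional attention, and the hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    """Minimal sketch of one Conformer block (macaron-style):
    half-step FFN -> self-attention -> conv module -> half-step FFN."""

    def __init__(self, d_model=144, num_heads=4, conv_kernel=31, ff_mult=4):
        super().__init__()
        self.ff1 = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, ff_mult * d_model), nn.SiLU(),
            nn.Linear(ff_mult * d_model, d_model),
        )
        self.attn_norm = nn.LayerNorm(d_model)
        # Simplification: the paper uses relative positional attention.
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Convolution module: pointwise + GLU -> depthwise -> pointwise.
        self.conv_norm = nn.LayerNorm(d_model)
        self.conv = nn.Sequential(
            nn.Conv1d(d_model, 2 * d_model, 1), nn.GLU(dim=1),
            nn.Conv1d(d_model, d_model, conv_kernel,
                      padding=conv_kernel // 2, groups=d_model),  # depthwise
            nn.BatchNorm1d(d_model), nn.SiLU(),
            nn.Conv1d(d_model, d_model, 1),
        )
        self.ff2 = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, ff_mult * d_model), nn.SiLU(),
            nn.Linear(ff_mult * d_model, d_model),
        )
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):             # x: (batch, time, d_model)
        x = x + 0.5 * self.ff1(x)     # half-step feed-forward
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        c = self.conv_norm(x).transpose(1, 2)   # (batch, d_model, time)
        x = x + self.conv(c).transpose(1, 2)
        x = x + 0.5 * self.ff2(x)
        return self.final_norm(x)
```

A block maps `(batch, time, d_model)` to the same shape, so blocks can be stacked freely.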
## Implemented Architectures
1. **Original Conformer** - The standard architecture introduced in the paper "Conformer: Convolution-augmented Transformer for Speech Recognition"
2. **Fast Conformer** - An optimized version that achieves 2.8× faster inference than the original Conformer through:
   - 8x downsampling (vs 4x in the original) using three depthwise separable convolutions
   - Reduced channel count (256 instead of 512) in the subsampling blocks
   - Smaller kernel size (9 instead of 31) in the convolutional modules
   - Support for linearly scalable attention for long-form audio via limited context + a global token
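The 8x subsampling front-end described above can be sketched as a small PyTorch module: one standard stride-2 convolution followed by two depthwise separable stride-2 stages, each halving the time axis. This is an illustrative sketch only; the class name, layer layout, and output dimension are assumptions, not this repository's actual implementation.

```python
import torch
import torch.nn as nn

class FastConformerSubsampling(nn.Module):
    """Hypothetical sketch of an 8x subsampling front-end:
    three stride-2 stages over (time, freq), the last two
    depthwise separable, with 256 channels throughout."""

    def __init__(self, in_feats=80, channels=256, d_model=512):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, stride=2, padding=1), nn.ReLU()]
        for _ in range(2):  # two depthwise separable stride-2 stages
            layers += [
                nn.Conv2d(channels, channels, 3, stride=2, padding=1,
                          groups=channels),        # depthwise
                nn.Conv2d(channels, channels, 1),  # pointwise
                nn.ReLU(),
            ]
        self.conv = nn.Sequential(*layers)
        freq_out = in_feats // 8  # e.g. 80 mel bins -> 10 after 3 stages
        self.proj = nn.Linear(channels * freq_out, d_model)

    def forward(self, x):               # x: (batch, time, in_feats)
        x = self.conv(x.unsqueeze(1))   # (batch, channels, time/8, freq/8)
        b, c, t, f = x.shape
        return self.proj(x.transpose(1, 2).reshape(b, t, c * f))
```

With stride-2, kernel-3, padding-1 convolutions, 160 input frames come out as 20, i.e. one encoder frame per 8 input frames, which is what makes the attention and decoder cheaper downstream.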
## How to contribute

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues such as bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.
## Code Style
I follow [PEP-8](https://www.python.org/dev/peps/pep-0008/) for code style. In particular, the docstring style is important, since the documentation is generated from docstrings.
## Reference
- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/pdf/2005.08100.pdf)
- [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860)