Commit 68216b0

Update README.md
1 parent 93906b3 commit 68216b0

1 file changed (+55, -23 lines)


README.md

@@ -3,7 +3,7 @@
 
 <div align="center">
 
-**PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition.**
+**PyTorch implementation of Conformer variants for Speech Recognition**
 
 
 </div>
@@ -25,11 +25,21 @@
 </a>
 
 
-Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. Conformer combine convolution neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. Conformer significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies.
+This repository contains PyTorch implementations of several Conformer architectures for speech recognition. Conformer models combine convolutional neural networks and transformers to model both the local and global dependencies of an audio sequence in a parameter-efficient way. The original Conformer significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracy.
+
+## Implemented Architectures
+
+1. **Original Conformer** - The standard architecture introduced in the paper "Conformer: Convolution-augmented Transformer for Speech Recognition"
+
+2. **Fast Conformer** - An optimized variant that achieves 2.8x faster inference than the original Conformer through:
+   - 8x downsampling (vs. 4x in the original) using three depthwise separable convolutions
+   - A reduced channel count (256 instead of 512) in the subsampling blocks
+   - A smaller kernel size (9 instead of 31) in the convolution modules
+   - Support for linearly scalable attention on long-form audio via a limited context plus a global token
 
 <img src="https://user-images.githubusercontent.com/42150335/105602364-aeafad80-5dd8-11eb-8886-b75e2d9d31f4.png" height=600>
 
-This repository contains only model code, but you can train with conformer at [openspeech](https://github.com/openspeech-team/openspeech)
+This repository contains model code only, but you can train the Conformer variants with [openspeech](https://github.com/openspeech-team/openspeech).
 
 ## Installation
 This project recommends Python 3.7 or higher.
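The subsampling change listed under Fast Conformer above is the core of its speedup. As an illustrative sketch only (this is not the repository's actual module; the class name, channel default, and layer layout are assumptions), an 8x front end built from three stride-2 depthwise separable convolutions could look like:

```python
import torch
import torch.nn as nn


class DepthwiseSeparableSubsampling(nn.Module):
    """Hypothetical sketch of an 8x subsampling front end.

    Three stride-2 stages halve the time (and frequency) axis each,
    giving an overall 8x reduction, as described in the feature list.
    """

    def __init__(self, in_channels: int = 1, channels: int = 256):
        super().__init__()
        layers = []
        prev = in_channels
        for _ in range(3):  # three stride-2 stages -> 8x reduction
            layers += [
                # depthwise: one 3x3 filter per input channel
                nn.Conv2d(prev, prev, kernel_size=3, stride=2,
                          padding=1, groups=prev),
                # pointwise: mix channels with a 1x1 convolution
                nn.Conv2d(prev, channels, kernel_size=1),
                nn.ReLU(),
            ]
            prev = channels
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, freq) -> add a channel axis for Conv2d
        return self.net(x.unsqueeze(1))
```

A depthwise convolution with `groups=channels` costs far fewer multiply-adds than a full convolution, which is why this subsampling style is cheaper than the original 4x front end at 512 channels.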
@@ -49,6 +59,8 @@ pip install -e .
 
 ## Usage
 
+### Original Conformer
+
 ```python
 import torch
 import torch.nn as nn
@@ -79,23 +91,43 @@ outputs, output_lengths = model(inputs, input_lengths)
 # Calculate CTC Loss
 loss = criterion(outputs.transpose(0, 1), targets, output_lengths, target_lengths)
 ```
-
-## Troubleshoots and Contributing
-If you have any questions, bug reports, and feature requests, please [open an issue](https://github.com/sooftware/conformer/issues) on github or
-contacts [email protected] please.
-
-I appreciate any kind of feedback or contribution. Feel free to proceed with small issues like bug fixes, documentation improvement. For major contributions and new features, please discuss with the collaborators in corresponding issues.
-
-## Code Style
-I follow [PEP-8](https://www.python.org/dev/peps/pep-0008/) for code style. Especially the style of docstrings is important to generate documentation.
-
-## Reference
-- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/pdf/2005.08100.pdf)
-- [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860)
-- [kimiyoung/transformer-xl](https://github.com/kimiyoung/transformer-xl)
-- [espnet/espnet](https://github.com/espnet/espnet)
-
-## Author
-
-* Soohwan Kim [@sooftware](https://github.com/sooftware)
-* Contacts: [email protected]
+
+### Fast Conformer
+
+```python
+import torch
+import torch.nn as nn
+from conformer import FastConformer
+
+batch_size, sequence_length, dim = 3, 12345, 80
+
+cuda = torch.cuda.is_available()
+device = torch.device('cuda' if cuda else 'cpu')
+
+criterion = nn.CTCLoss().to(device)
+
+inputs = torch.rand(batch_size, sequence_length, dim).to(device)
+input_lengths = torch.LongTensor([12345, 12300, 12000])
+targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
+                            [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
+                            [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
+target_lengths = torch.LongTensor([9, 8, 7])
+
+model = FastConformer(num_classes=10,
+                      input_dim=dim,
+                      encoder_dim=32,
+                      num_encoder_layers=3,
+                      conv_kernel_size=9).to(device)
+
+# Forward propagate
+outputs, output_lengths = model(inputs, input_lengths)
+
+# Calculate CTC Loss
+loss = criterion(outputs.transpose(0, 1), targets, output_lengths, target_lengths)
+```
+
+## TODO
+
+- [ ] Training scripts
+- [ ] Inference scripts
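Both usage examples stop at computing the CTC loss. As a hedged sketch of the next step (`ctc_greedy_decode` is a hypothetical helper, not part of the repository; blank id 0 matches `nn.CTCLoss`'s default), the `(batch, time, num_classes)` model outputs could be turned into label sequences like this:

```python
import torch


def ctc_greedy_decode(log_probs: torch.Tensor, lengths: torch.Tensor,
                      blank: int = 0):
    """Greedy CTC decoding: collapse repeats, then drop blanks.

    log_probs: (batch, time, num_classes); lengths: (batch,) valid frames.
    """
    best = log_probs.argmax(dim=-1)  # best class per frame: (batch, time)
    results = []
    for frames, length in zip(best, lengths):
        decoded, prev = [], None
        for token in frames[:length].tolist():
            # emit a token only when it differs from the previous frame
            # and is not the blank symbol
            if token != blank and token != prev:
                decoded.append(token)
            prev = token
        results.append(decoded)
    return results
```

For example, a best path of `[1, 1, 0, 2, 2]` with blank 0 collapses to `[1, 2]`. Beam-search decoding with a language model would normally replace this in a real pipeline.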
