 # NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022


This paper provides an overview of NVIDIA NeMo’s speech translation systems for the IWSLT 2022 Offline Speech Translation Task. Our cascade system consists of 1) a Conformer RNN-T automatic speech recognition (ASR) model, 2) a punctuation-capitalization model based on a pre-trained T5 encoder, and 3) an ensemble of Transformer neural machine translation (NMT) models fine-tuned on TED talks. Our end-to-end model has fewer parameters and consists of a Conformer encoder and a Transformer decoder. It relies on the cascade system by re-using its pre-trained ASR encoder and training on synthetic translations generated with the ensemble of NMT models. Our En->De cascade and end-to-end systems achieve 29.7 and 26.2 BLEU on the 2020 test set, respectively, both outperforming the previous year’s best of 26 BLEU.
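The data flow of the cascade described above (ASR, then punctuation and capitalization, then translation) can be illustrated with a minimal sketch. This is not the NeMo implementation: `make_cascade`, `toy_asr`, `toy_punct_cap`, and `toy_nmt` are hypothetical names, and each stage is a lookup-table stand-in for the real model, used only to show how the stages chain together.

```python
from typing import Callable, Sequence

# A stage maps one string to another (hypothetical simplification:
# real ASR consumes audio, here an audio clip is represented by an ID).
Stage = Callable[[str], str]


def make_cascade(stages: Sequence[Stage]) -> Stage:
    """Compose stages left to right into a single callable."""
    def run(x: str) -> str:
        for stage in stages:
            x = stage(x)
        return x
    return run


def toy_asr(audio_id: str) -> str:
    # Stand-in for the ASR model: returns a lowercase, unpunctuated transcript.
    transcripts = {"clip-1": "hello world"}
    return transcripts[audio_id]


def toy_punct_cap(text: str) -> str:
    # Stand-in for the punctuation-capitalization model.
    return text[0].upper() + text[1:] + "."


def toy_nmt(text: str) -> str:
    # Stand-in for the English-to-German translation model(s).
    lexicon = {"Hello world.": "Hallo Welt."}
    return lexicon[text]


translate = make_cascade([toy_asr, toy_punct_cap, toy_nmt])
print(translate("clip-1"))  # prints "Hallo Welt."
```

The sketch also makes one property of cascades visible: each stage only sees the previous stage's text output, so the translation step depends on the punctuation step having restored sentence-like input.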



 ## Authors



Oleksii Hrinchuk (NVIDIA)

Vahid Noroozi (NVIDIA)

Abhinav Khattar (NVIDIA)

Anton Peganov (NVIDIA)

Sandeep Subramanian (NVIDIA)

Somshubra Majumdar (NVIDIA)

Oleksii Kuchaiev (NVIDIA)

 

 

 ## Publication Date



Monday, May 2, 2022

 

 ## Published in



[IWSLT](https://aclanthology.org/2022.iwslt-1.18/)

 

 ## Research Area



[Machine Translation](/research-area/machine-translation)

 

 

 ## External Links



[Paper](https://aclanthology.org/2022.iwslt-1.18.pdf)