 # CarneliNet: Neural Mixture Model for Automatic Speech Recognition


 End-to-end automatic speech recognition systems have achieved high accuracy by using ever deeper models. However, increased depth comes with a larger receptive field, which can hurt model performance in streaming scenarios. We propose an alternative approach that we call the Neural Mixture Model. The basic idea is to replace a very deep network with a parallel mixture of shallow networks. To validate this idea we design CarneliNet, a CTC-based neural network composed of three mega-blocks. Each mega-block consists of multiple parallel shallow sub-networks based on 1D depthwise-separable convolutions. We evaluate the model on the LibriSpeech, MLS, and AISHELL-2 datasets and achieve close to state-of-the-art results for CTC-based models. Finally, we demonstrate that the number of parallel sub-networks can be reconfigured dynamically, without retraining, to fit the available computational budget.
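
The idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the names `ShallowBranch` and `MegaBlock`, the toy weights, and the choice to average branch outputs are all assumptions made for illustration (the abstract does not say how sub-network outputs are combined), and details such as normalization, residual connections, and training are omitted. The `receptive_field` helper uses the standard formula for stacked stride-1, undilated convolutions to show why depth widens the receptive field while parallel branches do not.

```python
from typing import List, Optional

Signal = List[List[float]]  # layout: [channels][time steps]


def receptive_field(kernel_sizes: List[int]) -> int:
    """Receptive field, in frames, of stacked stride-1, undilated 1D convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def depthwise_conv1d(x: Signal, kernels: List[List[float]]) -> Signal:
    """Filter each channel with its own odd-length kernel, using 'same' zero padding."""
    out = []
    for channel, kernel in zip(x, kernels):
        half = len(kernel) // 2
        padded = [0.0] * half + channel + [0.0] * half
        out.append([
            sum(w * padded[t + i] for i, w in enumerate(kernel))
            for t in range(len(channel))
        ])
    return out


def pointwise_conv1d(x: Signal, weights: List[List[float]]) -> Signal:
    """1x1 convolution: mix channels independently at every time step."""
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def relu(x: Signal) -> Signal:
    return [[max(0.0, v) for v in channel] for channel in x]


class ShallowBranch:
    """Hypothetical sub-network: one depthwise-separable convolution plus ReLU."""

    def __init__(self, depthwise: List[List[float]], pointwise: List[List[float]]):
        self.depthwise = depthwise
        self.pointwise = pointwise

    def __call__(self, x: Signal) -> Signal:
        return relu(pointwise_conv1d(depthwise_conv1d(x, self.depthwise), self.pointwise))


class MegaBlock:
    """Hypothetical mega-block: parallel branches whose outputs are averaged (assumption)."""

    def __init__(self, branches: List[ShallowBranch]):
        self.branches = branches

    def __call__(self, x: Signal, active: Optional[int] = None) -> Signal:
        # `active` selects how many branches to run; None means all of them.
        branches = self.branches[:active]
        if not branches:
            raise ValueError("at least one branch must be active")
        outputs = [branch(x) for branch in branches]
        n = len(outputs)
        return [
            [sum(o[c][t] for o in outputs) / n for t in range(len(outputs[0][c]))]
            for c in range(len(outputs[0]))
        ]


# Toy input: 2 channels, 4 time steps. All weights below are made-up values.
x = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
identity_branch = ShallowBranch(
    depthwise=[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
    pointwise=[[1.0, 0.0], [0.0, 1.0]],
)
smoothing_branch = ShallowBranch(
    depthwise=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    pointwise=[[1.0, 1.0], [1.0, -1.0]],
)
block = MegaBlock([identity_branch, smoothing_branch])

full = block(x)               # both branches active
reduced = block(x, active=1)  # fewer branches at inference, same weights

# Six stacked layers see further than two; the kernel size of 11 is illustrative only.
deep_rf = receptive_field([11] * 6)
shallow_rf = receptive_field([11] * 2)
```

Because every branch reads the same input and has the same shallow depth, adding branches increases capacity without widening the receptive field, and dropping branches at inference time only changes how many outputs are averaged. Whether a trained model tolerates this as well as the paper reports depends on how the mixture is trained, which this sketch does not cover.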



 ## Authors



Aleksei Kalinov (Skolkovo Institute of Science and Technology)

Somshubra Majumdar (NVIDIA)

Jagadeesh Balam (NVIDIA)

Boris Ginsburg (NVIDIA)

 

 

 ## Publication Date



Thursday, July 22, 2021

 

 ## Research Area



[Speech Processing](/research-area/speech-processing)

 

 

 ## External Links



[Paper](https://arxiv.org/abs/2107.10708)