Artificial Intelligence Computing Leadership from NVIDIA
Speech Processing
Associated Publications
2024
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Yuchen Hu, Chen Chen, Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang
NeurIPS
Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities
Siyin Wang, Huck Yang, Ji Wu, Chao Zhang
EMNLP
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
Yuchen Hu, Chen Chen, Huck Yang, Ruizhe Li, Zhehuai Chen, Eng Siong Chng
ACL 2024
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu, Chen Chen, Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, Eng Siong Chng
ICLR 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, Huck Yang
ICLR 2024
A Chat about Boring Problems: Studying GPT-Based Text Normalization
Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg
ICASSP
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition
Aleksandr Laptev, Boris Ginsburg
IEEE
2023
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Chen Chen, Yuchen Hu, Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng
NeurIPS 2023
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
Srijith Radhakrishnan, Huck Yang, Sumeer Khan, Rohit Kumar, Narsis Kiani, David Gomez-Cabrero, Jesper Tegnér
EMNLP
Investigating End-to-End ASR Architectures for Long Form Audio Transcription
Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg
NeMo Forced Aligner and its application to word alignment for subtitle generation
Elena Rastorgueva, Vitaly Lavrukhin, Boris Ginsburg
Interspeech
Confidence-based Ensembles of End-to-End Speech Recognition Models
Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg
Interspeech
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg
2022
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg
Multi-blank Transducers for Speech Recognition
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
Interspeech
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition
Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg
IEEE
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization
Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg
TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context
Nithin Rao Koluguri, Taejin Park, Boris Ginsburg
IEEE
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization
Evelina Bakhturina, Yang Zhang, Boris Ginsburg
2021
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings
Oktai Tatanov, Stanislav Beliaev, Boris Ginsburg
A Unified Transformer-based Framework for Duplex Text Normalization
Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji
CarneliNet: Neural Mixture Model for Automatic Speech Recognition
Aleksei Kalinov, Somshubra Majumdar, Jagadeesh Balam, Boris Ginsburg
TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis
Stanislav Beliaev, Boris Ginsburg
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev, Boris Ginsburg
NeMo Inverse Text Normalization: From Development To Production
Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg
A Toolbox for Construction and Analysis of Speech Datasets
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition
Patrick K. O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
Interspeech
Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition
Somshubra Majumdar, Jagadeesh Balam, Oleksii Hrinchuk, Vitaly Lavrukhin, Vahid Noroozi, Boris Ginsburg
Hi-Fi Multi-Speaker English TTS Dataset
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang
2020
MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection
Fei Jia, Somshubra Majumdar, Boris Ginsburg
IEEE
SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification
Nithin Rao Koluguri, Jason Li, Vitaly Lavrukhin, Boris Ginsburg
Improving Noise Robustness of an End-to-End Neural Model for Automatic Speech Recognition
Jagadeesh Balam, Jocelyn Huang, Vitaly Lavrukhin, Slyne Deng, Somshubra Majumdar, Boris Ginsburg
Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition
Jocelyn Huang, Oleksii Kuchaiev, Patrick O’Neill, Vitaly Lavrukhin, Jason Li, Adriana Flores, Georg Kucsko, Boris Ginsburg
MatchboxNet - 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition
Somshubra Majumdar, Boris Ginsburg
Interspeech
2019
Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model
Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
IEEE
QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions
Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang
Jasper: An End-to-End Convolutional Neural Acoustic Model
Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde