Artificial Intelligence Computing Leadership from NVIDIA
Publications
Our publications provide insight into some of our leading-edge research.
21 results found
Speech Processing
2024
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Shinji Watanabe, Andreas Stolcke
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Yuchen Hu, Chen Chen, Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang
NeurIPS
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits
Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Huck Yang, Yu Tsao, Frank Wang, Hung-yi Lee, Szu-Wei Fu
Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities
Siyin Wang, Huck Yang, Ji Wu, Chao Zhang
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Yichen Lu, Jiaqi Song, Huck Yang, Shinji Watanabe
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
Yuchen Hu, Chen Chen, Huck Yang, Ruizhe Li, Zhehuai Chen, Eng Siong Chng
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu, Chen Chen, Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, Eng Siong Chng
ICLR
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, Huck Yang
ICLR
A Chat about Boring Problems: Studying GPT-Based Text Normalization
Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition
Aleksandr Laptev, Boris Ginsburg
2021
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings
Oktai Tatanov, Stanislav Beliaev, Boris Ginsburg
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings
Oktai Tatanov, Stanislav Beliaev, Boris Ginsburg
A Unified Transformer-based Framework for Duplex Text Normalization
Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji
CarneliNet: Neural Mixture Model for Automatic Speech Recognition
Aleksei Kalinov, Somshubra Majumdar, Jagadeesh Balam, Boris Ginsburg
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev, Boris Ginsburg
TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis
Stanislav Beliaev, Boris Ginsburg
NeMo Inverse Text Normalization: From Development To Production
Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg
A Toolbox for Construction and Analysis of Speech Datasets
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition
Patrick K. O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition
Somshubra Majumdar, Jagadeesh Balam, Oleksii Hrinchuk, Vitaly Lavrukhin, Vahid Noroozi, Boris Ginsburg
Hi-Fi Multi-Speaker English TTS Dataset
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang