Publications
Our publications provide insight into some of our leading-edge research.
11 results found
Speech Processing
2025
VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations
Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Pin-Jui Ku, Ante Jukić, Huck Yang, Yu Tsao, Frank Wang, Hung-yi Lee, Szu-Wei Fu
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
Chen Chen, Yuchen Hu, Siyin Wang, Helin Wang, Zhehuai Chen, Chao Zhang, Huck Yang, EngSiong Chng
ICLR
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu, Sang-gil Lee, Huck Yang, Yuan Gong, Frank Wang, James R. Glass, Rafael Valle
ICLR
2022
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg
Multi-blank Transducers for Speech Recognition
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition
Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization
Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg
TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context
Nithin Rao Koluguri, Taejin Park, Boris Ginsburg
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization
Evelina Bakhturina, Yang Zhang, Boris Ginsburg