## Natural Language Processing

### Associated Publications

### 2026

[Nemotron-Labs-Diffusion: A Tri-Mode Language Model Unifying Autoregressive, Diffusion, and Self-Speculation Decoding](/publication/2026-05_nemotron-labs-diffusion-tri-mode-language-model-unifying-autoregressive)

[Yonggan Fu](/person/yonggan-fu), Lexington Whalen, Abhinav Garg, Chengyue Wu, Maksim Khadkevich, Nicolai Oswald, Enze Xie, Daniel Egert, Sharath Turuvekere Sreenivas, Shizhe Diao, Chenhan Yu, Ye Yu, Weijia Chen, Sajad Norouzi, Jingyu Liu, Shiyi Lan, Ligeng Zhu, Jin Wang, Jindong Jiang, Morteza Mardani, Mehran Maghoumi, Song Han, Ante Jukić, Nima Tajbakhsh, Jan Kautz, [Pavlo Molchanov](/person/pavlo-molchanov)













[TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models](/index.php/publication/2026-04_timeomni-1-incentivizing-complex-reasoning-time-series-large-language-models)

Tong Guan, [Huck Yang](/index.php/person/huck-yang), Sabato Marco Siniscalchi, Qingsong Wen, Ming Jin, Shirui Pan



[ICLR](https://openreview.net/forum?id=kOIclg7muL)









[RLP: Reinforcement as a Pretraining Objective](/publication/2026-04_rlp-reinforcement-pretraining-objective)

[Ali Hatamizadeh](/person/ali-hatamizadeh), Syeda Nahida Akter, Shrimai Prabhumoye, [Jan Kautz](/person/jan-kautz), Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Yejin Choi



[International Conference on Learning Representations (ICLR) 2026](https://iclr.cc/)









[iGRPO: Self-Feedback-Driven LLM Reasoning](/publication/2026-02_igrpo-self-feedback-driven-llm-reasoning)

[Ali Hatamizadeh](/person/ali-hatamizadeh), Shrimai Prabhumoye, Igor Gitman, [Ximing Lu](/person/ximing-lu), Seungju Han, Wei Ping, Yejin Choi, [Jan Kautz](/person/jan-kautz)













### 2025 

[Alpamayo 1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail](/publication/2025-10_alpamayo-r1)

[Marco Pavone](/person/marco-pavone) and many other contributors (full author list on page 33 of the paper)













[Fugatto 1 - Foundational Generative Audio Transformer Opus 1](/index.php/publication/2025-04_fugatto-1-foundational-generative-audio-transformer-opus-1)

Rafael Valle, Rohan Badlani, Zhifeng Kong, Sang-gil Lee, Arushi Goel, Sungwon Kim, Joao Felipe Santos, Shuqi Dai, [Siddharth Gururani](/index.php/person/siddharth-gururani), Aya AlJa'fari, Alex Liu, Kevin Shih, Wei Ping, [Huck Yang](/index.php/person/huck-yang), Bryan Catanzaro



[ICLR 2025](https://openreview.net/forum?id=B2Fqu7Y2cd)









[Hymba: A Hybrid-head Architecture for Small Language Models](/publication/2025-04_hymba-hybrid-head-architecture-small-language-models)

Xin Dong, [Yonggan Fu\*](/person/yonggan-fu), Shizhe Diao, [Wonmin Byeon](/person/wonmin-byeon), Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, [Matthijs Van keirsbilck](/person/matthijs-van-keirsbilck), [Min-Hung Chen](/person/min-hung-chen), [Yoshi Nishi](/person/yoshi-nishi), Yingyan Celine Lin, [Jan Kautz](/person/jan-kautz), [Pavlo Molchanov](/person/pavlo-molchanov)



[Hymba - ICLR 2025](https://jankautz.com/publications/Hymba_ICLR25.pdf)



ICLR spotlight paper





[Gated Delta Networks: Improving Mamba2 with Delta Rule](/index.php/publication/2025-04_gated-delta-networks-improving-mamba2-delta-rule)

Songlin Yang, [Jan Kautz](/index.php/person/jan-kautz), [Ali Hatamizadeh](/index.php/person/ali-hatamizadeh)



[International Conference on Learning Representations (ICLR) 2025](https://iclr.cc/)









[Audio Large Language Models Can Be Descriptive Speech Quality Evaluators](/publication/2025-04_audio-large-language-models-can-be-descriptive-speech-quality-evaluators)

Chen Chen, Yuchen Hu, Siyin Wang, Helin Wang, Zhehuai Chen, Chao Zhang, [Huck Yang](/person/huck-yang), Eng Siong Chng



[ICLR 2025](https://openreview.net/forum?id=U42TkrEDzb)









[Minitron-SSM: Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning](/publication/2025-04_minitron-ssm-efficient-hybrid-language-model-compression-through-group-aware)

Ali Taghibakhshi, Sharath Turuvekere Sreenivas, [Saurav Muralidharan](/person/saurav-muralidharan), Marcin Chochowski, Yashaswi Karnati, Raviraj Joshi, Ameya Sunil Mahabaleshwarkar, Zijia Chen, Yoshi Suhara, Oluwatobi Olabiyi, Daniel Korzekwa, Mostofa Patwary, Mohammad Shoeybi, [Jan Kautz](/person/jan-kautz), Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, [Pavlo Molchanov](/person/pavlo-molchanov)



[NeurIPS 2025](https://arxiv.org/abs/2504.11409)









[Towards Neural Scaling Laws for Time Series Foundation Models](/index.php/publication/2025-04_towards-neural-scaling-laws-time-series-foundation-models)

Qingren Yao, [Huck Yang](/index.php/person/huck-yang), Renhe Jiang, Ming Jin, Shirui Pan



[ICLR 2025](https://openreview.net/forum?id=uCqxDfLYrB)









[Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation](/index.php/publication/2025-02_semantic-prompt-learning-weakly-supervised-semantic-segmentation)

Ci-Siang Lin, Chien-Yi Wang, [Frank Wang](/index.php/person/frank-wang), [Min-Hung Chen](/index.php/person/min-hung-chen)



[Winter Conference on Applications of Computer Vision (WACV)](https://wacv2025.thecvf.com/)









[Spatio-Temporal Context Prompting for Zero-Shot Action Detection](/publication/2025-02_spatio-temporal-context-prompting-zero-shot-action-detection)

Wei-Jhe Huang, [Min-Hung Chen](/person/min-hung-chen), Shang-Hong Lai



[Winter Conference on Applications of Computer Vision (WACV)](https://wacv2025.thecvf.com/)









### 2024 

[Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition](/index.php/publication/2024-12_large-language-model-based-generative-error-correction-challenge-and-baselines)

[Huck Yang](/index.php/person/huck-yang), Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Shinji Watanabe, Andreas Stolcke



[SLT 2024](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10832176)









[Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models](/index.php/publication/2024-12_self-taught-recognizer-toward-unsupervised-adaptation-speech-foundation-models)

Yuchen Hu, Chen Chen, [Huck Yang](/index.php/person/huck-yang), Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang



[NeurIPS](https://arxiv.org/pdf/2405.14161)









[Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities](/index.php/publication/2024-11_bayesian-example-selection-improves-context-learning-speech-text-and-visual)

Siyin Wang, [Huck Yang](/index.php/person/huck-yang), Ji Wu, Chao Zhang



[EMNLP](https://arxiv.org/pdf/2404.14716)









[FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model](/index.php/publication/2024-11_fastadasp-multitask-adapted-efficient-inference-large-speech-language-model)

Yichen Lu, Jiaqi Song, [Huck Yang](/index.php/person/huck-yang), Shinji Watanabe



[EMNLP](https://aclanthology.org/2024.emnlp-industry.33.pdf)









[From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment](/publication/2024-11_descriptive-richness-bias-unveiling-dark-side-generative-image-caption)

Yusuke Hirota, [Ryo Hachiuma](/person/ryo-hachiuma), [Huck Yang](/person/huck-yang), Yuta Nakashima



[EMNLP](https://arxiv.org/pdf/2406.13912)









[Open-World Task and Motion Planning via Vision-Language Model Inferred Constraints](/publication/2024-11_open-world-task-and-motion-planning-vision-language-model-inferred-constraints)

Nishanth Kumar, William Shen, [Fabio Ramos](/person/fabio-ramos), Dieter Fox, Tomás Lozano-Pérez, Leslie Pack Kaelbling, [Caelan Garrett](/person/caelan-garrett)



[CoRL 2024 Workshop on Language and Robot Learning: Language as an Interface](https://arxiv.org/abs/2411.08253)









[HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation](/publication/2024-11_hamster-hierarchical-action-models-open-world-robot-manipulation)

Yi Li, Yuquan Deng, Jesse Zhang, Joel Jang, Marius Memmel, [Caelan Garrett](/person/caelan-garrett), [Fabio Ramos](/person/fabio-ramos), Dieter Fox, [Anqi Li](/person/anqi-li), Abhishek Gupta, [Ankit Goyal](/person/ankit-goyal)



[International Conference on Learning Representations (ICLR)](https://openreview.net/forum?id=xZ6ZrbpBRp)









[Guiding Long-Horizon Task and Motion Planning with Vision Language Models](/publication/2024-11_guiding-long-horizon-task-and-motion-planning-vision-language-models)

Zhutian Yang, [Caelan Garrett](/person/caelan-garrett), Dieter Fox, Tomás Lozano-Pérez, Leslie Pack Kaelbling



[IEEE International Conference on Robotics & Automation (ICRA)](https://arxiv.org/abs/2410.02193)









[GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators](/publication/2024-08_gentranslate-large-language-models-are-generative-multilingual-speech-and)

Yuchen Hu, Chen Chen, [Huck Yang](/person/huck-yang), Ruizhe Li, Zhehuai Chen, Eng Siong Chng



[ACL 2024](https://arxiv.org/pdf/2402.06894)









[DoRA: Weight-Decomposed Low-Rank Adaptation](/publication/2024-07_dora-weight-decomposed-low-rank-adaptation)

Shih-Yang Liu, Chien-Yi Wang, [Hongxu Danny Yin](/person/danny-yin), [Pavlo Molchanov](/person/pavlo-molchanov), [Frank Wang](/person/frank-wang), Kwang-Ting Cheng, [Min-Hung Chen](/person/min-hung-chen)



[International Conference on Machine Learning (ICML) 2024](https://icml.cc/Conferences/2024)









[An Empirical Study of Mamba-based Language Models](/publication/2024-06_empirical-study-mamba-based-language-models)

Roger Waleffe, [Wonmin Byeon](/person/wonmin-byeon), Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, [Ali Hatamizadeh](/person/ali-hatamizadeh), Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, [Jan Kautz](/person/jan-kautz), Mohammad Shoeybi, Bryan Catanzaro



[arXiv](https://arxiv.org/pdf/2406.07887)









[Large Language Models are Efficient Learners of Noise-Robust Speech Recognition](/index.php/publication/2024-05_large-language-models-are-efficient-learners-noise-robust-speech-recognition)

Yuchen Hu, Chen Chen, [Huck Yang](/index.php/person/huck-yang), Ruizhe Li, Chao Zhang, Pin-Yu Chen, Eng Siong Chng



[ICLR 2024](https://iclr.cc/Conferences/2024)









[It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition](/publication/2024-05_it-s-never-too-late-fusing-acoustic-information-large-language-models-automatic)

Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, [Huck Yang](/person/huck-yang)



[ICLR 2024](https://iclr.cc/Conferences/2024)









[A Chat about Boring Problems: Studying GPT-Based Text Normalization](/index.php/publication/2024-03_chat-about-boring-problems-studying-gpt-based-text-normalization)

Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg



[ICASSP](https://ieeexplore.ieee.org/xpl/conhome/10445798/proceeding)









### 2023 

[HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models](/publication/2023-12_hyporadise-open-baseline-generative-speech-recognition-large-language-models)

Chen Chen, Yuchen Hu, [Huck Yang](/person/huck-yang), Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng



[NeurIPS 2023](https://openreview.net/forum?id=cAjZ3tMye6)









[Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition](/publication/2023-12_whispering-llama-cross-modal-generative-error-correction-framework-speech)

Srijith Radhakrishnan, [Huck Yang](/person/huck-yang), Sumeer Khan, Rohit Kumar, Narsis Kiani, David Gomez-Cabrero, Jesper Tegnér



[EMNLP](https://aclanthology.org/2023.emnlp-main.618/)









[NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails](/index.php/publication/2023-10_nemo-guardrails-toolkit-controllable-and-safe-llm-applications-programmable)

Traian Rebedea, Razvan Dinu, Makesh Sreedhar, Christopher Parisien, Jonathan Cohen













[SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF](/index.php/publication/2023-10_steerlm-attribute-conditioned-sft-user-steerable-alternative-rlhf)

Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev













### 2022 

[Evaluating Parameter Efficient Learning for Generation](/index.php/publication/2022-10_evaluating-parameter-efficient-learning-generation)

Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro













[Thutmose Tagger: Single-pass neural model for Inverse Text Normalization](/index.php/publication/2022-07_thutmose-tagger-single-pass-neural-model-inverse-text-normalization)

Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg













[Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization](/index.php/publication/2022-03_shallow-fusion-weighted-finite-state-transducer-and-language-model-text)

Evelina Bakhturina, Yang Zhang, Boris Ginsburg













### 2021 

[Text Mining Drug/Chemical-Protein Interactions using an Ensemble of BERT and T5 Based Models](/index.php/publication/2021-11_text-mining-drugchemical-protein-interactions-using-ensemble-bert-and-t5-based)

Virginia Adams, Hoo-Chang Shin, Carol Anderson, Bo Liu, Anas Abidin













[A Unified Transformer-based Framework for Duplex Text Normalization](/publication/2021-08_unified-transformer-based-framework-duplex-text-normalization)

Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji













[SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services](/index.php/publication/2021-05_sgd-qa-fast-schema-guided-dialogue-state-tracking-unseen-services)

Yang Zhang, Vahid Noroozi, Evelina Bakhturina, Boris Ginsburg













[NeMo Inverse Text Normalization: From Development To Production](/index.php/publication/2021-04_nemo-inverse-text-normalization-development-production)

Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg













### 2020 

[BioMegatron: Larger Biomedical Domain Language Model](/index.php/publication/2020-10_biomegatron-larger-biomedical-domain-language-model)

Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, Raghav Mani



[EMNLP 2020](https://aclanthology.org/2020.emnlp-main.379/)









[A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset](/index.php/publication/2020-08_fast-and-robust-bert-based-dialogue-state-tracker-schema-guided-dialogue)

Vahid Noroozi, Yang Zhang, Evelina Bakhturina, Tomasz Kornuta













### Researchers

[Ali Hatamizadeh](/person/ali-hatamizadeh)



[Chia-Wen Kuo](/person/chia-wen-kuo)



[Ekta Prashnani](/person/ekta-prashnani)



[Fengyuan Hu](/person/fengyuan-hu)



[Jaesung Choe](/person/jaesung-choe)



[Sameer Dharur](/index.php/person/sameer-dharur)



[Sung-Feng Huang](/index.php/person/sung-feng-huang)



[Ximing Lu](/index.php/person/ximing-lu)



[Yoad Tewel](/person/yoad-tewel)



[Yonggan Fu](/person/yonggan-fu)



[Yuqi Xie](/person/yuqi-xie)



[Yusuke Hirota](/person/yusuke-hirota)