  Huck Yang  

 



  ![](/sites/default/files/person/genai.jpeg)

  

 I am a Sr. Research Scientist at **NV Research**. 

I obtained my Ph.D. and M.Sc. from Georgia Institute of Technology, USA, with a Wallace H. Coulter Fellowship, and my B.Sc. from National Taiwan University.

> My primary research lies in the area of Multilingual Model Alignments and Speech-Language Modeling. Specifically:

- **Speech-Language Alignment**: I study new cross-modal alignment algorithms ([task-activating prompting](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10389673), [whispering-LLaMA](https://github.com/Srijith-rkr/Whispering-LLaMA), [LLM-ASR](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10389632)) for adapting large language models (LLMs) to [noise-robust](https://arxiv.org/abs/2401.10446) speech processing, audio captioning, and generative error correction.
- **Post-Training Methods**: I explore new [in-context learning](https://arxiv.org/pdf/2309.07081.pdf), prompt-tuning, adapter, and [neural structured state space model](https://arxiv.org/pdf/2306.00331.pdf) methods, along with theoretical justifications ([Voice2Series](https://proceedings.mlr.press/v139/yang21j.html)) of parameter-efficient learning, to improve the current class of large-scale acoustic model adaptation methods ([TIH](https://arxiv.org/pdf/2306.01015.pdf)) and general time series understanding.
- **Multilingual and Robust Evaluation**: My earlier work includes developing multilingual [privacy-preserving](https://arxiv.org/pdf/2104.01271.pdf) and intervention-resilient algorithms ([Causal-Inference Q-Network](https://ojs.aaai.org/index.php/AAAI/article/view/20862)) and benchmarks ([HyPoradise](https://openreview.net/pdf?id=cAjZ3tMye6)) for audio and general deep reinforcement learning that comply with data protection regulations, aimed at human-oriented interaction with conversational signals.

I have served as the special session chair for ICASSP 2024, focusing on In-Context Learning for Speech and Spoken Language Processing, and for ICASSP 2022 on Quantum Machine Learning. I received a Best Student Paper Award Nomination at Interspeech 2023, an Outstanding Reviewer Award from NeurIPS 2021, and the 1st Prize Award from the Xanadu Quantum ML Research Global Competition in 2019.

As a member of the Senior Technical Committee in Applied Signal Processing Systems of IEEE SPS, I am also interested in the open-source development of variational quantum circuit learning on Quantum CUDA and in alignment topics for language models at NV Research.

Previously, I was a research intern hosted by Google Bard and DeepMind, Amazon Alexa, and Hitachi Central Lab during my Ph.D. journey. I worked full-time at Amazon's AGI organization for one year before joining NVIDIA, and interned at TSMC in mixed-signal IC design before starting my Ph.D.

I am open to collaborating with forthright, highly motivated researchers and to working on open-source projects.

**Selected Tutorials:**

- "Spoken Conversational Agents with Large Language Models," to be hosted as an *EMNLP 2025 Tutorial*
- "Large-Scale and Parameter-Efficient Language Modeling for Speech Processing," *ASRU 2023 Tutorial*
- "[Resource-Efficient and Cross-Modal Learning Toward Foundation Models](https://www.youtube.com/watch?v=k_egHWj09l4)," *Interspeech 2023 Tutorial*
- "[Adversarial Robustness, Reprogramming and Prompting for Speech and Language Processing](https://www.youtube.com/watch?v=-iirkbYkyXI)," *ICASSP 2022 Tutorial*
- "[Quantum Neural Networks for Speech and Language Processing](https://www.youtube.com/watch?v=ltaiNcW1buo)," *IJCAI 2021 Tutorial*

**Recent Invited Talks:**

- "Speech Language Alignments in Large-Scale Pre-Trained Models," CMU LTI, PA, 2024
- "Data Privacy and Evaluation Challenges of Large Language Model Based Speech Recognition," ISCA SIG-SPSC, 2024
- "Characterizing Large LMs for Generative Speech Recognition Error Correction," MIT CSAIL, MA, USA, 2023
- "Trainable Input Perturbation as Frozen Pre-trained Model Adaptation," Mila, Montreal, Canada, 2022



   Research Area(s)

[Generative AI](/index.php/research-area/generative-ai)

[Machine Translation](/index.php/research-area/machine-translation)

[Quantum Computing](/index.php/research-area/quantum-computing)

[Speech Processing](/index.php/research-area/speech-processing)

 

 

  

 Main Field of Interest

[Natural Language Processing](/index.php/research-area/natural-language-processing)

 

  

 Google Scholar

<https://scholar.google.com/citations?user=TT3XJW8AAAAJ>

 

  

 

 

 



 ### Publications

 

### 2026 

[Test-Time Alignment for Large Language Models via Textual Model Predictive Control](/index.php/publication/2026-04_test-time-alignment-large-language-models-textual-model-predictive-control)

Kuang-Da Wang, Teng-Ruei Chen, Yu Heng Hung, Guo-Xun Ko, Shuoyang Ding, [Frank Wang](/index.php/person/frank-wang), [Huck Yang](/index.php/person/huck-yang), Wen-Chih Peng, Ping-Chun Hsieh



[ICLR](https://openreview.net/forum?id=DsS3xRPSs5)









[TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models](/publication/2026-04_timeomni-1-incentivizing-complex-reasoning-time-series-large-language-models)

Tong Guan, [Huck Yang](/person/huck-yang), Sabato Marco Siniscalchi, Qingsong Wen, Ming Jin, Shirui Pan



[ICLR](https://openreview.net/forum?id=kOIclg7muL)









### 2025 

[VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations](/index.php/publication/2025-08_voicenong-robust-high-quality-speech-editing-model-without-hallucinations)

[Sung-Feng Huang](/index.php/person/sung-feng-huang), Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Pin-Jui Ku, Ante Jukić, [Huck Yang](/index.php/person/huck-yang), Yu Tsao, [Frank Wang](/index.php/person/frank-wang), Hung-yi Lee, [Szu-Wei Fu](/index.php/person/szu-wei-fu)



[Interspeech 2025](https://www.interspeech2025.org/home)









[Fugatto 1 - Foundational Generative Audio Transformer Opus 1](/index.php/publication/2025-04_fugatto-1-foundational-generative-audio-transformer-opus-1)

Rafael Valle, Rohan Badlani, Zhifeng Kong, Sang-gil Lee, Arushi Goel, Sungwon Kim, Joao Felipe Santos, Shuqi Dai, [Siddharth Gururani](/index.php/person/siddharth-gururani), Aya AlJa'fari, Alex Liu, Kevin Shih, Wei Ping, [Huck Yang](/index.php/person/huck-yang), Bryan Catanzaro



[ICLR 2025](https://openreview.net/forum?id=B2Fqu7Y2cd)









[Audio Large Language Models Can Be Descriptive Speech Quality Evaluators](/publication/2025-04_audio-large-language-models-can-be-descriptive-speech-quality-evaluators)

Chen Chen, Yuchen Hu, Siyin Wang, Helin Wang, Zhehuai Chen, Chao Zhang, [Huck Yang](/person/huck-yang), Eng Siong Chng



[ICLR 2025](https://openreview.net/forum?id=U42TkrEDzb)









[UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation](/index.php/publication/2025-04_uniwav-towards-unified-pre-training-speech-representation-learning-and)

Alexander H. Liu, Sang-gil Lee, [Huck Yang](/index.php/person/huck-yang), Yuan Gong, [Frank Wang](/index.php/person/frank-wang), James R. Glass, Rafael Valle



[ICLR 2025](https://openreview.net/forum?id=yj9lLwMjnE)









[Towards Neural Scaling Laws for Time Series Foundation Models](/index.php/publication/2025-04_towards-neural-scaling-laws-time-series-foundation-models)

Qingren Yao, [Huck Yang](/index.php/person/huck-yang), Renhe Jiang, Ming Jin, Shirui Pan



[ICLR 2025](https://openreview.net/forum?id=uCqxDfLYrB)









### 2024 

[Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition](/index.php/publication/2024-12_large-language-model-based-generative-error-correction-challenge-and-baselines)

[Huck Yang](/index.php/person/huck-yang), Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Shinji Watanabe, Andreas Stolcke



[SLT 2024](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10832176)









[Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models](/index.php/publication/2024-12_self-taught-recognizer-toward-unsupervised-adaptation-speech-foundation-models)

Yuchen Hu, Chen Chen, [Huck Yang](/index.php/person/huck-yang), Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang



[NeurIPS](https://arxiv.org/pdf/2405.14161)









[Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits](/index.php/publication/2024-12_detecting-undetectable-assessing-efficacy-current-spoof-detection-methods)

[Sung-Feng Huang](/index.php/person/sung-feng-huang), Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, [Huck Yang](/index.php/person/huck-yang), Yu Tsao, [Frank Wang](/index.php/person/frank-wang), Hung-yi Lee, [Szu-Wei Fu](/index.php/person/szu-wei-fu)



[IEEE SLT 2024](https://2024.ieeeslt.org/)









[From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment](/publication/2024-11_descriptive-richness-bias-unveiling-dark-side-generative-image-caption)

Yusuke Hirota, [Ryo Hachiuma](/person/ryo-hachiuma), [Huck Yang](/person/huck-yang), Yuta Nakashima



[EMNLP](https://arxiv.org/pdf/2406.13912)









[FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model](/publication/2024-11_fastadasp-multitask-adapted-efficient-inference-large-speech-language-model)

Yichen Lu, Jiaqi Song, [Huck Yang](/person/huck-yang), Shinji Watanabe



[EMNLP](https://aclanthology.org/2024.emnlp-industry.33.pdf)









[Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities](/publication/2024-11_bayesian-example-selection-improves-context-learning-speech-text-and-visual)

Siyin Wang, [Huck Yang](/person/huck-yang), Ji Wu, Chao Zhang



[EMNLP](https://arxiv.org/pdf/2404.14716)









[GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators](/publication/2024-08_gentranslate-large-language-models-are-generative-multilingual-speech-and)

Yuchen Hu, Chen Chen, [Huck Yang](/person/huck-yang), Ruizhe Li, Zhehuai Chen, Eng Siong Chng



[ACL 2024](https://arxiv.org/pdf/2402.06894)









[Large Language Models are Efficient Learners of Noise-Robust Speech Recognition](/publication/2024-05_large-language-models-are-efficient-learners-noise-robust-speech-recognition)

Yuchen Hu, Chen Chen, [Huck Yang](/person/huck-yang), Ruizhe Li, Chao Zhang, Pin-Yu Chen, Eng Siong Chng



[ICLR 2024](https://iclr.cc/Conferences/2024)









[It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition](/index.php/publication/2024-05_it-s-never-too-late-fusing-acoustic-information-large-language-models-automatic)

Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, [Huck Yang](/index.php/person/huck-yang)



[ICLR 2024](https://iclr.cc/Conferences/2024)









### 2023 

[HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models](/publication/2023-12_hyporadise-open-baseline-generative-speech-recognition-large-language-models)

Chen Chen, Yuchen Hu, [Huck Yang](/person/huck-yang), Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng



[NeurIPS 2023](https://openreview.net/forum?id=cAjZ3tMye6)









[Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition](/publication/2023-12_whispering-llama-cross-modal-generative-error-correction-framework-speech)

Srijith Radhakrishnan, [Huck Yang](/person/huck-yang), Sumeer Khan, Rohit Kumar, Narsis Kiani, David Gomez-Cabrero, Jesper Tegnér



[EMNLP](https://aclanthology.org/2023.emnlp-main.618/)