Huck Yang

I am a Sr. Research Scientist at NV Research.

I obtained my Ph.D. and M.Sc. from the Georgia Institute of Technology, USA, with the Wallace H. Coulter Fellowship, and my B.Sc. from National Taiwan University.

My primary research is in Multilingual Model Alignment and Speech-Language Modeling. Specifically:

Speech-Language Alignment: I study new cross-modal alignment algorithms (task-activating prompting, Whispering-LLaMA, LLM-ASR) for adapting large language models (LLMs) to noise-robust speech processing, audio captioning, and generative error correction.
Post-Training Methods: I explore in-context learning, prompt tuning, adapters, neural structured state space models, and theoretical justifications of parameter-efficient learning (Voice2Series) to improve large-scale acoustic model adaptation (TIH) and general time series understanding.
Multilingual and Robust Evaluation: My earlier work includes multilingual, privacy-preserving, and intervention-resilient algorithms (Causal-Inference Q-Network) and benchmarks (HyPoradise) for audio and general deep reinforcement learning. These comply with data protection regulations and target human-oriented interaction with conversational signals.

I have served as special session chair at ICASSP 2024 (In-Context Learning for Speech and Spoken Language Processing) and at ICASSP 2022 (Quantum Machine Learning). I received a Best Student Paper Award Nomination at Interspeech 2023, an Outstanding Reviewer Award from NeurIPS 2021, and the 1st Prize Award in the Xanadu Quantum ML Research Global Competition in 2019.

I am a member of the Senior Technical Committee in Applied Signal Processing Systems of IEEE SPS. At NV Research, I am also interested in open-source development of variational quantum circuit learning on Quantum CUDA and in language model alignment.

Previously, I was a research intern hosted by Google Bard and DeepMind, Amazon Alexa, and Hitachi Central Lab during my Ph.D. I worked full-time in Amazon's AGI organization for one year before joining NVIDIA. Before starting my Ph.D., I interned at TSMC in mixed-signal IC design.

I am open to collaborating with forthright, highly motivated researchers and to working on open-source projects.

Selected Tutorials:

"Spoken Conversational Agents with Large Language Models," EMNLP 2025 Tutorial (upcoming)
"Large-Scale and Parameter-Efficient Language Modeling for Speech Processing," ASRU 2023 Tutorial
"Resource-Efficient and Cross-Modal Learning Toward Foundation Models," Interspeech 2023 Tutorial
"Adversarial Robustness, Reprogramming and Prompting for Speech and Language Processing," ICASSP 2022 Tutorial
"Quantum Neural Networks for Speech and Language Processing," IJCAI 2021 Tutorial

Recent Invited Talks:

"Speech Language Alignments in Large-Scale Pre-Trained Models," CMU LTI, PA, USA, 2024
"Data Privacy and Evaluation Challenges of Large Language Model Based Speech Recognition," ISCA SIG-SPSC, 2024
"Characterizing Large LMs for Generative Speech Recognition Error Correction," MIT CSAIL, MA, USA, 2023
"Trainable Input Perturbation as Frozen Pre-trained Model Adaptation," Mila, Montreal, Canada, 2022

Research Area(s)

Main Field of Interest

Natural Language Processing

Google Scholar

https://scholar.google.com/citations?user=TT3XJW8AAAAJ

Publications

2025

VoiceNoNG: Robust High-Quality Speech Editing Model without Hallucinations

Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Pin-Jui Ku, Ante Jukić, Huck Yang, Yu Tsao, Frank Wang, Hung-yi Lee, Szu-Wei Fu

Interspeech 2025

Fugatto 1 - Foundational Generative Audio Transformer Opus 1

Rafael Valle, Rohan Badlani, Zhifeng Kong, Sang-gil Lee, Arushi Goel, Sungwon Kim, Joao Felipe Santos, Shuqi Dai, Siddharth Gururani, Aya AlJa'fari, Alex Liu, Kevin Shih, Wei Ping, Huck Yang, Bryan Catanzaro

ICLR 2025

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Chen Chen, Yuchen Hu, Siyin Wang, Helin Wang, Zhehuai Chen, Chao Zhang, Huck Yang, Eng Siong Chng

ICLR 2025

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation

Alexander H. Liu, Sang-gil Lee, Huck Yang, Yuan Gong, Frank Wang, James R. Glass, Rafael Valle

ICLR 2025

Towards Neural Scaling Laws for Time Series Foundation Models

Qingren Yao, Huck Yang, Renhe Jiang, Ming Jin, Shirui Pan

ICLR 2025

2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Shinji Watanabe, Andreas Stolcke

IEEE SLT 2024

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Yuchen Hu, Chen Chen, Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

NeurIPS 2024

Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits

Sung-Feng Huang, Heng-Cheng Kuo, Zhehuai Chen, Xuesong Yang, Huck Yang, Yu Tsao, Frank Wang, Hung-yi Lee, Szu-Wei Fu

IEEE SLT 2024

From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

Yusuke Hirota, Ryo Hachiuma, Huck Yang, Yuta Nakashima

EMNLP 2024

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model

Yichen Lu, Jiaqi Song, Huck Yang, Shinji Watanabe