I am a Sr. Research Scientist at NV Research, based mainly in Taipei and occasionally in Sunnyvale. I work closely with the NV Research US and Applied Research teams, and with University Collaboration in the US and Asia-Pacific.
I obtained my Ph.D. and M.Sc. from Georgia Institute of Technology, USA, with a Wallace H. Coulter Fellowship, and my B.Sc. from National Taiwan University. My dissertation is titled "A Perturbation Approach to Differential Privacy for Deep Learning based Speech Processing."
My primary research lies in the areas of Model Alignment and Speech-Language Modeling. Specifically:
- Speech-Language Model Alignment: I study new cross-modal alignment algorithms (task-activating prompting, whispering-LLaMA, LLM-ASR) for adapting large language models (LLMs) to noise-robust speech processing, audio captioning, and generative error correction.
- Parameter-Efficient Acoustic Modeling: I explore in-context learning, prompt tuning, adapters, and neural structured state space models, along with theoretical justifications (Voice2Series) of parameter-efficient learning, to improve large-scale acoustic model adaptation (TIH) and general time series understanding.
- Data Privacy and Robust Evaluation: My earlier work includes developing privacy-preserving and intervention-resilient algorithms (Causal-Inference Q-Network) and benchmarks (HyPoradise) for audio and general deep reinforcement learning that comply with data protection regulations, aimed at human-oriented interaction with conversational signals.
I have served as special session chair for ICASSP 2024, on In-Context Learning for Speech and Spoken Language Processing, and for ICASSP 2022, on Quantum Machine Learning. I received a Best Student Paper Award Nomination at Interspeech 2023, an Outstanding Reviewer Award from NeurIPS 2021, and the 1st Prize Award from the Xanadu Quantum ML Research Global Competition in 2019.
As a member of the Senior Technical Committee in Applied Signal Processing Systems of IEEE SPS, I am also interested in the open-source development of variational quantum circuit learning on Quantum CUDA and in alignment topics for language models at NV Research.
Previously, I was a research intern hosted by Google Bard and DeepMind, Amazon Alexa, and Hitachi Central Lab during my Ph.D. studies. I worked full-time at Amazon's AGI organization for one year before joining Nvidia, and I interned at TSMC in mixed-signal IC design before starting my Ph.D.
I am open to collaborating with forthright and highly motivated researchers and to working on open-source projects.
Tutorials:
- "Large-Scale and Parameter-Efficient Language Modeling for Speech Processing," ASRU 2023 Tutorial
- "Resource-Efficient and Cross-Modal Learning Toward Foundation Models," Interspeech 2023 Tutorial
- "Adversarial Robustness, Reprogramming and Prompting for Speech and Language Processing," ICASSP 2022 Tutorial
- "Quantum Neural Networks for Speech and Language Processing," IJCAI 2021 Tutorial
Recent Invited Talks:
- "Data Privacy and Evaluation Challenges of Large Language Model Based Speech Recognition," ISCA SIG-SPSC, 2024
- "Characterizing Large LMs for Generative Speech Recognition Error Correction," MIT CSAIL, MA, USA, 2023
- "Trainable Input Perturbation as Frozen Pre-trained Model Adaptation," Mila, Montreal, Canada, 2022