Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

Large language models (LLMs) can adapt to new tasks through in-context learning (ICL), based on a few examples presented in dialogue history, without any model parameter updates. Despite this convenience, ICL performance heavily depends on the quality of the in-context examples presented, which makes the choice of example selection method critical. This paper proposes a novel Bayesian in-Context example Selection method (ByCS) for ICL.
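To make the setup concrete, here is a minimal sketch of the generic in-context example selection problem: a pool of labelled demonstrations, a retriever that picks the examples most similar to the test query, and a prompt that prepends them without updating any model parameters. This is an illustrative similarity-based baseline, not the paper's Bayesian ByCS method; the embedding model, data fields, and prompt format are assumptions.

```python
# Illustrative similarity-based in-context example selection (a baseline sketch,
# not the ByCS method). Assumes the sentence-transformers package is available.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def select_examples(query, pool, k=4):
    """Return the k pool items whose inputs are most similar to the query."""
    texts = [ex["input"] for ex in pool]
    emb = embedder.encode([query] + texts, normalize_embeddings=True)
    sims = emb[1:] @ emb[0]                  # cosine similarity to the query
    top = np.argsort(-sims)[:k]
    return [pool[i] for i in top]

def build_prompt(query, examples):
    """Prepend the selected demonstrations; no model parameters are updated."""
    demos = "\n\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}"
                        for ex in examples)
    return f"{demos}\n\nInput: {query}\nOutput:"

# Hypothetical usage with a toy demonstration pool.
pool = [{"input": "good movie", "output": "positive"},
        {"input": "terrible plot", "output": "negative"}]
print(build_prompt("what a great film",
                   select_examples("what a great film", pool, k=2)))
```

A method such as ByCS would replace this fixed similarity heuristic with its Bayesian selection criterion; the mechanics of assembling the prompt stay the same.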

From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment

Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual content. This generative approach to image caption enrichment makes textual captions more descriptive, improving alignment with the visual context. However, while many studies focus on the benefits of generative caption enrichment (GCE), are there any negative side effects? We compare standard-format captions with recent GCE processes from the perspectives of "gender bias" and "hallucination", showing that enriched captions suffer from increased gender bias and hallucination.

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems across diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models built on Transformer-based architectures with auto-regressive decoding (e.g., Whisper, Canary).
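As a rough illustration of what leveraging unlabeled data can look like for an auto-regressive ASR model, the sketch below runs Whisper over unlabeled audio and keeps only confident segments as pseudo-labelled pairs for later fine-tuning. This is a generic self-training pass under assumed thresholds, not the STAR algorithm; the file names and cutoff values are hypothetical.

```python
# Generic pseudo-labelling pass over unlabeled audio (illustrative sketch, not STAR).
# Assumes the openai-whisper package is installed and the audio files exist.
import whisper

model = whisper.load_model("base")   # any Whisper checkpoint would do
LOGPROB_THRESHOLD = -0.5             # assumed confidence cutoff
NO_SPEECH_THRESHOLD = 0.5            # assumed silence cutoff

def pseudo_label(audio_paths):
    """Transcribe unlabeled audio and keep only confident segments as training pairs."""
    pairs = []
    for path in audio_paths:
        result = model.transcribe(path)
        for seg in result["segments"]:
            # avg_logprob is Whisper's average token log-probability for the segment
            if (seg["avg_logprob"] > LOGPROB_THRESHOLD
                    and seg["no_speech_prob"] < NO_SPEECH_THRESHOLD):
                pairs.append((path, seg["start"], seg["end"], seg["text"].strip()))
    return pairs  # (audio, start, end, pseudo-transcript) tuples for fine-tuning

# Hypothetical usage: pseudo_label(["clip_001.wav", "clip_002.wav"])
```

How the pseudo-labels are scored and reweighted is exactly where an approach like STAR would differ from this naive thresholding.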

Learning to Move Like Professional Counter-Strike Players

In multiplayer first-person shooter games like Counter-Strike: Global Offensive (CS:GO), coordinated movement is a critical component of high-level strategic play. However, the complexity of team coordination and the variety of conditions present in popular game maps make it impractical to author hand-crafted movement policies for every scenario. We show that it is possible to take a data-driven approach to creating human-like movement controllers for CS:GO.

Guan-Ting (Danny) Liu

Guan-Ting (Danny) Liu completed his Ph.D. in the Graduate Institute of Networking and Multimedia at National Taiwan University in Taipei, Taiwan. During his Ph.D. program, he was advised by Pu-Jen Cheng, Iris Hui-Ru Jiang, and Shao-Hua Sun.

Prithvijit Chattopadhyay

I am a Research Scientist at Deep Imagination Research. I earned my Ph.D. in Computer Science at Georgia Tech in August 2024, where I was advised by Prof. Judy Hoffman. During my Ph.D., I worked broadly on distribution-shift problems in computer vision. My doctoral thesis (see here) focused on using synthetic data to train robust and reliable vision models.

Drew Zagieboylo

Drew joined NVIDIA's Security and Privacy research team in the summer of 2024, after receiving his PhD in Computer Science from Cornell University in 2023. His research focuses on applying programming language design, tools, and techniques to security problems across the hardware-software stack. In particular, he is interested in enabling engineers and designers to build resilient systems whose confidentiality and integrity can be trusted and measured.