
LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, a full-stack solution for long-context visual-language models that co-designs the algorithm and the system. For model training, we upgrade existing VLMs to support long video understanding by adding two stages: long-context extension and long-video supervised fine-tuning. However, training on long videos is computationally and memory intensive.


Yujun Lin

Yujun Lin is a research scientist at NVIDIA. He received his PhD from MIT, advised by Prof. Song Han. His research area is efficient deep learning, with a special focus on the co-design of algorithms, systems, and hardware for foundation models (diffusion models, LLMs, etc.). His work has been featured as oral and spotlight presentations at conferences such as ICLR, NeurIPS, MICRO, HPCA, and MLSys.


Yonggan Fu

Yonggan Fu obtained his PhD from the Georgia Institute of Technology in May 2025. Prior to that, he received his Bachelor's degree, with a dual major in Applied Physics and Computer Science, from the School of the Gifted Young at the University of Science and Technology of China in 2019. He is a recipient of the IBM PhD Fellowship and was selected as one of the Machine Learning and Systems Rising Stars 2023.


Yukang Chen

Hello! I am a Research Scientist at NVIDIA Research, working with Prof. Song Han. I received my Ph.D. degree from CUHK. My research focuses on LongAI, that is, "Boost AI's Long ability while staying Efficient". My representative works include VoxelNeXt, LongLoRA, and LongVILA.


Baptiste Nicolet

Baptiste Nicolet is a Senior Research Scientist in the Graphics, Communications, and Machine Learning team at NVIDIA Research. He obtained a Ph.D. in Computer Science from EPFL in 2025, focusing on inverse light transport simulation. In his free time, Baptiste likes to climb, swim, and tinker with 3D printers.


Zhengyi Luo


Yuyang Zhao

Dr. Yuyang Zhao is a Research Scientist at NVIDIA Research, working with Prof. Song Han. He obtained his Ph.D. degree from National University of Singapore, advised by A/P Gim Hee Lee.

His research interests mainly lie in image, video and 3D generation.


Fengyuan Hu


Yuqi Xie


Jindong Jiang

Jindong is a research scientist in the Learning and Perception Research (LPR) team of NVIDIA Research. Prior to joining NVIDIA, Jindong was a PhD student at Rutgers University under the supervision of Prof. Sungjin Ahn. His research interests lie at the intersection of representation learning and visual reasoning, with a strong interest in developing novel architectures that can improve agents' visual reasoning capabilities.

