LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Yihui He, Hongxu (Danny) Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
April 2025
Type: Conference paper
Publication: International Conference on Learning Representations (ICLR)
Related
NVILA: Efficient Frontier Visual Language Models
Scaling RL to Long Videos
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
3D Aware Region Prompted Vision Language Model
VILA-U: Efficient and Unified Visual Language Understanding and Generation