LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Yihui He, Hongxu (Danny) Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
April 2025
Type: Conference paper
Publication: International Conference on Learning Representations (ICLR)
Related
NVILA: Efficient Frontier Visual Language Models
VILA-U: Efficient and Unified Visual Language Understanding and Generation
Scaling Vision Pre-Training to 4K Resolution
VILA: On pretraining for vision language models
RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models