Efficient AI
VILA
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
VILA-U is a Unified foundation model that integrates Video, Image, Language understanding and generation. Traditional visual language …
Yecheng Wu, Zhuoyang Zhang, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao (Jason) Lu
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Long-context capability is critical for multi-modal foundation models, especially for long video understanding. We introduce LongVILA, …
Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao (Jason) Lu, Song Han