NVIDIA Research
Wolf: Dense Video Captioning with a World Summarization Framework
Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Linxi Fan, Yuke Zhu, Jan Kautz, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone
September 2025
Type: Journal article
Publication: Transactions on Machine Learning Research
Related
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Scaling Vision Pre-Training to 4K Resolution
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Scaling RL to Long Videos
VILA: On pretraining for vision language models