Grounded 3D-Aware Spatial Vision-Language Modeling
An-Chieh Cheng, Yang Fu, Yatai Ji, Ligeng Zhu, Guanqi Zhan, Zhuoyang Zhang, Zhaojing Yang, Song Han, Yao Lu, Pavlo Molchanov, Vidya Nariyambut Murali, Jan Kautz, Xiaolong Wang, Hongxu (Danny) Yin, Sifei Liu
June 2026
Type: Conference paper
Publication: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Highlight
Pavlo Molchanov
Jan Kautz
Team Leader
Xiaolong Wang
Hongxu (Danny) Yin
Sifei Liu
Related
3D Aware Region Prompted Vision Language Model
NVILA: Efficient Frontier Visual Language Models
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Scaling RL to Long Videos
LongVILA: Scaling Long-Context Visual Language Models for Long Videos