NVIDIA Research Taiwan
NVIDIA Research Taiwan
Home
News
Members
Research
Publications
Contact
Light
Dark
Automatic
Video Understanding
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
We present Omni-RGPT, a multimodal large language model designed to facilitate region-level comprehension for both images and videos. To achieve consistent region representation across spatio-temporal dimensions, we introduce Token Mark, a set of …
Cite
×