De-An Huang

NVIDIA

Interests

Video Understanding
Embodied AI

Latest

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
NVILA: Efficient Frontier Visual Language Models
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
LITA: Language Instructed Temporal-localization Assistant
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
I²SB: Image-to-Image Schrödinger Bridge
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
MinVIS: A minimal video instance segmentation framework without video-based training
Test-time prompt tuning for zero-shot generalization in vision-language models
