De-An Huang
NVIDIA
Interests
Video Understanding
Embodied AI
Latest
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
NVILA: Efficient Frontier Visual Language Models
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
LITA: Language Instructed Temporal-localization Assistant
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
I^2SB: Image-to-Image Schrödinger Bridge
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models