NVIDIA Research
Sifei Liu
NVIDIA
Interests
Computer Vision
Self-Supervised Learning
Latest
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
NaVILA: Legged Robot Vision-Language-Action Model for Navigation
NVILA: Efficient Frontier Visual Language Models
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Parallel Sequence Modeling via Generalized Spatial Propagation Network (GSPN)
Recreating 1940s Tom and Jerry with Test-Time Training
Scaling Vision Pre-Training to 4K Resolution
BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation with 3D Blobs
CosAE: Learnable Fourier Series for Image Restoration
Cosine Autoencoder with Extremely Narrow Bottleneck for Image Restoration
SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models
AGG: Amortized Generative 3D Gaussians for Single Image to 3D
Compositional Text-to-Image Generation with Dense Blob Representations
COLMAP-Free 3D Gaussian Splatting
Dream-in-4D: A Unified Approach for Text- and Image-guided 4D Scene Generation
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
RegionGPT: Towards Region Understanding Vision Language Model
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
3D Reconstruction with Generalizable Neural Fields using Scene Priors
Generalizable One-shot Neural Head Avatar
Affordance Diffusion: Synthesizing Hand-Object Interactions
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Zero-shot Pose Transfer for Unrigged Stylized 3D Characters
Autoregressive 3D Shape Generation via Canonical Mapping
Scraping Textures from Natural Images for Synthesis and Editing
CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
GroupViT: Semantic Segmentation Emerges From Text Supervision
Learning Contrastive Representation for Semantic Correspondence
Coupled Segmentation and Edge Learning via Dynamic Graph Propagation
Hierarchical Contrastive Motion Learning for Video Action Recognition
Self-Supervised Object Detection via Generative Image Synthesis
Learning to Track Instances without Video Annotations
Learning Continuous Environment Fields via Implicit Functions
Online Adaptation for Consistent Mesh Reconstruction in the Wild
Self-Supervised Single-View 3D Reconstruction via Semantic Consistency
Self-Supervised Viewpoint Learning from Image Collections