Zhiding Yu
NVIDIA
Interests
Computer Vision
Visual Recognition
Representation Learning
Latest
End-to-end 3D Tracking with Decoupled Queries
FB-BEV: BEV Representation from Forward-Backward View Transformations
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
Fully Attentional Networks with Self-emerging Token Labeling
Real-Time Radiance Fields for Single-Image Portrait View Synthesis
FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation
Vision Transformers Are Good Mask Auto-Labelers
VoxFormer: Sparse voxel transformer for camera-based 3D semantic scene completion
Prismer: A Vision-Language Model with An Ensemble of Experts
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
MinVIS: A minimal video instance segmentation framework without video-based training
Test-time prompt tuning for zero-shot generalization in vision-language models
Partial Convolution for Padding, Inpainting, and Image Synthesis
Understanding The Robustness in Vision Transformers
CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
FreeSOLO: Learning to Segment Objects without Annotations
Panoptic SegFormer: Delving deeper into panoptic segmentation with transformers
M$^2$BEV: Multi-camera joint 3D detection and segmentation with unified birds-eye view representation
Learning contrastive representation for semantic correspondence
Coupled Segmentation and Edge Learning Using Dynamic Graph Propagation
Coupled Segmentation and Edge Learning via Dynamic Graph Propagation
SegFormer: Simple and efficient design for semantic segmentation with transformers
Domain Stylization: A Fast Covariance Matching Framework towards Domain Adaptation
DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification
UFO2: A Unified Framework towards Omni-supervised Object Detection
Angular Visual Hardness
Transposer: Universal Texture Synthesis Using Feature Maps as Transposed Convolution Filter
Instance-aware, Context-focused, and Memory-efficient Weakly-Supervised Object Detection