Publications

PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Timing as an Action: Learning When to Observe and Act

Mobile AR Depth Estimation: Challenges & Prospects

Physics-informed neural operators with exact differentiation on arbitrary geometries

Generalizable One-shot Neural Head Avatar

Vicinity Vision Transformer

SMRD: SURE-based Robust MRI Reconstruction with Diffusion Models

PhysDiff: Physics-Guided Human Motion Diffusion Model

Differentially Private Diffusion Models

Vision Transformers Are Good Mask Auto-Labelers

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Heterogeneous Continual Learning

Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids

Pseudoinverse-Guided Diffusion Models for Inverse Problems

Online Consistent Video Depth using Continuous Geometric Representations

Dual Diffusion Implicit Bridges for Image-to-Image Translation

LISA: Learning Interpretable Skill Abstractions from Language

GENIE: Higher-Order Denoising Diffusion Solvers

Embodied Scene-aware Human Pose Estimation

Denoising Diffusion Restoration Models

Concrete Score Matching: Generalized Score Matching for Discrete Data

When to Prune? A Policy towards Early Structural Pruning

Sound-Guided Semantic Image Manipulation

GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras

Score-based Generative Modeling in Latent Space

GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds

Weakly-Supervised Physically Unconstrained Gaze Estimation

Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

Learning to Track Instances without Video Annotations

Dual Contrastive Loss and Attention for GANs

Binary TTC: A Temporal Geofence for Autonomous Navigation

NVAE: A Deep Hierarchical Variational Autoencoder

Neural FFTs for Universal Texture Image Synthesis

Convolutional Tensor-Train LSTM for Spatio-temporal Learning

Accelerating Reinforcement Learning through GPU Atari Emulation

World-Consistent Video-to-Video Synthesis

Undirected Graphical Models as Approximate Posteriors

Angular Visual Hardness

Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild

UNAS: Differentiable Architecture Search Meets Reinforcement Learning

Two-shot Spatially-varying BRDF and Shape Estimation

Panoptic-Based Image Synthesis

Meshlet Priors for 3D Mesh Reconstruction

NRMVS: Non-Rigid Multi-View Stereo