Neural Temporal Adaptive Sampling and Denoising

Despite recent advances in Monte Carlo path tracing at interactive rates, image sequences denoised from few samples per pixel often suffer from temporal instability and loss of high-frequency detail. We present a novel adaptive rendering method that increases the temporal stability and image fidelity of low-sample-count path tracing by distributing samples via spatio-temporal joint optimization of sampling and denoising.
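The core idea of adaptive sampling is to spend the per-frame sample budget where the image is noisiest. The paper learns this allocation jointly with the denoiser; the sketch below only illustrates the simpler baseline of distributing samples in proportion to an estimated per-pixel variance map (the function name and heuristic are assumptions, not the paper's method):

```python
import numpy as np

def allocate_samples(variance_map, total_samples, min_spp=1):
    """Distribute a sample budget across pixels proportionally to
    estimated per-pixel variance (hypothetical helper, not the
    paper's learned sampler)."""
    weights = variance_map / variance_map.sum()
    spp = np.floor(weights * total_samples).astype(int)
    return np.maximum(spp, min_spp)  # guarantee a floor of samples everywhere

# Toy 2x2 variance map: the noisy pixel receives most of the budget.
var = np.array([[0.1, 0.1],
                [0.1, 0.7]])
spp = allocate_samples(var, total_samples=100)
```

A learned sampler replaces the fixed variance heuristic with a network that predicts the sample map from the current noisy frame and temporal history.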

Self-Supervised Viewpoint Learning From Image Collections

Training deep neural networks to estimate the viewpoint of objects requires large labeled training datasets. However, manually labeling viewpoints is notoriously hard, error-prone, and time-consuming. On the other hand, it is relatively easy to mine many unlabeled in-the-wild images of an object category, e.g., cars or faces, from the internet. We ask whether such unlabeled image collections can be used to train viewpoint estimation networks for general object categories purely via self-supervision.

Tucker Hermans

Tucker Hermans is a senior research scientist at NVIDIA working on robotic manipulation, focusing on the interaction of multi-sensory perception, learning, and control. Tucker is also an associate professor in the School of Computing at the University of Utah, where he is a member of the Utah Robotics Center. Tucker earned his Ph.D. in Robotics from the Georgia Institute of Technology.

Convolutional Tensor-Train LSTM for Spatio-Temporal Learning

Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting. This gap exists partly because such tasks require learning long-term spatio-temporal correlations in the video sequence. We propose a higher-order convolutional LSTM model that can efficiently learn these correlations with a succinct representation of the history.
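The "succinct representation" rests on the tensor-train (TT) decomposition: a large higher-order weight tensor is replaced by a chain of small three-way cores, cutting the parameter count from exponential to linear in the order. The numpy sketch below (mode size, rank, and variable names are assumed for illustration; the actual Conv-TT-LSTM factorizes convolutional recurrent weights) shows the contraction and the parameter savings:

```python
import numpy as np

d, r = 8, 2                      # mode size and TT-rank (assumed values)
rng = np.random.default_rng(0)

# Three TT cores representing an order-3 weight tensor of shape (d, d, d)
G1 = rng.standard_normal((1, d, r))
G2 = rng.standard_normal((r, d, r))
G3 = rng.standard_normal((r, d, 1))

# Contract the chain of cores to recover the full d x d x d tensor
W = np.einsum('aib,bjc,ckd->ijk', G1, G2, G3)

tt_params = G1.size + G2.size + G3.size     # 16 + 32 + 16 = 64
full_params = d ** 3                        # 512
```

For higher orders the gap widens rapidly: a full order-n tensor costs d**n parameters, while the TT form costs roughly n * d * r**2.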

Toward Sim-to-Real Directional Semantic Grasping

We address the problem of directional semantic grasping, that is, grasping a specific object from a specific direction. We approach the problem using deep reinforcement learning via a double deep Q-network (DDQN) that learns to map downsampled RGB input images from a wrist-mounted camera to Q-values, which are then translated into Cartesian robot control commands via the cross-entropy method (CEM). The network is trained entirely on simulated data generated by a custom robot simulator that models both physical reality (contacts) and perceptual quality (high-quality rendering).
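CEM turns the learned Q-function into a controller by searching the continuous action space: it repeatedly samples candidate actions from a Gaussian, scores them with the Q-network, and refits the Gaussian to the top-scoring "elites". A minimal sketch, with a toy quadratic standing in for the trained Q-network (all names and hyperparameters here are assumptions):

```python
import numpy as np

def q(actions):
    # Stand-in for the DDQN: a quadratic bowl with a known optimum.
    target = np.array([0.3, -0.5])          # hypothetical best action
    return -np.sum((actions - target) ** 2, axis=1)

def cem_argmax(q_fn, dim=2, pop=64, elites=8, iters=20, seed=0):
    """Cross-entropy method: refit a Gaussian to the elite samples."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        top = samples[np.argsort(q_fn(samples))[-elites:]]
        mu, sigma = top.mean(axis=0), top.std(axis=0) + 1e-6
    return mu

best = cem_argmax(q)
```

In the grasping setting, `best` would be the Cartesian command sent to the robot at each control step; the Gaussian is re-initialized each step around the current state.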

Camera-to-Robot Pose Estimation from a Single Image

We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then used to recover the camera extrinsics, assuming that the camera intrinsics and joint configuration of the robot manipulator are known.
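The PnP step recovers the camera extrinsics from the detected 2D keypoints and their known 3D positions on the robot (available from the joint configuration and forward kinematics). The paper uses a standard PnP solver; the sketch below substitutes a simplified Direct Linear Transform (DLT) that solves for the 3x4 projection matrix from noiseless synthetic correspondences, which is an assumption for illustration only:

```python
import numpy as np

def dlt_pnp(pts3d, pts2d):
    """Recover the 3x4 projection matrix (up to scale) from >= 6
    2D-3D correspondences via the Direct Linear Transform."""
    rows = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)   # null-space vector = projection matrix

# Synthetic check: project known 3D keypoints with a ground-truth pose
# (identity intrinsics assumed), then recover the pose and reproject.
rng = np.random.default_rng(1)
pts3d = rng.uniform(-1, 1, size=(8, 3))
P_true = np.hstack([np.eye(3), [[0.1], [-0.2], [5.0]]])
hom = np.hstack([pts3d, np.ones((8, 1))]).T
proj = (P_true @ hom).T
pts2d = proj[:, :2] / proj[:, 2:3]

P_est = dlt_pnp(pts3d, pts2d)
reproj = (P_est @ hom).T
err = np.abs(reproj[:, :2] / reproj[:, 2:3] - pts2d).max()
```

With known intrinsics K, the rotation and translation follow from decomposing K^-1 @ P_est; robust solvers additionally handle noisy keypoint detections, which matters when the detections come from a network trained on simulated data.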

Jean Kossaifi

Jean Kossaifi leads research at NVIDIA in the field of AI for Scientific Simulation, where he advances new algorithmic paradigms to solve complex physics-based problems. His core research focuses on fundamental algorithms, including combining tensor methods with deep learning, to develop efficient and powerful neural architectures.

Wen-mei Hwu

Wen-mei Hwu joined NVIDIA in February 2020 as Senior Distinguished Research Scientist, after spending 32 years at the University of Illinois at Urbana-Champaign, where he was a Professor, Sanders-AMD Endowed Chair, Acting Department Head and Chief Scientist of the Parallel Computing Institute. Hwu and his Illinois team developed the superblock compiler scheduling and optimization framework that has been adopted by virtually all modern vendor and open-source compilers today.  In 2008, Hwu became the director of NVIDIA's first CUDA Center of Excellence.