On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach

We revisit the problem of visual depth estimation in the context of autonomous vehicles. Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large---a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving. We argue that the challenges of removing this gap are significant, owing to fundamental limitations of monocular vision. As a result, we focus our efforts on depth estimation by stereo.

Falling Things: A Synthetic Dataset for 3D Object Detection and Pose Estimation

We present a new dataset, called Falling Things (FAT), for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics.

SPLATNet: Sparse Lattice Networks for Point Cloud Processing

We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice. Naively applying convolutions on this lattice scales poorly, both in terms of memory and computational cost, as the size of the lattice increases. Instead, our network uses sparse bilateral convolutional layers as building blocks.
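The splat/convolve/slice pattern behind bilateral convolutional layers can be illustrated with a toy 1D sketch. This is an assumption-laden simplification, not the SPLATNet implementation (which operates on a high-dimensional permutohedral lattice with learned filters); it only shows why the cost tracks the number of occupied cells rather than the full lattice size.

```python
import numpy as np

def splat(positions, features, cell_size):
    """Scatter point features onto a sparse lattice with linear weights."""
    lattice = {}  # sparse lattice: cell index -> accumulated feature
    for x, f in zip(positions, features):
        i = int(np.floor(x / cell_size))
        t = x / cell_size - i  # fractional offset within the cell
        lattice[i] = lattice.get(i, 0.0) + (1 - t) * f
        lattice[i + 1] = lattice.get(i + 1, 0.0) + t * f
    return lattice

def sparse_conv(lattice, kernel):
    """Apply a small filter only over occupied cells and their neighbours,
    so cost scales with the number of points, not the lattice extent."""
    r = len(kernel) // 2
    out = {}
    for i in lattice:
        out[i] = sum(w * lattice.get(i + k - r, 0.0)
                     for k, w in enumerate(kernel))
    return out

def slice_back(lattice, positions, cell_size):
    """Interpolate filtered lattice values back to the input points."""
    vals = []
    for x in positions:
        i = int(np.floor(x / cell_size))
        t = x / cell_size - i
        vals.append((1 - t) * lattice.get(i, 0.0)
                    + t * lattice.get(i + 1, 0.0))
    return np.array(vals)

points = np.array([0.1, 0.15, 0.9])
feats = np.array([1.0, 1.0, 2.0])
lat = splat(points, feats, cell_size=0.5)
smoothed = slice_back(sparse_conv(lat, [0.25, 0.5, 0.25]), points, 0.5)
```

In the actual architecture these three steps form one layer, the filter weights are learned, and the lattice lives in a higher-dimensional feature space.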

Combining Analytic Direct Illumination and Stochastic Shadows

In this paper, we propose a ratio estimator of the direct-illumination equation that allows us to combine analytic illumination techniques with stochastic raytraced shadows while maintaining correctness. Our main contribution is to show that the shadowed illumination can be split into the product of the unshadowed illumination and the illumination-weighted shadow.
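The split above can be sketched numerically. In this hedged toy (a 1D integrand, uniform sampling; not the paper's renderer), the shadowed integral of f·V is estimated as the analytically known unshadowed integral of f, scaled by a stochastically estimated, illumination-weighted visibility ratio:

```python
import random

def shadowed_illumination(f, visibility, analytic_unshadowed, n_samples=4096):
    """Estimate shadowed illumination as analytic term * stochastic ratio."""
    num = 0.0  # Monte Carlo sum for the shadowed integral of f * V
    den = 0.0  # Monte Carlo sum for the unshadowed integral of f (same samples)
    for _ in range(n_samples):
        x = random.random()      # uniform sample over [0, 1]
        fx = f(x)
        num += fx * visibility(x)
        den += fx
    ratio = num / den            # illumination-weighted shadow factor
    return analytic_unshadowed * ratio

# Toy example: constant illumination f = 2, half the domain occluded.
random.seed(0)
L = shadowed_illumination(
    f=lambda x: 2.0,
    visibility=lambda x: 1.0 if x < 0.5 else 0.0,
    analytic_unshadowed=2.0,     # integral of f over [0, 1], known in closed form
)
# L is close to 1.0: the analytic term scaled by the ~0.5 shadow ratio
```

Because the common factors of f cancel in the ratio, noise is confined to the shadow term while the unshadowed illumination stays analytic and noise-free.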

Modeling Soft Error Propagation in Programs

As technology scales to lower feature sizes, devices become more susceptible to soft errors. Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability of a system. Traditional hardware-only techniques to avoid SDCs are energy hungry, and hence not suitable for commodity systems. Researchers have proposed selective software-based protection techniques to tolerate hardware faults at lower costs.

Riemannian Motion Policies

We introduce Riemannian Motion Policies (RMPs), a new mathematical framework for shaping a robot's behavior. We derive optimal and practical tools for intuitively constructing policies, show the framework's flexibility for distributed computation, use it to unify many previously distinct motion generation techniques, and demonstrate its performance on three dual-arm manipulation platforms in both simulation and reality.

A Variable Shape and Variable Stiffness Controller for Haptic Virtual Interactions

This paper presents an entirely compliant controller handle for use in virtual and augmented reality environments. The controller handle transitions between two static states: a semi-rigid, large-diameter state when pneumatically pressurized and a soft, compressible, smaller-diameter state when depressurized. We integrated the controller with a modified version of NVIDIA's VR Funhouse, employing the two controller states to simulate the physical feel of two virtual objects.

2002 Grad Fellows

Congratulations to the recipients of the 2002 International Graduate Fellowship Award. Thank you to all of you who applied and to the professors who nominated you. We truly appreciate your interest in NVIDIA and our Graduate Fellowship Program.

Selecting the 2002 Graduate Fellowship recipients was extremely difficult. All of the research projects were exciting to us, and we understand the dedication and commitment required to pursue new ideas at the cutting edge of research.

Reblur2Deblur: Deblurring Videos via Self-Supervised Learning

Motion blur is a fundamental problem in computer vision, as it impacts image quality and hinders inference. Traditional deblurring algorithms leverage the physics of the image formation model and use hand-crafted priors: they usually produce results that better reflect the underlying scene, but often exhibit artifacts. Recent learning-based methods implicitly extract the distribution of natural images directly from the data and use it to synthesize plausible images. Their results are impressive, but they are not always faithful to the content of the latent image.

Geometry-Aware Learning of Maps for Camera Localization

Maps are a key component in image-based camera localization and visual SLAM systems: they are used to establish geometric constraints between images, correct drift in relative pose estimation, and relocalize cameras after lost tracking. The exact definitions of maps, however, are often application-specific and hand-crafted for different scenarios (e.g., 3D landmarks, lines, planes, bags of visual words). We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation.