GPU-Accelerated Atari Emulation for Reinforcement Learning

We designed and implemented a CUDA port of the Atari Learning Environment (ALE), a system for developing and evaluating deep reinforcement algorithms using Atari games. Our CUDA Learning Environment (CuLE) overcomes many limitations of existing CPU-based Atari emulators and scales naturally to multi-GPU systems. It leverages the parallelization capability of GPUs to run thousands of Atari games simultaneously; by rendering frames directly on the GPU, CuLE avoids the bottleneck arising from the limited CPU-GPU communication bandwidth.

Video Stitching for Linear Camera Arrays

Despite the long history of image and video stitching research, existing academic and commercial solutions still produce strong artifacts. In this work, we propose a wide-baseline video stitching algorithm for linear camera arrays that is temporally stable and tolerant to strong parallax. Our key insight is that stitching can be cast as a problem of learning a smooth spatial interpolation between the input videos.

Neural Inverse Rendering of an Indoor Scene from a Single Image

Inverse rendering aims to estimate physical attributes of a scene, e.g., reflectance, geometry, and lighting, from image(s). Inverse rendering has been studied primarily for single objects or with methods that solve for only one of the scene attributes. We propose the first learning based approach that jointly estimates albedo, normals, and lighting of an indoor scene from a single image.

Few-Shot Viewpoint Estimation

Viewpoint estimation for known categories of objects has been improved significantly thanks to deep networks and large datasets, but generalization to unknown categories is still very challenging. With an aim towards improving performance on unknown categories, we introduce the problem of category-level few-shot viewpoint estimation. We design a novel framework to successfully train viewpoint networks for new categories with few examples (10 or less).

Po-An Tsai

Po-An Tsai joined NVIDIA Research in July 2019. His research focuses on computer system and architecture. Specifically, he works on memory hierarchy design, resource management in parallel systems, and software/hardware co-optimization. Po-An received his S.M. and Ph.D. in EECS from MIT, advised by Professor Daniel Sanchez.

Matching Prescription & Visual Acuity: Towards AR for Humans

Inspired by human visual perception, we demonstrate two novel wearable augmented reality displays. The first "Prescription AR" integrates prescription correction in a 5mm-thick image combiner. The static prototype is 50g and eyeglasses form factor. The second "Foveated AR" adapts to user gaze by adjusting the resolution and focal depth.

Foveated AR: Dynamically-Foveated Augmented Reality Display

We present a near-eye augmented reality display with resolution and focal depth dynamically driven by gaze tracking. The display combines a traveling microdisplay relayed off a concave half-mirror magnifier for the high-resolution foveal region, with a wide field-of-view peripheral display using a projector-based Maxwellian-view display whose nodal point is translated to follow the viewer’s pupil during eye movements using a traveling holographic optical element.

Dynamic Many-Light Sampling for Real-Time Ray Tracing

Monte Carlo ray tracing offers the capability of rendering scenes with large numbers of area light sources---lights can be sampled stochastically and shadowing can be accounted for by tracing rays, rather than using shadow maps or other rasterization-based techniques that do not scale to many lights or work well with area lights. Current GPUs only afford the capability of tracing a few rays per pixel at real-time frame rates, making it necessary to focus sampling on important light sources.

Pixel-Adaptive Convolutional Neural Networks

We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling.