Video Stitching for Linear Camera Arrays

Despite the long history of image and video stitching research, existing academic and commercial solutions still produce strong artifacts. In this work, we propose a wide-baseline video stitching algorithm for linear camera arrays that is temporally stable and tolerant to strong parallax. Our key insight is that stitching can be cast as a problem of learning a smooth spatial interpolation between the input videos.

Neural Inverse Rendering of an Indoor Scene from a Single Image

Inverse rendering aims to estimate physical attributes of a scene, e.g., reflectance, geometry, and lighting, from image(s). Inverse rendering has been studied primarily for single objects or with methods that solve for only one of the scene attributes. We propose the first learning based approach that jointly estimates albedo, normals, and lighting of an indoor scene from a single image.

Few-Shot Viewpoint Estimation

Viewpoint estimation for known categories of objects has been improved significantly thanks to deep networks and large datasets, but generalization to unknown categories is still very challenging. With an aim towards improving performance on unknown categories, we introduce the problem of category-level few-shot viewpoint estimation. We design a novel framework to successfully train viewpoint networks for new categories with few examples (10 or less).

Po-An Tsai

Po-An Tsai joined NVIDIA Research in July 2019. His research focuses on computer system and architecture. Specifically, he works on memory hierarchy design, resource management in parallel systems, and software/hardware co-optimization. Po-An received his S.M. and Ph.D. in EECS from MIT, advised by Professor Daniel Sanchez.

Matching Prescription & Visual Acuity: Towards AR for Humans

Inspired by human visual perception, we demonstrate two novel wearable augmented reality displays. The first "Prescription AR" integrates prescription correction in a 5mm-thick image combiner. The static prototype is 50g and eyeglasses form factor. The second "Foveated AR" adapts to user gaze by adjusting the resolution and focal depth.

Foveated AR: Dynamically-Foveated Augmented Reality Display

We present a near-eye augmented reality display with resolution and focal depth dynamically driven by gaze tracking. The display combines a traveling microdisplay relayed off a concave half-mirror magnifier for the high-resolution foveal region, with a wide field-of-view peripheral display using a projector-based Maxwellian-view display whose nodal point is translated to follow the viewer’s pupil during eye movements using a traveling holographic optical element.

Dynamic Many-Light Sampling for Real-Time Ray Tracing

Monte Carlo ray tracing offers the capability of rendering scenes with large numbers of area light sources---lights can be sampled stochastically and shadowing can be accounted for by tracing rays, rather than using shadow maps or other rasterization-based techniques that do not scale to many lights or work well with area lights. Current GPUs only afford the capability of tracing a few rays per pixel at real-time frame rates, making it necessary to focus sampling on important light sources.

Pixel-Adaptive Convolutional Neural Networks

We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling.

A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator

This paper describes a short-reach serial link to connect chips mounted on the same package or on neighboring packages on a printed circuit board (PCB). The link employs an energy-efficient, single-ended ground-referenced signaling scheme. Implemented in 16-nm FinFET CMOS technology, the link operates at a data rate of 25 Gb/s/pin with 1.17-pJ/bit energy efficiency and uses a simple but robust matched-delay clock forwarding scheme that cancels most sources of jitter.

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm

This work presents a scalable deep neural network (DNN) accelerator consisting of 36 chips connected in a mesh network on a multi-chip-module (MCM) using ground-referenced signaling (GRS). While previous accelerators fabricated on a single monolithic die are limited to specific network sizes, the proposed architecture enables flexible scaling for efficient inference on a wide range of DNNs, from mobile to data center domains.