Po-An Tsai

Po-An Tsai joined NVIDIA Research in July 2019. His research focuses on computer system and architecture. Specifically, he works on memory hierarchy design, resource management in parallel systems, and software/hardware co-optimization. Po-An received his S.M. and Ph.D. in EECS from MIT, advised by Professor Daniel Sanchez.

Matching Prescription & Visual Acuity: Towards AR for Humans

Inspired by human visual perception, we demonstrate two novel wearable augmented reality displays. The first "Prescription AR" integrates prescription correction in a 5mm-thick image combiner. The static prototype is 50g and eyeglasses form factor. The second "Foveated AR" adapts to user gaze by adjusting the resolution and focal depth.

Foveated AR: Dynamically-Foveated Augmented Reality Display

We present a near-eye augmented reality display with resolution and focal depth dynamically driven by gaze tracking. The display combines a traveling microdisplay relayed off a concave half-mirror magnifier for the high-resolution foveal region, with a wide field-of-view peripheral display using a projector-based Maxwellian-view display whose nodal point is translated to follow the viewer’s pupil during eye movements using a traveling holographic optical element.

Dynamic Many-Light Sampling for Real-Time Ray Tracing

Monte Carlo ray tracing offers the capability of rendering scenes with large numbers of area light sources---lights can be sampled stochastically and shadowing can be accounted for by tracing rays, rather than using shadow maps or other rasterization-based techniques that do not scale to many lights or work well with area lights. Current GPUs only afford the capability of tracing a few rays per pixel at real-time frame rates, making it necessary to focus sampling on important light sources.

Pixel-Adaptive Convolutional Neural Networks

We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially-varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling.

A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator

This paper describes a short-reach serial link to connect chips mounted on the same package or on neighboring packages on a printed circuit board (PCB). The link employs an energy-efficient, single-ended ground-referenced signaling scheme. Implemented in 16-nm FinFET CMOS technology, the link operates at a data rate of 25 Gb/s/pin with 1.17-pJ/bit energy efficiency and uses a simple but robust matched-delay clock forwarding scheme that cancels most sources of jitter.

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm

This work presents a scalable deep neural network (DNN) accelerator consisting of 36 chips connected in a mesh network on a multi-chip-module (MCM) using ground-referenced signaling (GRS). While previous accelerators fabricated on a single monolithic die are limited to specific network sizes, the proposed architecture enables flexible scaling for efficient inference on a wide range of DNNs, from mobile to data center domains.

Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference

Analog/mixed-signal (AMS) computation can be more energy efficient than digital approaches for deep learning inference, but incurs an accuracy penalty from precision loss. Prior AMS approaches focus on small networks/datasets, which can maintain accuracy even with 2b precision. We analyze applicability of AMS approaches to larger networks by proposing a generic AMS error model, implementing it in an existing training framework, and investigating its effect on ImageNet classification with ResNet-50.

PRIMAL: Power Inference using Machine Learning

This paper introduces PRIMAL, a novel learning-based framework that enables fast and accurate power estimation for ASIC designs. PRIMAL trains machine learning (ML) models with design verification testbenches for characterizing the power of reusable circuit building blocks. The trained models can then be used to generate detailed power profiles of the same blocks under different workloads. We evaluate the performance of several established ML models on this task, including ridge regression, gradient tree boosting, multi-layer perceptron, and convolutional neural network (CNN).

High Performance Graph Convolutional Networks with Applications in Testability Analysis

Applications of deep learning to electronic design automation (EDA) have recently begun to emerge, although they have mainly been limited to processing of regular structured data such as images. However, many EDA problems require processing irregular structures, and it can be non-trivial to manually extract important features in such cases. In this paper, a high performance graph convolutional network (GCN) model is proposed for the purpose of processing irregular graph representations of logic circuits. A GCN classifier is firstly trained to predict observation point candidates in a netlist.