Real-Time Path Tracing and Beyond

In this keynote presented at High Performance Graphics 2022, Petrik Clarberg shares an update on real-time path tracing and the next steps for real-time graphics research.


Routability-Aware Placement for Advanced FinFET Analog Circuits with Satisfiability Modulo Theories

The increasingly complex design rules and geometric layout constraints of advanced FinFET nodes have made automated placement of full-custom analog/mixed-signal (AMS) designs ever more challenging. Compared with traditional planar nodes, AMS circuit layout in FinFET technologies is dramatically different due to strict design rules and grid-based restrictions on both placement and routing. This limits the ability of previous analog placement approaches to handle all of the new constraints while adhering to the new layout style.

AutoCRAFT: Layout Automation for Custom Circuits in Advanced FinFET Technologies

Despite continuous efforts in layout automation for full-custom circuits, including analog/mixed-signal (AMS) designs, automated layout tools have not yet been widely adopted in industrial full-custom design flows because of high circuit complexity and sensitivity to layout parasitics. Nevertheless, the strict design rules and grid-based restrictions of nanometer-scale FinFET nodes limit the degrees of freedom in full-custom layout design and thus narrow the gap between automation tools and human experts.

LANA: Latency Aware Network Acceleration

We introduce latency-aware network acceleration (LANA), an approach that builds on neural architecture search techniques and teacher-student distillation to accelerate neural networks. LANA consists of two phases: in the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature-map distillation. In the second phase, it solves the combinatorial selection of efficient operations as a novel constrained integer linear programming (ILP) problem.
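
A minimal PyTorch sketch of the first phase is shown below; names such as `teacher_layers`, `candidates`, and the per-candidate optimizers are illustrative assumptions rather than the paper's actual interfaces. Each candidate operation is trained, independently of the others, to reproduce the corresponding teacher layer's output feature map from that layer's input; the second phase then picks one operation per layer via the constrained ILP and is not sketched here.

```python
import torch
import torch.nn as nn

# Sketch of layer-wise feature-map distillation (phase 1), assuming
# `teacher_layers` is a list of the teacher's blocks and `candidates[i]`
# is a list of (candidate_module, optimizer) pairs for layer i.
def distill_candidates(teacher_layers, candidates, data_loader, epochs=1):
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x, _ in data_loader:
            with torch.no_grad():
                # Record the teacher's per-layer feature maps.
                feats = [x]
                for layer in teacher_layers:
                    feats.append(layer(feats[-1]))
            # Train every candidate independently to mimic its teacher layer.
            for i, ops in enumerate(candidates):
                for op, opt in ops:
                    opt.zero_grad()
                    loss = mse(op(feats[i]), feats[i + 1])
                    loss.backward()
                    opt.step()
```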

Diffusion Models for Adversarial Purification

Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods make no assumptions about the form of the attack or the classification model, and can therefore defend pre-existing classifiers against unseen threats. However, their performance currently falls behind that of adversarial training methods.
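
The overall recipe can be sketched with a DDPM-style denoiser; the paper itself uses a continuous-time score-based formulation, and `eps_model`, `betas`, and the diffusion depth `t_star` below are illustrative assumptions. The idea is to partially diffuse the input so the adversarial perturbation is washed out by Gaussian noise, then run the reverse process to recover a clean image for the classifier.

```python
import torch

def purify(x_adv, eps_model, betas, t_star=100):
    """Diffusion-purification sketch: noise the (possibly adversarial) input
    for t_star steps, then denoise it before classification. `eps_model(x, t)`
    is assumed to be a pretrained noise predictor with schedule `betas`."""
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)  # cumulative products (alpha-bar)

    # Forward diffusion to step t_star: inject Gaussian noise.
    noise = torch.randn_like(x_adv)
    x = abar[t_star].sqrt() * x_adv + (1.0 - abar[t_star]).sqrt() * noise

    # Reverse (ancestral) sampling from t_star back to 0.
    for t in range(t_star, 0, -1):
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        eps = eps_model(x, t_batch)
        mean = (x - (1.0 - alphas[t]) / (1.0 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 1:
            x = mean + betas[t].sqrt() * torch.randn_like(x)
        else:
            x = mean
    return x  # purified input, handed to the pre-existing classifier
```

The main tuning knob is the diffusion depth: enough noise to destroy the perturbation, but not so much that the image semantics are lost.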

Learning Perceptual Concepts by Bootstrapping from Human Queries

Robots need to be able to learn concepts from their users in order to adapt their capabilities to each user's unique task. But when the robot operates on high-dimensional inputs such as images or point clouds, learning a new concept directly is impractical: it would require an unrealistic amount of human effort. To address this challenge, we propose an approach in which the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space.
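
The bootstrapping idea can be sketched as follows; helper names such as `sample_state`, `render`, and `train_high_dim_classifier` are hypothetical placeholders, and the concrete pipeline in the paper differs in detail.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_concept(human_states, human_labels, sample_state, render,
                      train_high_dim_classifier, n_synthetic=10_000):
    # 1) Learn the concept in a low-dimensional state space (e.g. object
    #    poses) from the small set of human-labeled queries.
    low_dim_clf = LogisticRegression().fit(human_states, human_labels)

    # 2) Use the low-dimensional classifier to label a large synthetic set,
    #    pairing each sampled state with its rendered high-dimensional
    #    observation (e.g. an image or point cloud).
    states = np.stack([sample_state() for _ in range(n_synthetic)])
    labels = low_dim_clf.predict(states)
    observations = [render(s) for s in states]

    # 3) Train the high-dimensional concept classifier on the generated data,
    #    avoiding further human labeling effort.
    return train_high_dim_classifier(observations, labels)
```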

Chia-Tung (Mark) Ho

Chia-Tung Ho received the B.S. and M.S. degrees in electrical engineering and computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2011 and 2013, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California San Diego, USA, in 2022. Chia-Tung has several years of industrial EDA experience. Before coming to the US, he worked for IDM and EDA companies in Taiwan, developing an in-house design-for-manufacturing (DFM) flow at Macronix and fastSPICE technologies at Mentor Graphics and Synopsys.

Fair and Comprehensive Benchmarking of Machine Learning Processing Chips

With the rise of custom silicon chips for AI acceleration, fair and comprehensive benchmarking of hardware innovations has become increasingly important. While application- and system-level benchmarking provides the most complete picture of tradeoffs across multiple design dimensions, it can hide the impact of innovations at lower levels. Moreover, system-level benchmarking is not always feasible, especially for academic and industrial research chips.

A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm

We present a deep neural network (DNN) accelerator designed for efficient execution of transformer-based DNNs, which have become ubiquitous for natural language processing tasks. DNN inference accelerators often employ specialized hardware techniques such as reduced precision to improve energy efficiency, but many of these techniques result in catastrophic accuracy loss on transformers. The proposed accelerator supports per-vector scaled quantization and approximate softmax to enable the use of 4-bit arithmetic with little accuracy loss.
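
Per-vector scaled quantization can be illustrated with a short NumPy sketch; the vector size of 16 and the simple max-based calibration are assumptions for illustration, and hardware details such as how the scale factors themselves are stored and quantized are omitted.

```python
import numpy as np

def per_vector_quantize(x, vec_size=16, bits=4):
    """Quantize each contiguous vector of `vec_size` elements with its own
    scale factor, so an outlier only degrades precision within its vector.
    Assumes x.size is a multiple of vec_size."""
    qmax = 2 ** (bits - 1) - 1                  # 7 for signed 4-bit values
    v = x.reshape(-1, vec_size)
    scale = np.abs(v).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)    # guard against all-zero vectors
    q = np.clip(np.round(v / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale                             # 4-bit values stored in int8

def per_vector_dequantize(q, scale):
    return q.astype(np.float32) * scale
```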

A bidirectional formulation for Walk on Spheres

Numerically solving partial differential equations (PDEs) is central to many applications in computer graphics and scientific modeling. Conventional methods typically discretize the domain first, which makes them less efficient for complex geometry. In contrast, the walk on spheres (WoS) algorithm, recently introduced to graphics, is a grid-free Monte Carlo method that can compute numerical solutions of Poisson equations without discretizing space.
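
For intuition, below is a minimal sketch of the standard (unidirectional) WoS estimator for the Laplace equation with Dirichlet boundary conditions; `dist_to_boundary` and `boundary_value` are assumed callables describing the domain, and both the Poisson source term and the paper's bidirectional formulation are omitted.

```python
import numpy as np

def walk_on_spheres(x, dist_to_boundary, boundary_value,
                    eps=1e-3, max_steps=1000, rng=None):
    """One WoS walk: repeatedly jump to a uniformly random point on the
    largest sphere around the current point that stays inside the domain,
    and return the boundary value once within eps of the boundary."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(x, dtype=float)
    for _ in range(max_steps):
        r = dist_to_boundary(p)
        if r < eps:                      # reached the eps-shell: stop
            return boundary_value(p)
        d = rng.normal(size=p.shape)     # uniform direction via Gaussian
        p = p + r * d / np.linalg.norm(d)
    return boundary_value(p)             # fallback for very long walks
```

Averaging many independent walks started from the same query point yields the Monte Carlo estimate of the solution at that point.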