LANA: Latency Aware Network Acceleration

We introduce latency-aware network acceleration (LANA), an approach that builds on neural architecture search techniques and teacher-student distillation to accelerate neural networks. LANA consists of two phases. In the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature map distillation. In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization (ILP) approach.
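
The second-phase selection can be pictured as choosing exactly one candidate operation per layer so that total error is minimized under a latency budget. The sketch below is only an illustration of that problem shape, not the LANA solver: it uses exhaustive enumeration in place of an ILP solver, and the function name `select_operations`, the operation names, the latencies, and the error scores are all hypothetical.

```python
from itertools import product

def select_operations(candidates, latency_budget):
    """Pick one (name, latency, error) candidate per layer.

    Minimizes summed error subject to summed latency <= latency_budget.
    Exhaustive search stands in for an ILP solver (assumption: toy sizes only).
    Returns (names, total_error, total_latency), or None if nothing fits.
    """
    best = None
    for choice in product(*candidates):
        latency = sum(op[1] for op in choice)
        if latency > latency_budget:
            continue  # violates the latency constraint
        error = sum(op[2] for op in choice)
        if best is None or error < best[1]:
            best = ([op[0] for op in choice], error, latency)
    return best

# Hypothetical per-layer candidates: (name, latency, distillation error).
layers = [
    [("teacher", 4.0, 0.0), ("conv3x3", 2.0, 0.5), ("identity", 0.0, 2.0)],
    [("teacher", 5.0, 0.0), ("conv1x1", 1.0, 1.0)],
    [("teacher", 3.0, 0.0), ("depthwise", 1.5, 0.25)],
]

result = select_operations(layers, latency_budget=8.0)
print(result)
```

Enumeration grows exponentially with the number of layers, which is why a real network of this kind calls for an integer programming formulation rather than brute force.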

Diffusion Models for Adversarial Purification

Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods do not make assumptions on the form of attack and the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods.

Learning Perceptual Concepts by Bootstrapping from Human Queries

Robots need to be able to learn concepts from their users in order to adapt their capabilities to each user’s unique task. But when the robot operates on high-dimensional inputs, like images or point clouds, this is impractical: the robot needs an unrealistic amount of human effort to learn the new concept. To address this challenge, we propose a new approach whereby the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space.

Chia-Tung (Mark) Ho

Chia-Tung Ho received the B.S. and M.S. degrees in electrical engineering and computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2011 and 2013, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California San Diego, USA, in 2022. Chia-Tung has several years of industrial EDA experience. Before coming to the US, he worked for IDM and EDA companies in Taiwan, developing an in-house design-for-manufacturing (DFM) flow at Macronix and fastSPICE at Mentor Graphics and Synopsys.

Fair and Comprehensive Benchmarking of Machine Learning Processing Chips

With the rise of custom silicon chips for AI acceleration, fair and comprehensive benchmarking of hardware innovations has become increasingly important. While benchmarking at the application and system levels provides the most complete picture of tradeoffs across multiple design dimensions, it can hide the impact of innovations at lower levels. Moreover, system-level benchmarking is not always feasible, especially for academic and industrial research chips.

A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm

We present a deep neural network (DNN) accelerator designed for efficient execution of transformer-based DNNs, which have become ubiquitous for natural language processing tasks. DNN inference accelerators often employ specialized hardware techniques such as reduced precision to improve energy efficiency, but many of these techniques result in catastrophic accuracy loss on transformers. The proposed accelerator supports per-vector scaled quantization and approximate softmax to enable the use of 4-bit arithmetic with little accuracy loss.
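
To make the idea of per-vector scaling concrete, here is a minimal sketch in which each short vector of values gets its own scale factor before rounding to signed 4-bit integers. It is not the accelerator's actual scheme: the vector size of 4, the symmetric range of -7 to 7, the floating-point scale factors, and the function names are all assumptions made for illustration.

```python
def quantize_per_vector(values, vector_size=4, bits=4):
    """Quantize a flat list, giving every vector_size-element chunk its own scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit with a symmetric range
    codes, scales = [], []
    for start in range(0, len(values), vector_size):
        vec = values[start:start + vector_size]
        max_abs = max(abs(v) for v in vec)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer code and clamp to the 4-bit range.
        codes.extend(max(-qmax, min(qmax, round(v / scale))) for v in vec)
    return codes, scales

def dequantize_per_vector(codes, scales, vector_size=4):
    """Map integer codes back to real values using each chunk's scale."""
    return [c * scales[i // vector_size] for i, c in enumerate(codes)]

# Two chunks whose magnitudes differ by 100x (hypothetical data).
values = [0.1, -0.2, 0.3, 0.7, 10.0, -20.0, 30.0, 70.0]
codes, scales = quantize_per_vector(values)
restored = dequantize_per_vector(codes, scales)
print(codes, scales, restored)
```

With a single scale for the whole list, the small first chunk would collapse to zero in 4 bits; a separate scale per vector lets both chunks use the full integer range.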

A bidirectional formulation for Walk on Spheres

Numerically solving partial differential equations (PDEs) is central to many applications in computer graphics and scientific modeling. Conventional methods for solving PDEs often need to discretize the space first, making them less efficient for complex geometry. In contrast, the walk on spheres (WoS) algorithm, recently introduced to graphics, is a grid-free Monte Carlo method that can provide numerical solutions of Poisson equations without discretizing space.
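
As a rough illustration of the grid-free idea, the sketch below runs the classic (forward) walk on spheres on the unit disk for the Laplace equation, that is, the Poisson equation with a zero source term. It does not implement the bidirectional formulation named in the title. The domain, the boundary function, the stopping tolerance `eps`, and the function names are assumptions chosen so the exact answer is known: with boundary values g(x, y) = x, the solution is u(x, y) = x.

```python
import math
import random

def walk_on_spheres(x, y, boundary_value, rng, eps=1e-3):
    """One random walk inside the unit disk; returns a sample of u(x, y)."""
    while True:
        r = math.hypot(x, y)
        dist = 1.0 - r  # distance from the current point to the unit circle
        if dist < eps:
            # Close enough: read the boundary value at the nearest boundary point.
            return boundary_value(x / r, y / r)
        # Jump to a uniformly random point on the largest circle that fits.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += dist * math.cos(theta)
        y += dist * math.sin(theta)

def estimate(x, y, boundary_value, walks=4000, seed=0):
    """Average many walks to estimate the harmonic function at (x, y)."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(x, y, boundary_value, rng) for _ in range(walks))
    return total / walks

u = estimate(0.3, 0.2, lambda bx, by: bx)
print(u)  # should be close to the exact value 0.3, up to Monte Carlo noise
```

No mesh or grid is built at any point; the only geometric query is the distance to the boundary, which is what makes the method attractive for complex geometry.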

A Position-Free Path Integral for Homogeneous Slabs and Multiple Scattering on Smith Microfacets

We consider the problem of multiple scattering on Smith microfacets. This problem is equivalent to computing volumetric light transport in a homogeneous slab. Although the symmetry of the slab allows for significant simplification, fully analytic solutions are scarce and not general enough for most applications. Standard Monte Carlo simulation, although general, is expensive and leads to variance that must be dealt with. We present the first unbiased, truly position-free path integral for evaluating the BSDF of a homogeneous slab.

Conor Hoekstra

Conor joined NVIDIA Research in 2022 to research array programming models and languages. He completed his Bachelor of Science at SFU in 2014 and his Master of Science at TMU in 2022. Before NVIDIA Research, Conor worked on the RAPIDS team at NVIDIA for over two years. He also worked at Amazon and, in a previous career, was an actuary at Moody's Analytics for four years.

Mike Pritchard

I am interested in the interface between next generation climate simulation and machine learning. My main focus is on accelerating cloud resolving climate simulations using physics informed machine learning. I am also interested in reinforcement learning approaches to climate model calibration, understanding the limits of autoregressive weather simulations trained on observational data, and AI-assisted low-latency analysis of large high-resolution climate simulation datasets.