Amrita Mazumdar

Amrita Mazumdar joined NVIDIA Research in 2021. Her research interests are at the intersection of computer systems and computer graphics.

She received her PhD (2020) and MS (2017) from the University of Washington, and her BS (2014) from Columbia University. Previously, she founded Vignette AI, where she worked on perception-aware video compression and storage.

Karen Leung

I am a research scientist working with NVIDIA's Autonomous Vehicle Research Group. My research interests include safe and interaction-aware planning and control for autonomous vehicles, and developing structured and interpretable deep learning models grounded by logic.

Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers

Transformers have transformed the field of natural language processing. Their strong performance is largely attributed to the use of stacked self-attention layers, each of which consists of matrix multiplies as well as softmax operations. As a result, unlike in other neural networks, the softmax operation accounts for a significant fraction of the total run-time of Transformers. To address this, we propose Softermax, a hardware-friendly softmax design. Softermax consists of base replacement, low-precision softmax computations, and an online normalization calculation.
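The two ingredients named above can be sketched in a few lines. This is an illustrative sketch, not the paper's hardware design: the online normalization shown is the standard single-pass running-max/running-sum formulation, and the `base` parameter stands in for the paper's base replacement idea (computing powers of 2 instead of e, which is cheaper in hardware); the function name and signature are my own.

```python
import math

def online_softmax(xs, base=math.e):
    """Single-pass softmax: maintain a running max m and running
    normalizer d, rescaling d whenever the max changes.
    base=2 sketches Softermax-style base replacement (the output is
    then a base-2 softmax, not the exact base-e softmax)."""
    m, d = float("-inf"), 0.0
    for x in xs:
        m_new = max(m, x)
        # rescale the accumulated sum to the new max, then add this term
        d = d * base ** (m - m_new) + base ** (x - m_new)
        m = m_new
    return [base ** (x - m) / d for x in xs]
```

Because the max and the normalizer are updated in the same pass, the input is read once instead of twice, which is the property that makes the computation streaming-friendly.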

Simba: scaling deep-learning inference with chiplet-based architecture

Package-level integration using multi-chip-modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically only contain a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication.

Verifying High-Level Latency-Insensitive Designs with Formal Model Checking

Latency-insensitive design mitigates increasing interconnect delay and enables productive component reuse in complex digital systems. This design style has been adopted in high-level design flows because untimed functional blocks connected through latency-insensitive interfaces provide a natural communication abstraction. However, latency-insensitive design with high-level languages also introduces a unique set of verification challenges that jeopardize functional correctness.

Opportunities for RTL and Gate Level Simulation using GPUs

This paper summarizes the opportunities for accelerating simulation on parallel processing hardware platforms such as GPUs. First, we give a summary of prior art. Then, we propose that coding frameworks commonly used for machine learning (ML), such as PyTorch/DGL.ai, can also be used to explore simulation. We demo a crude oblivious two-value cycle gate-level simulator using the higher-level ML framework APIs that, despite its simplistic construction, exhibits a >20X speedup.
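The key observation is that an oblivious two-value cycle simulator evaluates every gate each cycle as a bulk boolean tensor operation over many stimulus vectors at once, which maps naturally onto ML tensor frameworks. The toy below is my own minimal sketch, not the paper's simulator: it hard-codes a 1-bit full-adder netlist and uses NumPy for self-containment where the paper uses PyTorch/DGL.ai APIs on GPUs.

```python
import numpy as np

def full_adder_sim(a, b, cin):
    """Evaluate a tiny gate-level netlist (1-bit full adder) for a whole
    batch of boolean stimulus vectors in one call. Each line is one gate,
    applied to the entire batch as a vectorized boolean op."""
    x = np.logical_xor(a, b)          # XOR gate
    s = np.logical_xor(x, cin)        # XOR gate -> sum output
    g = np.logical_and(a, b)          # AND gate
    p = np.logical_and(x, cin)        # AND gate
    cout = np.logical_or(g, p)        # OR gate -> carry output
    return s, cout
```

Batching all input combinations into a single call per gate, rather than looping over gates per stimulus, is what lets a simulator like this inherit the framework's GPU parallelism essentially for free.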

Standard Cell Routing with Reinforcement Learning and Genetic Algorithm in Advanced Technology Nodes

Automated standard cell routing in advanced technology nodes with unidirectional metal is challenging because of the exploding number of design-rule constraints. Previous approaches leveraged mathematical optimization methods such as SAT and MILP to find an optimal solution under those constraints. These methods rely on the assumption that all design rules can be expressed in the optimization framework and that the solver is powerful enough to solve them. In this paper we propose a machine-learning-based approach that does not depend on this assumption.

MAVIREC: ML-Aided Vectored IR-Drop Estimation and Classification

Vectored IR drop analysis is a critical step in chip signoff that checks the power integrity of an on-chip power delivery network. Due to the prohibitive runtimes of dynamic IR drop analysis, the large number of test patterns must be whittled down to a small subset of worst-case IR vectors. Unlike traditional slow heuristic methods that select a few vectors with incomplete coverage, MAVIREC uses machine learning techniques—3D convolutions and regression-like layers—to accurately recommend a larger subset of test patterns that exercises worst-case scenarios.