Research

Linxi "Jim" Fan

Please visit my personal website! https://jimfan.me

Vinu Joseph

Working on Privacy Preserving Deep Learning using:

Amrita Mazumdar

Amrita Mazumdar joined NVIDIA Research in 2021. Her research interests are at the intersection of computer systems and computer graphics.

She received her PhD (2020) and MS (2017) from the University of Washington, and her BS (2014) from Columbia University. Previously, she founded Vignette AI, where she worked on perception-aware video compression and storage.

Danfei Xu

Karen Leung

I am a research scientist working with NVIDIA's Autonomous Vehicle Research Group. My research interests include safe and interaction-aware planning and control for autonomous vehicles, and developing structured and interpretable deep learning models grounded by logic.

Benedikt Bitterli

Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers

Transformers have transformed the field of natural language processing. Their performance is largely attributed to the use of stacked self-attention layers, each of which consists of matrix multiplications as well as softmax operations. As a result, unlike in other neural networks, the softmax operation accounts for a significant fraction of the total run-time of Transformers. To address this, we propose Softermax, a hardware-friendly softmax design. Softermax consists of base replacement, low-precision softmax computations, and an online normalization calculation.
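As a rough illustration of two of the ingredients named above, the sketch below computes a softmax with base 2 in place of e and normalizes it online, keeping a running maximum and a running sum in a single pass over the scores. It is a minimal sketch under stated assumptions, not the Softermax design itself: the function name `softmax_base2_online` is hypothetical, the arithmetic uses ordinary Python floats rather than low-precision hardware formats, and inputs are assumed to be finite.

```python
def softmax_base2_online(scores):
    """Base-2 softmax with online normalization (illustrative sketch only).

    Assumes every score is a finite float. Full float precision is used;
    the low-precision aspect of the design is not modeled here.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for x in scores:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new maximum, then add
        # the current term. Both exponents are <= 0, so nothing overflows.
        running_sum = (running_sum * 2.0 ** (running_max - new_max)
                       + 2.0 ** (x - new_max))
        running_max = new_max
    # Second loop only produces the outputs; max and sum are already known.
    return [2.0 ** (x - running_max) / running_sum for x in scores]


if __name__ == "__main__":
    print(softmax_base2_online([0.0, 1.0]))
```

With base 2, the scores `[0.0, 1.0]` map to weights in the ratio 1:2, so the result is approximately `[1/3, 2/3]`. The running maximum and sum are accumulated in the same pass, which avoids a separate pass to find the maximum before summing.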

Simba: scaling deep-learning inference with chiplet-based architecture

Package-level integration using multi-chip-modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically only contain a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication.

Verifying High-Level Latency-Insensitive Designs with Formal Model Checking

Latency-insensitive design mitigates increasing interconnect delay and enables productive component reuse in complex digital systems. This design style has been adopted in high-level design flows because untimed functional blocks connected through latency-insensitive interfaces provide a natural communication abstraction. However, latency-insensitive design with high-level languages also introduces a unique set of verification challenges that jeopardize functional correctness.
