Ligeng Zhu

My research focuses on efficient and scalable training of deep learning models. Please visit my homepage, https://lzhu.me/, for the latest updates.

Shizhe Diao

Shizhe Diao is a research scientist in the NVIDIA Learning and Perception Research Group. He completed his Ph.D. at the Hong Kong University of Science and Technology, advised by Professor Tong Zhang. Previously, he was a visiting scholar at BLENDER LAB@UIUC, working with Professor Heng Ji. He was a research intern at ByteDance AI Lab with Dr.

Zhijian Liu

Zhijian Liu (https://zhijianliu.com) is a research scientist at NVIDIA. He completed his Ph.D. at MIT, advised by Prof. Song Han. His research focuses on efficient machine learning and systems. His work has been featured as oral and spotlight presentations at conferences such as NeurIPS, ICLR, and CVPR. He has received the Qualcomm Innovation Fellowship and has been recognized as a Rising Star in ML and Systems by MLCommons and a Rising Star in Data Science by UChicago and UCSD.

Rose Abramson

Rose Abramson received the Bachelor of Science (B.S.) and Master of Engineering (M.Eng.) degrees in Electrical Engineering from the Massachusetts Institute of Technology in 2015 and 2016, respectively. After graduating, she worked in industry in both the automotive and home-lighting sectors. She received her Ph.D. degree in Electrical Engineering and Computer Science from the University of California, Berkeley, in 2024. During her Ph.D., she interned at NVIDIA in 2022 and at Analog Devices in 2023, working on data center power delivery.

Youssef Elasser

Youssef Elasser received his B.S. degree in Electrical Engineering and Computer Science with a concentration in electric power from Rensselaer Polytechnic Institute in 2018, and his M.A. and Ph.D. degrees in Electrical and Computer Engineering from Princeton University in 2024. His research interests include power delivery for data center microprocessors, magnetics design and optimization, and dc-dc power conversion. He interned in the NVIDIA Circuits Research Group during the summer of 2023 and joined NVIDIA Research full time in June 2024.

Tobias Zirr

Tobias Zirr is a research scientist at NVIDIA interested in machine learning, real-time rendering, and Monte Carlo simulation. Previously, he was a research scientist at Intel, working at the interface of classical and neural rendering and serving as a research program lead bringing path tracing to a wider range of practical real-time applications. As a Ph.D. student in the computer graphics group at Karlsruhe Institute of Technology, his research included MC and MCMC light transport algorithms, as well as real-time rendering and visualization techniques.

Lorenzo Maggi

Lorenzo Maggi is a Senior Research Scientist at NVIDIA, specializing in the convergence of wireless communications and machine learning. 

Before joining NVIDIA, Lorenzo developed algorithmic solutions for 5G networks at Nokia Bell Labs France, focusing on energy efficiency, beamforming, scheduling, and radiation mitigation. Prior to this, he worked on network routing algorithms at Huawei France.

Lorenzo holds a master’s degree in telecommunication engineering from the University of Pavia, Italy, and a Ph.D. in applied mathematics from Eurecom, France. 

DiffiT: Diffusion Vision Transformers for Image Generation

Diffusion models, with their powerful expressivity and high sample quality, have achieved state-of-the-art (SOTA) performance in the generative domain. The pioneering Vision Transformer (ViT) has also demonstrated strong modeling capabilities and scalability, especially for recognition tasks. In this paper, we study the effectiveness of ViTs in diffusion-based generative learning and propose a new model, Diffusion Vision Transformers (DiffiT).
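
To make the setting concrete, here is a minimal NumPy sketch of a single DDPM-style training step, in which a denoiser predicts the noise added to an image at a sampled timestep. The `denoiser` placeholder, the noise schedule, and the tensor shapes are illustrative assumptions only and do not reflect the DiffiT architecture or training recipe.

```python
# Minimal sketch (assumed): one DDPM-style training step where a denoiser
# (a toy linear map standing in for a vision transformer) predicts the noise
# added to an image batch at timestep t.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)            # toy noise schedule
alphas_bar = np.cumprod(1.0 - betas)

def denoiser(x_t, t):
    # Placeholder for a transformer-based epsilon-predictor (assumption).
    return 0.1 * x_t

x0 = rng.standard_normal((8, 3, 32, 32))      # a batch of "images"
t = rng.integers(0, T, size=8)                # one timestep per sample
eps = rng.standard_normal(x0.shape)

a = alphas_bar[t].reshape(-1, 1, 1, 1)
x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps    # forward noising q(x_t | x_0)

loss = np.mean((denoiser(x_t, t) - eps) ** 2)     # epsilon-prediction loss
print(float(loss))
```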

An Empirical Study of Mamba-based Language Models

Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent studies have shown that SSMs can match or exceed the language modeling capabilities of Transformers, making them an attractive alternative.
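
As a rough illustration of the scaling gap described above, the NumPy sketch below contrasts self-attention, whose score matrix grows as L x L with sequence length L, against a simple linear recurrence that carries only a fixed-size state from token to token. The matrices, dimensions, and toy recurrence are assumptions for illustration and do not reproduce Mamba's selective SSM.

```python
# Minimal sketch (assumed, not the Mamba implementation): contrasts the
# L x L score matrix behind self-attention with a constant-size recurrent
# state, which is the scaling gap the abstract refers to.
import numpy as np

L, d = 1024, 64                      # sequence length, model width
x = np.random.randn(L, d)

# Self-attention: the score matrix alone is L x L, so compute (and, without
# caching tricks, memory) grows quadratically with sequence length.
scores = x @ x.T / np.sqrt(d)        # (L, L)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attn_out = weights @ x               # (L, d)

# Linear recurrence (SSM-flavored): one fixed-size state is updated per
# token, so time scales linearly with L and inference memory stays constant
# regardless of how long the sequence grows.
n = 16                               # state dimension (assumed)
A = np.eye(n) * 0.9                  # toy state-transition matrix
B = np.random.randn(n, d) * 0.01
C = np.random.randn(d, n) * 0.01
h = np.zeros(n)
ssm_out = np.empty_like(x)
for t in range(L):
    h = A @ h + B @ x[t]             # constant-size state update
    ssm_out[t] = C @ h

print(attn_out.shape, ssm_out.shape)  # (1024, 64) (1024, 64)
```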