Merlin Nimier-David

Merlin is a senior research scientist at NVIDIA. His research focuses on differentiable physically based rendering, including how to efficiently and accurately compute gradients through rendering algorithms. These gradients can then be leveraged in a variety of inverse tasks, such as recovering materials and lighting from photographs. He contributed to the development of the Mitsuba differentiable renderer.

HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression

Transformers have attained superior performance in natural language processing and computer vision. However, their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique for reducing parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiency and large performance degradation.
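The factorized form mentioned above can be illustrated with a minimal sketch (not the HEAT method itself, which searches hardware-aware factorization settings): replacing a dense weight matrix W with a truncated-SVD rank-r factorization W ≈ U·V, which shrinks a d_out × d_in layer to r·(d_out + d_in) parameters.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Factorize W (d_out x d_in) into U (d_out x r) and V (r x d_in) via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]  # absorb singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
U, V = low_rank_factorize(W, rank=64)

original = W.size           # 512 * 512 = 262144 parameters
compressed = U.size + V.size  # 64 * (512 + 512) = 65536 parameters, a 4x reduction
```

At inference time the layer computes U @ (V @ x) instead of W @ x, trading one large matrix multiply for two smaller ones; whether that is actually faster on a given accelerator is exactly the hardware-efficiency question the paper targets.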

Display Size and Targeting Performance: Small Hurts, Large May Help

Which display size helps gamers win? Recommendations from the research and PC gaming communities are contradictory. We find that as display size grows, targeting performance improves. When size increases from 13" to 26", targeting time drops by over 3%. Further increases, from 26" through 39", 52", and 65", bring more modest improvements, with targeting time dropping a further 1%. While such improvements may not be meaningful for novice gamers, they are extremely important to skilled and competitive players.

Esports and expertise: what competitive gaming can teach us about mastery

Historically, much research and development in human-computer interaction has focused on atomic and generalizable tasks, where task completion time indicates productivity. However, the emergence of competitive games and esports reminds us of an alternative perspective on human performance in HCI: mastery of higher-level, holistic practices. Just as a world-renowned artist is rarely evaluated on their individual brush strokes, so skilled competitive gamers rarely succeed solely by completing individual mouse movements or keystrokes as quickly as possible.

Szu-Wei Fu

Szu-Wei Fu joined NVIDIA Research in November 2022. His current interests include ML-based audio-visual processing, speech processing and enhancement, and quality estimation. Before joining NVIDIA, he was an applied scientist at Microsoft.

Mingjie Liu

Mingjie Liu is currently a Research Scientist at NVIDIA, where he conducts research on electronic design automation. He received his PhD degree in electrical and computer engineering from The University of Texas at Austin in 2022. His research interests include applied machine learning for design automation and design automation for analog and mixed-signal integrated circuits.

Min-Hung Chen

Min-Hung (Steve) Chen is a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multi-Modal AI. He received his Ph.D. degree from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira.

Beyond CPO: A Motivation and Approach for Bringing Optics onto the Silicon Interposer

Co-packaged optics (CPO) technology is well positioned to break through the bottlenecks that impede efficient bandwidth scaling in key near-term commercial integrated circuits. In this paper, we begin by providing some historical context for this important sea change in the optical communications industry. Then, motivated by GPU-based accelerated computing requirements, we investigate the next pain points that are poised to constrain bandwidth and efficiency in future CPO-based systems.

Elucidating the Design Space of Diffusion-Based Generative Models

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted, and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as to the preconditioning of the score networks. Together, our improvements yield a new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs.
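One concrete design choice the paper isolates is the noise-level schedule used at sampling time. A short sketch of that schedule, as I understand it from the paper: the N sampling noise levels are spaced by interpolating between sigma_max and sigma_min in rho-th-root space (the paper uses rho = 7), which concentrates steps at low noise levels.

```python
import numpy as np

def edm_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels sigma_0 > ... > sigma_{n-1}, interpolated in rho-th-root space.

    Default sigma_min/sigma_max/rho values follow the paper's CIFAR-10 setup;
    treat them as illustrative rather than universal.
    """
    ramp = np.linspace(0.0, 1.0, n)
    inv_rho = 1.0 / rho
    return (sigma_max**inv_rho + ramp * (sigma_min**inv_rho - sigma_max**inv_rho)) ** rho

# 35 steps matches the per-image network-evaluation budget quoted above.
sigmas = edm_sigmas(35)
```

With rho = 1 this reduces to a linear schedule; larger rho trades a few large early steps for many small steps near sigma_min, where errors are most visible in the final image.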