Designing Efficient Sorting Algorithms for Manycore GPUs

We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature. Our radix sort is up to 4 times faster than the graphics-based GPUSort and greater than 2 times faster than other CUDA-based radix sorts. It is also 23% faster, on average, than even a very carefully optimized multicore CPU sorting routine.
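The abstract refers to a parallel radix sort; the core idea is a sequence of counting passes over fixed-width digit groups. The following is a minimal serial sketch of least-significant-digit radix sort for illustration only (the paper's GPU kernels would instead build per-block histograms and parallel prefix sums; the function name and digit width here are illustrative assumptions, not from the paper):

```python
def lsd_radix_sort(keys, bits=32, radix_bits=4):
    """Least-significant-digit radix sort of non-negative integer keys.

    Serial sketch of the histogram / prefix-sum / scatter structure that
    a GPU radix sort parallelizes; this loop itself is not the paper's code.
    """
    radix = 1 << radix_bits
    mask = radix - 1
    for shift in range(0, bits, radix_bits):
        # Counting pass: histogram the current digit of every key.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum gives each digit bucket's start offset.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # Scatter pass: stable reorder of keys by the current digit.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & mask
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

Because each pass is stable, sorting digit-by-digit from least to most significant yields a fully sorted sequence; on a GPU, the histogram and scatter steps are the parts distributed across thousands of threads.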

Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors

Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns.
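To make the SpMV operation concrete, here is a serial sketch of y = Ax for a matrix stored in compressed sparse row (CSR) format, one of the formats such work typically targets (the function name is illustrative; the paper's GPU implementation would assign rows, or warps per row, to threads rather than iterate serially):

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for a sparse matrix A in CSR format.

    row_ptr[i]..row_ptr[i+1] indexes the nonzeros of row i within
    vals (values) and col_idx (column indices). Serial sketch: on a
    throughput-oriented processor each row would run in parallel.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for j in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[j] * x[col_idx[j]]
        y[row] = acc
    return y
```

The irregularity the abstract mentions is visible here: rows can have very different nonzero counts, so a naive one-thread-per-row mapping produces imbalanced work and scattered reads of x, which is exactly why format choice and regularized access patterns matter on GPUs.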

Ming-Yu Liu

Ming-Yu Liu is a Vice President of Research at NVIDIA and a Fellow of the IEEE. He leads the Deep Imagination Research group at NVIDIA, which focuses on deep generative models and their applications in content creation. NVIDIA Cosmos, NVIDIA Canvas [GauGAN], and NVIDIA Maxine [LivePortrait] are three products enabled by his group's research.

NVIDIA Demo Wins the Laval Virtual Award at the SIGGRAPH 2016 Emerging Technologies Event:

Anjul Patney, Joohwan Kim, Marco Salvi, Anton Kaplanyan, Chris Wyman, Nir Benty, Aaron Lefohn, David Luebke, “Perceptually-Based Foveated Virtual Reality”, Emerging Technologies, SIGGRAPH, Anaheim, CA, July 24-28, 2016

NVIDIA Secures Runner-Up Best Paper Position at ASYNC 2016

NVIDIA Secures Runner-Up Best Paper Position at ASYNC 2016 along with the University of Virginia:

Divya Akella Kamakshi (U. Virginia), Matthew Fojtik (NVIDIA), Brucek Khailany (NVIDIA), Sudhir Kudva (NVIDIA), Yaping Zhou (U. Virginia), Benton H. Calhoun (U. Virginia), “Modeling and Analysis of Power Supply Noise Tolerance with Fine-grained GALS Adaptive Clocks”, ASYNC 2016

Saurav Muralidharan

Saurav Muralidharan is a Senior Research Scientist on the Deep Learning Efficiency Research (DLER) team. He specializes in LLM efficiency and performance optimization and has worked on model compression (pruning, distillation, low-rank factorization), small language models (SLMs), and elastic and mixture-of-experts (MoE) networks. Please visit sauravm.com for more details on his research.