Understanding the Efficiency of Ray Traversal on GPUs

We discuss the mapping of elementary ray tracing operations---acceleration structure traversal and primitive intersection---onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody knows whether the methods are anywhere near the theoretically obtainable limits, and if not, what might be causing the discrepancy.
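One of the elementary operations named above, primitive intersection against an acceleration structure node, can be illustrated in scalar form. The slab test below intersects a ray with an axis-aligned bounding box, the inner loop of BVH traversal; this is a minimal sketch for illustration, not the paper's SIMT kernel, and the function name and the precomputed-inverse-direction representation are our own choices.

```python
def ray_aabb_hit(orig, inv_dir, lo, hi):
    """Slab test: intersect the ray's per-axis parameter intervals.

    orig: ray origin, inv_dir: componentwise 1/direction (zero direction
    components are assumed precomputed as +/-inf), lo/hi: box corners.
    Returns True if the ray hits the box for some t >= 0.
    """
    tmin, tmax = 0.0, float('inf')
    for o, inv, l, h in zip(orig, inv_dir, lo, hi):
        t0 = (l - o) * inv
        t1 = (h - o) * inv
        # Shrink the running [tmin, tmax] interval by this axis's slab.
        tmin = max(tmin, min(t0, t1))
        tmax = min(tmax, max(t0, t1))
    return tmin <= tmax
```

On a SIMT machine each lane would run this test for a different ray, which is exactly where divergent traversal paths begin to cost performance.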

Fast Tridiagonal Solvers on the GPU

We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithm complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work each step.
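The step structure of PCR can be sketched in scalar form: at stride s, every equation is combined with its neighbors s away, eliminating both couplings at once, so all n equations are reduced in parallel in ceil(log2 n) sweeps. This is a minimal NumPy sketch of the algorithm, not the paper's GPU kernels, and the function name and argument layout are our own.

```python
import numpy as np

def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction (PCR).

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal
    (c[-1] unused), d: right-hand side. Each sweep doubles the stride
    until every equation couples only to its own unknown.
    """
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n = len(b)
    s = 1
    while s < n:
        na, nb, nc, nd = a.copy(), b.copy(), c.copy(), d.copy()
        for i in range(n):
            # Multipliers that eliminate x[i-s] and x[i+s] from equation i.
            k1 = a[i] / b[i - s] if i - s >= 0 else 0.0
            k2 = c[i] / b[i + s] if i + s < n else 0.0
            na[i] = -k1 * a[i - s] if i - s >= 0 else 0.0
            nc[i] = -k2 * c[i + s] if i + s < n else 0.0
            nb[i] = b[i] - (k1 * c[i - s] if i - s >= 0 else 0.0) \
                         - (k2 * a[i + s] if i + s < n else 0.0)
            nd[i] = d[i] - (k1 * d[i - s] if i - s >= 0 else 0.0) \
                         - (k2 * d[i + s] if i + s < n else 0.0)
        a, b, c, d = na, nb, nc, nd
        s *= 2
    return d / b  # system is now diagonal
```

Note the trade-off the abstract describes: unlike CR, every sweep here touches all n equations, so PCR takes fewer steps but does O(n log n) total work.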

Designing Efficient Sorting Algorithms for Manycore GPUs

We describe the design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA. Our radix sort is the fastest GPU sort and our merge sort is the fastest comparison-based sort reported in the literature. Our radix sort is up to 4 times faster than the graphics-based GPUSort and greater than 2 times faster than other CUDA-based radix sorts. It is also 23% faster, on average, than even a very carefully optimized multicore CPU sorting routine.
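The digit-by-digit structure that makes radix sort amenable to manycore hardware can be shown with a serial sketch: each pass stably scatters keys by one small digit, and stability across passes yields a total order. This is a scalar analogue for illustration (LSD passes over 4-bit digits of non-negative integer keys), not the CUDA implementation described above.

```python
def radix_sort(keys, bits=32, radix_bits=4):
    """LSD radix sort on non-negative integers.

    One pass per radix_bits-wide digit, least significant first; each
    pass is a stable bucket scatter, so earlier digits stay ordered.
    """
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        # Concatenating buckets in order preserves stability.
        keys = [k for b in buckets for k in b]
    return keys
```

On a GPU the per-pass bucket scatter becomes a parallel histogram plus prefix sum, which is where the fine-grained parallelism comes from.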

Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors

Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns.

Ming-Yu Liu

Ming-Yu Liu is a Vice President of Research at NVIDIA and a Fellow of IEEE. He leads the Deep Imagination Research group at NVIDIA, which focuses on deep generative models and their applications in content creation. NVIDIA Picasso [Edify], NVIDIA Canvas [GauGAN] and NVIDIA Maxine [LivePortrait] are three products enabled by his group's research.

NVIDIA Demo Wins the Laval Virtual Award at the SIGGRAPH 2016 Emerging Technologies Event:

Anjul Patney, Joohwan Kim, Marco Salvi, Anton Kaplanyan, Chris Wyman, Nir Benty, Aaron Lefohn, David Luebke, “Perceptually-Based Foveated Virtual Reality”, Emerging Technologies, SIGGRAPH, Anaheim, CA, July 24-28, 2016

NVIDIA Secures Runner-Up Best Paper Position at ASYNC 2016 along with the University of Virginia:

Divya Akella Kamakshi (U. Virginia), Matthew Fojtik (NVIDIA), Brucek Khailany (NVIDIA), Sudhir Kudva (NVIDIA), Yaping Zhou (U. Virginia), Benton H. Calhoun (U. Virginia), “Modeling and Analysis of Power Supply Noise Tolerance with Fine-grained GALS Adaptive Clocks”, ASYNC, 2016

Saurav Muralidharan

Saurav Muralidharan is a Senior Research Scientist in the Programming Systems & Applications research group. His work focuses on improving the performance and efficiency of deep neural networks. More broadly, he's interested in research problems that lie at the intersection of systems and machine learning. Please visit sauravm.com for more details on his research.