Pervasive Massively Multithreaded GPU Processors

This talk presents an overview of NVIDIA's SIMT (single-instruction, multiple-thread) architecture and brief insights into how some CUDA programming paradigms map onto it. A short history of SIMT explains how NVIDIA came to implement a unified SIMT processor core in its GPUs, including how graphics shaders are mapped onto SIMT threads. The talk also gives a conceptual view of how a SIMT microarchitecture executes threads in parallel. It wraps up by describing some pitfalls related to thread synchronization, memory access, and cache management, and by identifying key problem areas in SIMT programming that NVIDIA would like to address in the future.

Author: Michael Shebanow (NVIDIA)
Publication Date: Friday, May 1, 2009