Speculative Reconvergence for Improved SIMT Efficiency

GPUs perform most efficiently when all threads in a warp execute the same sequence of instructions convergently. However, when threads in a warp encounter a divergent branch, the hardware serializes the execution of the diverged paths. We consider a class of convergence opportunities wherein multiple threads are expected to eventually execute a given segment of code, but not all threads arrive at the same time, resulting in serialized duplicate execution of common code subsequences such as function calls and loop bodies. Our goal is to promote convergence by helping threads that execute common code arrive together before allowing execution to proceed. We propose a new user-guided compiler mechanism, Speculative Reconvergence, to help identify and exploit previously untapped convergence opportunities that increase SIMT efficiency and improve performance. For the set of workloads we study, we see improvements ranging from 10% to 3× in both SIMT efficiency and performance.
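
The effect described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the paper's compiler mechanism: a warp is split into groups of lanes, each group runs some private instructions and then the same common segment, and SIMT efficiency is taken to be the average fraction of active lanes per issued instruction. The function names, the warp size of 32, and all instruction counts are hypothetical and chosen for illustration; they are not measurements from the paper.

```python
WARP_SIZE = 32  # assumed warp width for this illustration


def simt_efficiency(issues):
    """issues: list of (instruction_count, active_lanes) pairs."""
    issued = sum(count for count, _ in issues)
    lane_work = sum(count * lanes for count, lanes in issues)
    return lane_work / (issued * WARP_SIZE)


def serialized(groups, common_len):
    """Each diverged group runs its private path and then the common
    segment on its own, so the common segment is issued once per group."""
    return [(private_len + common_len, lanes) for lanes, private_len in groups]


def reconverged(groups, common_len):
    """Each group runs its private path, then waits; the common segment
    is issued once with every lane of every group active."""
    issues = [(private_len, lanes) for lanes, private_len in groups]
    issues.append((common_len, sum(lanes for lanes, _ in groups)))
    return issues


# Hypothetical warp: two groups of 16 lanes, 10 private instructions each,
# followed by an 80-instruction common segment (e.g. a function body).
groups = [(16, 10), (16, 10)]
common_len = 80

eff_serialized = simt_efficiency(serialized(groups, common_len))
eff_reconverged = simt_efficiency(reconverged(groups, common_len))
print(f"serialized:  {eff_serialized:.2f}")
print(f"reconverged: {eff_reconverged:.2f}")
```

In this model the serialized schedule issues the common segment twice with half the lanes active, while the reconverged schedule issues it once with all lanes active. The model ignores the cost of waiting and assumes every thread does reach the common segment, which is the expectation the abstract calls speculative.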

Authors: 
Sana Damani (Georgia Institute of Technology)
Eddie Yan (University of Washington)
Olivier Giroux (NVIDIA)
Michael McKeown (Esperanto Technologies)
Stephen W. Keckler (NVIDIA)
Publication Date: 
Saturday, February 22, 2020