Measuring the Radiation Reliability of SRAM Structures in GPUs Designed for HPC

Graphics Processing Units (GPUs) specifically designed for High Performance Computing (HPC) applications require higher reliability than GPUs used for graphics rendering or gaming. Particular attention should be given to GPU memory structures, because these components have been shown to be the most vulnerable for various codes. This paper describes a test framework for assessing the neutron sensitivity of GPU caches and register files. It also presents results from an extensive radiation test campaign performed at LANSCE in Los Alamos, New Mexico. The results show that the neutron sensitivity of the latest GPUs designed for HPC is significantly lower than that of a previous-generation device. The paper also discusses the occurrence of Multiple Bit and Multiple Cell Upsets and the efficacy of the available ECC mechanisms.


Paolo Rech (Universidade Federal do Rio Grande do Sul)
Luigi Carro (Universidade Federal do Rio Grande do Sul)
Nicholas Wang (NVIDIA)

