Measuring the Radiation Reliability of SRAM Structures in GPUs Designed for HPC
Graphics Processing Units (GPUs) specifically designed for High Performance Computing (HPC) applications require higher reliability than GPUs used for graphics rendering or gaming. Particular attention should be given to GPU memory structures, because these components have been shown to be the most vulnerable across a variety of codes. This paper describes a test framework to assess the neutron sensitivity of GPU caches and register files. It also presents results from an extensive radiation test campaign performed at LANSCE in Los Alamos, New Mexico. The results show that the neutron sensitivity of the latest GPUs designed for HPC is significantly lower than that of a previous-generation device. The paper also discusses the occurrence of Multiple Bit Upsets and Multiple Cell Upsets and the efficacy of the available ECC mechanisms.
This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to firstname.lastname@example.org.