Anatomy of GPU Memory System for Multi-Application Execution


As GPUs make headway across the computing landscape, from mobile platforms and supercomputers to cloud and virtual desktop platforms, supporting concurrent execution of multiple applications on GPUs becomes essential for unlocking their full potential. However, unlike on CPUs, multi-application execution on GPUs has been little explored. In this paper, we study the memory system of GPUs in a concurrently executing multi-application environment. We first present an analytical performance model for many-threaded architectures and show that the common use of misses-per-kilo-instruction (MPKI) as a proxy for performance is not accurate without considering the bandwidth usage of applications. We characterize the memory interference of applications and discuss the limitations of existing memory schedulers in mitigating this interference. We extend the analytical model to multiple applications and identify the key metrics for controlling various performance metrics. We conduct extensive simulations using an enhanced version of GPGPU-Sim targeted at concurrently executing multiple applications, and show that memory scheduling decisions based on MPKI and bandwidth information are more effective at enhancing throughput than the traditional FR-FCFS and the recently proposed RR FR-FCFS policies.

Authors

Adwait Jog (Pennsylvania State University)
Onur Kayiran (Pennsylvania State University)
Tuba Kesten (Pennsylvania State University)
Ashutosh Pattnaik (Pennsylvania State University)
Evgeny Bolotin (NVIDIA)
Mahmut T. Kandemir (Pennsylvania State University)
Chita R. Das (Pennsylvania State University)
