 # Unlocking Bandwidth for GPUs in CC-NUMA systems

  ![](/sites/default/files/styles/wide/public/publications/agarwal.hpca2015.png?itok=J6r6KOAP)

 Historically, GPU-based HPC applications have had a substantial memory bandwidth advantage over CPU-based workloads because they use GDDR rather than DDR memory. However, past GPUs required a restricted programming model in which the programmer allocated application data up front and explicitly copied it into GPU memory before launching a GPU kernel. Recently, GPUs have eased this requirement and can now employ on-demand software page migration between CPU and GPU memory to obviate explicit copying. In the near future, CC-NUMA GPU-CPU systems will appear in which software page migration is optional and hardware cache coherence also lets the GPU access CPU memory directly. In this work, we describe the trade-offs and considerations in relying on hardware cache-coherence mechanisms versus using software page migration to optimize the performance of memory-intensive GPU workloads. We show that page migration decisions based on page access frequency alone are a poor solution, and that maximizing performance requires a broader solution that uses virtual address-based program locality to enable aggressive memory prefetching, combined with bandwidth balancing. We present a software runtime system requiring minimal hardware support that, on average, outperforms CC-NUMA-based accesses by 1.95×, performs 6% better than the legacy CPU-to-GPU memcpy regime by intelligently using both CPU and GPU memory bandwidth, and comes within 28% of oracular page placement, all while maintaining the relaxed memory semantics of modern GPUs.
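
To give some intuition for why bandwidth balancing can beat placing all data in GPU memory, the sketch below uses a deliberately simple model that is an assumption made here, not the paper's runtime. It treats the two memories as serving their shares of the traffic fully in parallel and ignores latency, migration cost, and contention. The function names and the bandwidth figures (200 and 80 GB/s) are hypothetical and chosen only for illustration.

```python
def effective_bandwidth(frac_gpu, bw_gpu, bw_cpu):
    """Aggregate bandwidth (same units as the inputs) when `frac_gpu` of the
    traffic is served from GPU memory and the rest from CPU memory.

    Simplifying assumption: both memories transfer in parallel, so the time
    to move one unit of data is set by whichever side finishes last.
    """
    if not 0.0 <= frac_gpu <= 1.0:
        raise ValueError("frac_gpu must be between 0 and 1")
    time_gpu = frac_gpu / bw_gpu          # time spent on the GPU-memory share
    time_cpu = (1.0 - frac_gpu) / bw_cpu  # time spent on the CPU-memory share
    return 1.0 / max(time_gpu, time_cpu)


def balanced_fraction(bw_gpu, bw_cpu):
    """Share of traffic to serve from GPU memory so both sides finish together.

    Under the model above this is the split that maximizes aggregate
    bandwidth: each memory gets traffic in proportion to its own bandwidth.
    """
    return bw_gpu / (bw_gpu + bw_cpu)


if __name__ == "__main__":
    BW_GPU, BW_CPU = 200.0, 80.0  # hypothetical GB/s figures, for illustration
    best = balanced_fraction(BW_GPU, BW_CPU)
    for frac in (0.0, 0.5, best, 1.0):
        bw = effective_bandwidth(frac, BW_GPU, BW_CPU)
        print(f"GPU share {frac:.3f}: {bw:.1f} GB/s")
```

Under these assumptions, serving everything from GPU memory is capped at the GPU memory bandwidth, while the balanced split approaches the sum of the two bandwidths. A real system would fall short of that sum, since the model leaves out the interconnect, access locality, and the cost of moving pages.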


 ## Authors


Neha Agarwal (University of Michigan)

[David Nellans](/person/david-nellans)

[Mike O'Connor](/person/mike-o-connor)

[Steve Keckler](/person/stephen-keckler)

Thomas Wenisch (University of Michigan)

 
 ## Publication Date


Saturday, February 7, 2015

 
 ## Published in


[International Symposium on High Performance Computer Architecture (HPCA)](http://ieeexplore.ieee.org/document/7056046/)

 
 ## Research Area


[Computer Architecture](/research-area/computer-architecture)

 
 ## External Links


[IEEE Digital Library](http://ieeexplore.ieee.org/document/7056046/)

 
 ## Uploaded Files


[Published manuscript](https://research.nvidia.com/sites/default/files/pubs/2015-02_Unlocking-bandwidth-for//agarwal.hpca2015.pdf "Open file in new window") (2.07 MB)

 
 ## Copyright


This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to <pubs-permissions@ieee.org>.