 
 # Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

  ![](/sites/default/files/styles/wide/public/publications/hseih.isca2016.png?itok=pRWj1C53)

 Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer. Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.
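The abstract does not spell out the cost-benefit analysis, so the following is only a minimal illustrative sketch of how such a compiler check could look, not the paper's actual model. It assumes the benefit of offloading a code block is the off-chip memory traffic it would otherwise generate, and the cost is the traffic needed to ship its live-in and live-out registers to and from the logic-layer GPU. All names (`CandidateBlock`, `bandwidth_saved`, `select_offload_candidates`) and the 4-byte sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateBlock:
    """Hypothetical summary of a code block the compiler is considering."""
    name: str
    loads: int           # load instructions per iteration
    stores: int          # store instructions per iteration
    live_in_regs: int    # registers that must be sent to the memory stack
    live_out_regs: int   # registers that must be sent back afterwards
    iterations: int = 1  # assumed trip count (1 for straight-line code)


def bandwidth_saved(block, bytes_per_access=4, bytes_per_reg=4):
    """Estimated off-chip bytes saved by offloading (assumed cost model)."""
    # Benefit: memory traffic that stays inside the memory stack.
    benefit = (block.loads + block.stores) * block.iterations * bytes_per_access
    # Cost: register context transferred across the off-chip link.
    cost = (block.live_in_regs + block.live_out_regs) * bytes_per_reg
    return benefit - cost


def select_offload_candidates(blocks):
    """Keep only blocks whose estimated saving is positive."""
    return [b.name for b in blocks if bandwidth_saved(b) > 0]


blocks = [
    CandidateBlock("loop", loads=2, stores=1, live_in_regs=3,
                   live_out_regs=1, iterations=64),
    CandidateBlock("scalar", loads=1, stores=0, live_in_regs=4,
                   live_out_regs=2),
]
print(select_offload_candidates(blocks))
```

Under these assumed numbers, the memory-heavy loop is selected because its memory traffic far exceeds its register-transfer cost, while the short scalar block is rejected.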



 ## Authors



Kevin Hsieh (Carnegie Mellon University)

Eiman Ebrahimi (NVIDIA)

Gwangsun Kim (Korea Advanced Institute of Science and Technology (KAIST))

[Niladrish Chatterjee](/index.php/person/niladrish-chatterjee)

[Mike O'Connor](/index.php/person/mike-o-connor)

Nandita Vijaykumar (Carnegie Mellon University)

Onur Mutlu (Carnegie Mellon University / ETH Zurich)

[Steve Keckler](/index.php/person/stephen-keckler)

 

 

 ## Publication Date



Saturday, June 18, 2016

 

 ## Published in



[International Symposium on Computer Architecture (ISCA)](http://ieeexplore.ieee.org/document/7551394/)

 

 ## Research Area



[Computer Architecture](/index.php/research-area/computer-architecture)

 

 

 ## External Links



[IEEE Digital Library](http://ieeexplore.ieee.org/document/7551394/)

 

 

 ## Uploaded Files



[Published manuscript](https://d1qx31qr3h6wln.cloudfront.net/publications/ISCA_2016_Near_Data_Processing.pdf "Open file in new window") (1.06 MB)

 

 

 ## Copyright



This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to <pubs-permissions@ieee.org>.