Aamer Jaleel  

 
  ![](/sites/default/files/person/aj_head.jpg)

  
Dr. Aamer Jaleel joined NVIDIA in 2015 and is a member of the Architecture Research Group (ARG). His research focuses on cache and DRAM systems, workload scheduling, performance modeling, and workload characterization. Prior to joining NVIDIA, he was a Principal Engineer at Intel Massachusetts Inc. in the VSSAD research group. During his decade-long career at Intel, his research contributed to improvements in the performance modeling and cache hierarchies of Intel’s next-generation microprocessors. In the fall of 2014, during an extended sabbatical from Intel, he also served as a Visiting Professor at the University of Minnesota, Minneapolis-St. Paul, where he co-taught a graduate computer architecture course.

Jaleel received his Ph.D. in Electrical Engineering from the University of Maryland, College Park in 2006. He also received his B.S. and M.S. in Computer Engineering from the University of Maryland, College Park, in 2000 and 2002, respectively. Jaleel has co-authored more than a dozen patents and over 30 technical publications.


   Research Area(s)

[Computer Architecture](/index.php/research-area/computer-architecture)

[High Performance Computing](/index.php/research-area/high-performance-computing)

[Artificial Intelligence and Machine Learning](/index.php/research-area/machine-learning-artificial-intelligence)

[Networking](/index.php/research-area/networking)

[Programming Languages, Systems and Tools](/index.php/research-area/programming-languages-systems)

 
 Main Field of Interest

[Computer Architecture](/index.php/research-area/computer-architecture)

 
 Google Scholar

[https://scholar.google.com/citations?user=Ln3yVGoAAAAJ&hl=en](https://scholar.google.com/citations?user=Ln3yVGoAAAAJ&hl=en)

 
 ### Publications

 
### 2023 

[cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications](/publication/2023-06_cucatch-debugging-tool-efficiently-catching-memory-safety-violations-cuda)

[Mohamed Tarek Ibn Ziad](/person/mohamed-tarek-ibn-ziad), [Sana Damani](/person/sana-damani), [Aamer Jaleel](/person/aamer-jaleel), [Stephen W. Keckler](/person/stephen-keckler), [Mark Stephenson](/person/mark-stephenson)


[ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)](https://dl.acm.org/doi/10.1145/3591225)


[Implicit Memory Tagging: No-Overhead Memory Safety Using Alias-Free Tagged ECC](/publication/2023-06_implicit-memory-tagging-no-overhead-memory-safety-using-alias-free-tagged-ecc)

[Michael B. Sullivan](/person/mike-sullivan), [Mohamed Tarek Ibn Ziad](/person/mohamed-tarek-ibn-ziad), [Aamer Jaleel](/person/aamer-jaleel), [Stephen W. Keckler](/person/stephen-keckler)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/abs/10.1145/3579371.3589102)


### 2021 

[P-OPT: Practical Optimal Cache Replacement for Graph Analytics](/publication/2021-02_p-opt-practical-optimal-cache-replacement-graph-analytics)

Vignesh Balaji, [Neal Crago](/person/neal-crago), [Aamer Jaleel](/person/aamer-jaleel), Brandon Lucia


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9407090)


Best Paper nominee


### 2020 

[HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems](/publication/2020-02_hmg-extending-cache-coherence-protocols-across-modern-hierarchical-multi-gpu)

Xiaowei Ren, [Daniel Lustig](/person/daniel-lustig), Evgeny Bolotin, [Aamer Jaleel](/person/aamer-jaleel), Oreste Villa, [David Nellans](/person/david-nellans)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9065597)


### 2019 

[ExTensor: An Accelerator for Sparse Tensor Algebra](/index.php/publication/2019-10_extensor-accelerator-sparse-tensor-algebra)

Kartik Hegde, Hadi Asghari-Moghaddam, [Michael Pellauer](/index.php/person/michael-pellauer), [Neal Crago](/index.php/person/neal-crago), [Aamer Jaleel](/index.php/person/aamer-jaleel), Edgar Solomonik, [Joel Emer](/index.php/person/joel-emer), Christopher W. Fletcher


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3352460.3358275)


IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[Adaptive Memory-Side Last-Level GPU Caching](/publication/2019-06_adaptive-memory-side-last-level-gpu-caching)

Xia Zhao, Almutaz Adileh, Zhibin Yu, Zhiying Wang, [Aamer Jaleel](/person/aamer-jaleel), Lieven Eeckhout


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3307650.3322235)


[DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems](/publication/2019-03_ducati-high-performance-address-translation-extending-tlb-reach-gpu-accelerated)

[Aamer Jaleel](/person/aamer-jaleel), Eiman Ebrahimi, Sam Duncan


[ACM Transactions on Architecture and Code Optimization (TACO)](https://dl.acm.org/doi/abs/10.1145/3309710)


### 2018 

[Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems](/publication/2018-10_combining-hwsw-mechanisms-improve-numa-performance-multi-gpu-systems)

Vinson Young, [Aamer Jaleel](/person/aamer-jaleel), Evgeny Bolotin, Eiman Ebrahimi, [David Nellans](/person/david-nellans), Oreste Villa


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1109/MICRO.2018.00035)


[ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction](/publication/2018-06_accord-enabling-associativity-gigascale-dram-caches-coordinating-way-install)

Vinson Young, Chiachen Chou, [Aamer Jaleel](/person/aamer-jaleel), Moinuddin Qureshi


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/8416838)


### 2017 

[Beyond the Socket: NUMA-Aware GPUs](/publication/2017-10_beyond-socket-numa-aware-gpus)

Ugljesa Milic, Oreste Villa, Evgeny Bolotin, Akhil Arunkumar, Eiman Ebrahimi, [Aamer Jaleel](/person/aamer-jaleel), Alex Ramirez, [David Nellans](/person/david-nellans)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/citation.cfm?id=3124534)


[BATMAN: Maximizing Bandwidth Utilization of Hybrid Memory Systems](/publication/2017-10_batman-maximizing-bandwidth-utilization-hybrid-memory-systems)

Chiachen Chou, [Aamer Jaleel](/person/aamer-jaleel), Moinuddin Qureshi


[International Symposium on Memory Systems (MEMSYS)](https://dl.acm.org/doi/10.1145/3132402.3132404)


[MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability](/publication/2017-06_mcm-gpu-multi-chip-module-gpus-continued-performance-scalability)

Akhil Arunkumar, Evgeny Bolotin, Benjamin Cho, Ugljesa Milic, Eiman Ebrahimi, Oreste Villa, [Aamer Jaleel](/person/aamer-jaleel), Carole-Jean Wu, [David Nellans](/person/david-nellans)


[International Symposium on Computer Architecture (ISCA)](https://doi.org/10.1145/3079856.3080231)


### 2016 

[CANDY: Enabling Coherent DRAM Caches for Multi-Node Systems](/publication/2016-10_candy-enabling-coherent-dram-caches-multi-node-systems)

Chiachen Chou, [Aamer Jaleel](/person/aamer-jaleel), Moinuddin Qureshi


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.5555/3195638.3195680)


[The Bunker Cache for Spatio-Value Approximation](/index.php/publication/2016-10_bunker-cache-spatio-value-approximation)

Joshua San Miguel, Jorge Albericio, [Aamer Jaleel](/index.php/person/aamer-jaleel), Natalie Enright Jerger


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783746)


[LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches](/publication/2016-06_lap-loop-block-aware-inclusion-properties-energy-efficient-asymmetric-last)

Hsiang-Yun Cheng, Jishen Zhao, Jack Sampson, Mary Jane Irwin, [Aamer Jaleel](/person/aamer-jaleel), Yu Lu, Yuan Xie


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/7551386)


### 2015 

[Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures](/publication/2015-09_efficient-control-and-communication-paradigms-coarse-grained-spatial)

[Michael Pellauer](/person/michael-pellauer), [Angshuman Parashar](/person/angshuman-parashar), Michael Adler, Bushra Ahsan, Randy Almon, [Neal Crago](/person/neal-crago), Kermin Fleming, Mohit Gambhir, [Aamer Jaleel](/person/aamer-jaleel), Tushar Krishna, [Daniel Lustig](/person/daniel-lustig), Stephen Maresh, Vladimir Pavlov, Rachid Rayess, Antonia Zhai, [Joel Emer](/person/joel-emer)


[ACM Transactions on Computer Systems (TOCS)](https://dl.acm.org/doi/10.1145/2754930)


[High Performing Cache Hierarchies for Server Workloads -- Relaxing Inclusion to Capture the Latency Benefits of Exclusive Caches](/index.php/publication/2015-02_high-performing-cache-hierarchies-server-workloads-relaxing-inclusion-capture)

[Aamer Jaleel](/index.php/person/aamer-jaleel), Joseph Nuzman, Adrian Moga, Simon C. Steely Jr., [Joel Emer](/index.php/person/joel-emer)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/7056045)