Harini Muthukrishnan  

 
![Harini Muthukrishnan](/sites/default/files/person/harinim.jpg)

Harini joined NVIDIA in February 2022 as part of the System Architecture Research Group. Her research focuses on GPU and interconnect solutions for scalable multi-GPU systems.

Fine-grained peer-to-peer stores, used as a communication paradigm, have the potential to improve strong scaling. However, existing GPU and interconnect architectures cannot benefit from such transfers because of several limitations. One of these is the poor interconnect efficiency of small peer-to-peer stores. Small stores arise naturally in multi-GPU programming models that share a single address space across all devices, but they map poorly onto current inter-GPU interconnects, which are optimized for bulk transfers rather than small (4-32 B) operations. Her current work explores GPU hardware enhancements that address this challenge while remaining transparent to the programmer.

Before joining NVIDIA, Harini was a Ph.D. candidate at the University of Michigan, where she was advised by Prof. Thomas Wenisch.


### Research Area(s)

[Computer Architecture](/research-area/computer-architecture)

[High Performance Computing](/research-area/high-performance-computing)

[Networking](/research-area/networking)

 
 Main Field of Interest

[Computer Architecture](/research-area/computer-architecture)

 
### Publications

 
#### 2021

[GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management](/publication/2021-10_gps-global-publish-subscribe-model-multi-gpu-memory-management)

[Harini Muthukrishnan](/person/harini-muthukrishnan), [Daniel Lustig](/person/daniel-lustig), [David Nellans](/person/david-nellans), Thomas Wenisch


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3466752.3480088)


Best Paper nominee, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers](/publication/2021-06_efficient-multi-gpu-shared-memory-automatic-optimization-fine-grained-transfers)

[Harini Muthukrishnan](/person/harini-muthukrishnan), [David Nellans](/person/david-nellans), [Daniel Lustig](/person/daniel-lustig), Jeffrey Fessler, Thomas Wenisch


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/9499752)