Artificial Intelligence Computing Leadership from NVIDIA
Research Areas
Programming Languages, Systems and Tools
Associated Publications
2025
Composing Distributed Computations Through Task and Kernel Fusion
Rohan Yadav, Shiv Sundrum, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad
ASPLOS
Automatic Tracing in Task-Based Runtime Systems
Rohan Yadav, Michael Bauer, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad
ASPLOS
2023
Legate Sparse: Distributed Sparse Computing in Python
Rohan Yadav, Wonchan Lee, Melih Elibol, Taylor Patti, Manolis Papadakis, Michael Garland, Alex Aiken, Fredrik Kjolstad, Michael Bauer
Supercomputing
cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications
Mohamed Tarek Ibn Ziad, Sana Damani, Aamer Jaleel, Stephen W. Keckler, Mark Stephenson
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence
Michael Bauer, Elliott Slaughter, Sean Treichler, Wonchan Lee, Michael Garland, Alex Aiken
PPoPP
Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows
Vijay Kandiah, Daniel Lustig, Oreste Villa, David Nellans, Nikos Hardavellas
International Symposium on Code Generation and Optimization
2022
Demystifying Map Space Exploration for NPUs
Sheng-Chun Kao, Angshuman Parashar, Po-An Tsai, Tushar Krishna
International Symposium on Workload Characterization (IISWC)
Slang Shading Language Advances
Yong He, Petrik Clarberg, Theresa Foley
Video on YouTube
Research Advances Toward Real-Time Path Tracing
Petrik Clarberg, Simon Kallweit, Craig Kolb, Pawel Kozlowski, Yong He, Lifan Wu, Edward Liu
GDC 2022
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators
Prasanth Chatarasi, Hyoukjun Kwon, Angshuman Parashar, Michael Pellauer, Tushar Krishna, Vivek Sarkar
Transactions on Architecture and Code Optimization (TACO)
2021
Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna
Parallel Architectures and Compilation Techniques (PACT)
Cooperative Profile Guided Optimization
Mark Stephenson, Ram Rangan, Steve Keckler
Computer Graphics Forum (Proceedings of High Performance Graphics)
Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
Harini Muthukrishnan, David Nellans, Daniel Lustig, Jeffrey Fessler, Thomas Wenisch
International Symposium on Computer Architecture (ISCA)
PGZ: Automatic Zero-Value Code Specialization
Mark Stephenson, Ram Rangan
International Conference on Compiler Construction (CC)
Scaling Implicit Parallelism via Dynamic Control Replication
Michael Bauer, Wonchan Lee, Elliott Slaughter, Zhihao Jia, Mario Di Renzo, Manolis Papadakis, Galen Shipman, Patrick McCormick, Michael Garland, Alex Aiken
Principles and Practices of Parallel Programming (PPoPP)
Hardware Abstractions for Targeting EDDO Architectures with the Polyhedral Model
Angshuman Parashar, Prasanth Chatarasi, Po-An Tsai
International Workshop on Polyhedral Compilation Techniques (IMPACT)
2020
Locality-Centric Data and Threadblock Management for Massive GPUs
Mahmoud Khairy, Vadim Nikiforov, David Nellans, Timothy G. Rogers
International Symposium on Microarchitecture (MICRO)
A Programmable Approach to Neural Network Compression
Vinu Joseph, Ganesh L. Gopalakrishnan, Saurav Muralidharan, Michael Garland, Animesh Garg
IEEE Micro: Special Issue on Machine Learning for Systems
Zeroploit: Exploiting Zero Valued Operands in Interactive Gaming Applications
Ram Rangan, Mark Stephenson, Aditya Ukarande, Shyam Murthy, Virat Agarwal, Marc Blackstein
ACM Transactions on Architecture and Code Optimization (TACO)
There’s Plenty of Room at the Top: What Will Drive Computer Performance after Moore’s Law?
Charles E. Leiserson, Neil C. Thompson, Joel Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, Tao B. Schardl
Science
Speculative Reconvergence for Improved SIMT Efficiency
Sana Damani, Daniel Johnson, Mark Stephenson, Eddie Yan, Olivier Giroux, Michael McKeown, Steve Keckler
International Symposium on Code Generation and Optimization
2019
Legate NumPy: Accelerated and Distributed Array Computing
Michael Bauer, Michael Garland
The International Conference for High Performance Computing, Networking, Storag…
NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs
Oreste Villa, Mark Stephenson, David Nellans, Steve Keckler
International Symposium on Microarchitecture (MICRO)
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance
Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick McCormick, Alex Aiken
arXiv
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Steve Keckler, Joel Emer
International Symposium on Performance Analysis of Systems and Software (ISPASS)
Throughput-oriented GPU memory allocation
Isaac Gelado, Michael Garland
Proceedings of the 24th Symposium on Principles and Practice of Parallel Progra…
2018
Optimizing Software-Directed Instruction Replication for GPU Error Detection
Abdulrahman Mahmoud, Siva Hari, Michael B. Sullivan, Timothy Tsai, Steve Keckler
International Conference for High-Performance Computing, Networking, Storage a…
Dynamic Tracing: Memoization of Task Graphs for Dynamic Task-based Runtimes
Wonchan Lee, Elliott Slaughter, Michael Bauer, Sean Treichler, Todd Warszawski, Michael Garland, Alex Aiken
International Conference for High Performance Computing and Communications (SC'…
Slang: Language Mechanisms for Extensible Real-time Shading Systems
Yong He, Theresa Foley, Kayvon Fatahalian
Proceedings of ACM SIGGRAPH 2018
Isometry: A Path-Based Distributed Data Transfer System
Zhihao Jia, Sean Treichler, Galen Shipman, Patrick McCormick, Alex Aiken
International Conference on Supercomputing (ICS)
Scalable Collectives for Distributed Asynchronous Many-Task Runtimes
Matthew Whitlock, Hemanth Kolla, Sean Treichler, Philippe Pebay, Janine C. Bennett
International Parallel and Distributed Processing Symposium (IPDPS) - workshops
BabelFlow: An Embedded Domain Specific Language for Parallel Analysis and Visualization
Steve Petruzza, Sean Treichler, Valerio Pascucci, Peer-Timo Bremer
International Parallel and Distributed Processing Symposium (IPDPS)
2017
Integrating External Resources with a Task-Based Programming Model
Zhihao Jia, Sean Treichler, Galen Shipman, Michael Bauer, Noah Watkins, Carlos Maltzahn, Patrick McCormick, Alex Aiken
International Conference on High Performance Computing (HiPC)
A Novel Shard-Based Approach for Asynchronous Many-Task Models for In Situ Analysis
Philippe P. Pébaÿ, Giulio Borghesi, Hemanth Kolla, Janine C. Bennett, Sean Treichler
Workshop on In Situ Infrastructures on Enabling Extreme-Scale Analysis and Visu…
Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions
Elliott Slaughter, Wonchan Lee, Sean Treichler, Wen Zhang, Michael Bauer, Galen Shipman, Patrick McCormick, Alex Aiken
International Conference for High Performance Computing and Communications (SC…
Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors
Benjamin Klenk, Holger Fröning, Hans Eberle, Larry Dennison
32nd IEEE International Parallel and Distributed Processing
Best Paper Award
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
Caroline Trippel, Yatin A. Manerkar, Daniel Lustig, Michael Pellauer, Margaret Martonosi
International Conference on Architectural Support for Programming Languages and…
IEEE Micro Top Picks in Computer Architecture
Automated Synthesis of Comprehensive Memory Model Litmus Test Suites
Daniel Lustig, Andrew Wright, Alexandros Papakonstantinou, Olivier Giroux
International Conference on Architectural Support for Programming Languages and…
2016
A System for Rapid Exploration of Shader Optimization Choices
Yong He, Theresa Foley, Kayvon Fatahalian
Proceedings of ACM SIGGRAPH 2016
2015
MemcachedGPU: Scaling-up Scale-out Key-value Stores
Tayler Hetherington, Mike O'Connor, Tor Aamodt
Sixth ACM Symposium on Cloud Computing (SoCC '15)
Flexible Software Profiling of GPU Architectures
Mark Stephenson, Siva Hari, Yunsup Lee, Eiman Ebrahimi, Daniel Johnson, David Nellans, Mike O'Connor, Steve Keckler
International Symposium on Computer Architecture (ISCA)
2014
Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures
Yunsup Lee, Vinod Grover, Ronny Krashinsky, Mark Stephenson, Steve Keckler, Krste Asanovic
International Symposium on Microarchitecture (MICRO)
Scaling Irregular Applications through Data Aggregation and Software Multithreading
Alessandro Morari, Antonino Tumeo, Daniel Chavarria-Miranda, Oreste Villa, Mateo Valero
International Parallel and Distributed Processing Symposium (IPDPS)
2013
NOVA: A Functional Language for Data Parallelism
Alex Collins, Dominik Grewe, Vinod Grover, Sean Lee, Adriana Susnea
Convergence and Scalarization for Data-Parallel Architectures
Yunsup Lee, Ronny Krashinsky, Vinod Grover, Steve Keckler, Krste Asanovic
International Symposium on Code Generation and Optimization (CGO)
2012
Policy-based Tuning for Performance Portability and Library Co-optimization
Duane Merrill, Michael Garland, Andrew Grimshaw
Proc. Innovative Parallel Computing
2011
Processing Device Arrays with C++ Metaprogramming
Jonathan Cohen
GPU Computing Gems, Jade Edition, Edited by Wen-mei W. Hwu
Copperhead: Compiling an Embedded Data Parallel Language
Bryan Catanzaro, Michael Garland, Kurt Keutzer
16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (…
2010
Programming Massively Parallel Processors: A Hands-on Approach
David Kirk, Wen-mei Hwu
Morgan Kaufmann
2008
Parallel Computing Experiences with CUDA
Michael Garland, Scott Le Grand, John Nickolls, Joshua Anderson, Jim Hardwick, Scott Morton, Everett Phillips, Yao Zhang, Vasily Volkov
IEEE Micro
Scalable Parallel Programming with CUDA
John Nickolls, Ian Buck, Michael Garland, Kevin Skadron
Queue
Researchers
Aamer Jaleel
Aaron Lefohn
Albert Sidelnik
Andrei Alexandrescu
Benjamin Klenk
Cédric Augonnet
Chaz Gouert
Conor Hoekstra
Cris Cecka
Daniel Lustig
David Nellans
Drew Zagieboylo
Isaac Gelado
Jared Hoberock
Jean-Luc Watson
Mark Kilgard
Mark Stephenson
Maryam Mehri Dehnavi
Melih Elibol
Michael Garland
Michael Bauer
Mohamed Tarek Ibn Ziad
Nicolai Oswald
Oreste Villa
Sai Bangaru
Sana Damani
Saurav Muralidharan
Simon Cooksey
Steven Dalton
Vinu Joseph
William Dally
Yaosheng Fu
Yong He