Programming Languages, Systems and Tools

Associated Publications

2026

Fearless Concurrency on the GPU

Melih Elibol, Jared Roesch, Isaac Gelado, Eric Buehler, Michael Garland

arXiv:2606.15991

SuperCollider: Scalable and Effective Data Race Detection for CUDA

Mark Stephenson, Sana Damani, Mohamed Tarek Ibn Ziad, Anis Ladram, Michael Garland

Hunting CUDA Bugs at Scale with cuFuzz

Mohamed Tarek Ibn Ziad, Christos Kozyrakis

International Conference on Object-Oriented Programming Systems, Languages, and…

2025

Task-Based Tensor Computations on Modern GPUs

Rohan Yadav, Michael Garland, Alex Aiken, Michael Bauer

Adaptive Algebraic Reuse of Reordering in Cholesky Factorizations with Dynamic Sparsity Patterns

Behrooz Zarebavani, Danny Kaufman, David Levin, Maryam Mehri Dehnavi

Composing Distributed Computations Through Task and Kernel Fusion

Rohan Yadav, Shiv Sundrum, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad

Automatic Tracing in Task-Based Runtime Systems

Rohan Yadav, Michael Bauer, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad

2023

Legate Sparse: Distributed Sparse Computing in Python

Rohan Yadav, Wonchan Lee, Melih Elibol, Taylor Patti, Manolis Papadakis, Michael Garland, Alex Aiken, Fredrik Kjolstad, Michael Bauer

cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications

Mohamed Tarek Ibn Ziad, Sana Damani, Aamer Jaleel, Stephen W. Keckler, Mark Stephenson

ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence

Michael Bauer, Elliott Slaughter, Sean Treichler, Wonchan Lee, Michael Garland, Alex Aiken

Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows

Vijay Kandiah, Daniel Lustig, Oreste Villa, David Nellans, Nikos Hardavellas

International Symposium on Code Generation and Optimization

2022

Demystifying Map Space Exploration for NPUs

Sheng-Chun Kao, Angshuman Parashar, Po-An Tsai, Tushar Krishna

International Symposium on Workload Characterization (IISWC)

Slang Shading Language Advances

Yong He, Petrik Clarberg, Theresa Foley

Video on YouTube

Research Advances Toward Real-Time Path Tracing

Petrik Clarberg, Simon Kallweit, Craig Kolb, Pawel Kozlowski, Yong He, Lifan Wu, Edward Liu

Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators

Prasanth Chatarasi, Hyoukjun Kwon, Angshuman Parashar, Michael Pellauer, Tushar Krishna, Vivek Sarkar

Transactions on Architecture and Code Optimization (TACO)

2021

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna

Parallel Architectures and Compilation Techniques (PACT)

Cooperative Profile Guided Optimization

Mark Stephenson, Ram Rangan, Steve Keckler

Computer Graphics Forum (Proceedings of High Performance Graphics)

Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers

Harini Muthukrishnan, David Nellans, Daniel Lustig, Jeffrey Fessler, Thomas Wenisch

International Symposium on Computer Architecture (ISCA)

PGZ: Automatic Zero-Value Code Specialization

Mark Stephenson, Ram Rangan

International Conference on Compiler Construction (CC)

Scaling Implicit Parallelism via Dynamic Control Replication

Michael Bauer, Wonchan Lee, Elliott Slaughter, Zhihao Jia, Mario Di Renzo, Manolis Papadakis, Galen Shipman, Patrick McCormick, Michael Garland, Alex Aiken

Principles and Practice of Parallel Programming (PPoPP)

Hardware Abstractions for Targeting EDDO Architectures with the Polyhedral Model

Angshuman Parashar, Prasanth Chatarasi, Po-An Tsai

International Workshop on Polyhedral Compilation Techniques (IMPACT)

2020

Locality-Centric Data and Threadblock Management for Massive GPUs

Mahmoud Khairy, Vadim Nikiforov, David Nellans, Timothy G. Rogers

International Symposium on Microarchitecture (MICRO)

A Programmable Approach to Neural Network Compression

Vinu Joseph, Ganesh L. Gopalakrishnan, Saurav Muralidharan, Michael Garland, Animesh Garg

IEEE Micro: Special Issue on Machine Learning for Systems

Zeroploit: Exploiting Zero Valued Operands in Interactive Gaming Applications

Ram Rangan, Mark Stephenson, Aditya Ukarande, Shyam Murthy, Virat Agarwal, Marc Blackstein

ACM Transactions on Architecture and Code Optimization (TACO)

There’s Plenty of Room at the Top: What Will Drive Computer Performance after Moore’s Law?

Charles E. Leiserson, Neil C. Thompson, Joel Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, Tao B. Schardl

Speculative Reconvergence for Improved SIMT Efficiency

Sana Damani, Daniel Johnson, Mark Stephenson, Eddie Yan, Olivier Giroux, Michael McKeown, Steve Keckler

International Symposium on Code Generation and Optimization

2019

Legate NumPy: Accelerated and Distributed Array Computing

Michael Bauer, Michael Garland

The International Conference for High Performance Computing, Networking, Storag…

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs

Oreste Villa, Mark Stephenson, David Nellans, Steve Keckler

International Symposium on Microarchitecture (MICRO)

Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance

Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick McCormick, Alex Aiken

Timeloop: A Systematic Approach to DNN Accelerator Evaluation

Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Steve Keckler, Joel Emer

International Symposium on Performance Analysis of Systems and Software (ISPASS)

Throughput-oriented GPU memory allocation

Isaac Gelado, Michael Garland

Proceedings of the 24th Symposium on Principles and Practice of Parallel Progra…

2018

Optimizing Software-Directed Instruction Replication for GPU Error Detection

Abdulrahman Mahmoud, Siva Hari, Michael B. Sullivan, Timothy Tsai, Steve Keckler

International Conference for High-Performance Computing, Networking, Storage a…

Dynamic Tracing: Memoization of Task Graphs for Dynamic Task-based Runtimes

Wonchan Lee, Elliott Slaughter, Michael Bauer, Sean Treichler, Todd Warszawski, Michael Garland, Alex Aiken

International Conference for High Performance Computing and Communications (SC'…

Slang: Language Mechanisms for Extensible Real-time Shading Systems

Yong He, Theresa Foley, Kayvon Fatahalian

Proceedings of ACM SIGGRAPH 2018

Isometry: A Path-Based Distributed Data Transfer System

Zhihao Jia, Sean Treichler, Galen Shipman, Patrick McCormick, Alex Aiken

International Conference on Supercomputing (ICS)

Scalable Collectives for Distributed Asynchronous Many-Task Runtimes

Matthew Whitlock, Hemanth Kolla, Sean Treichler, Philippe Pebay, Janine C. Bennett

International Parallel and Distributed Processing Symposium (IPDPS) - workshops

BabelFlow: An Embedded Domain Specific Language for Parallel Analysis and Visualization

Steve Petruzza, Sean Treichler, Valerio Pascucci, Peer-Timo Bremer

International Parallel and Distributed Processing Symposium (IPDPS)

2017

Integrating External Resources with a Task-Based Programming Model

Zhihao Jia, Sean Treichler, Galen Shipman, Michael Bauer, Noah Watkins, Carlos Maltzahn, Patrick McCormick, Alex Aiken

International Conference on High Performance Computing (HiPC)

A Novel Shard-Based Approach for Asynchronous Many-Task Models for In Situ Analysis

Philippe P. Pébaÿ, Giulio Borghesi, Hemanth Kolla, Janine C. Bennett, Sean Treichler

Workshop on In Situ Infrastructures on Enabling Extreme-Scale Analysis and Visu…

Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions

Elliott Slaughter, Wonchan Lee, Sean Treichler, Wen Zhang, Michael Bauer, Galen Shipman, Patrick McCormick, Alex Aiken

International Conference for High Performance Computing and Communications (SC…

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors

Benjamin Klenk, Holger Fröning, Hans Eberle, Larry Dennison

32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Best Paper Award

Automated Synthesis of Comprehensive Memory Model Litmus Test Suites

Daniel Lustig, Andrew Wright, Alexandros Papakonstantinou, Olivier Giroux

International Conference on Architectural Support for Programming Languages and…

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Caroline Trippel, Yatin A. Manerkar, Daniel Lustig, Michael Pellauer, Margaret Martonosi

International Conference on Architectural Support for Programming Languages and…

IEEE Micro Top Picks in Computer Architecture

2016

A System for Rapid Exploration of Shader Optimization Choices

Yong He, Theresa Foley, Kayvon Fatahalian

Proceedings of ACM SIGGRAPH 2016

2015

MemcachedGPU: Scaling-up Scale-out Key-value Stores

Tayler Hetherington, Mike O'Connor, Tor Aamodt

Sixth ACM Symposium on Cloud Computing (SoCC '15)

Flexible Software Profiling of GPU Architectures

Mark Stephenson, Siva Hari, Yunsup Lee, Eiman Ebrahimi, Daniel Johnson, David Nellans, Mike O'Connor, Steve Keckler

International Symposium on Computer Architecture (ISCA)

Verification of Producer-Consumer Synchronization in GPU Programs

Michael Bauer, Rahul Sharma, Alex Aiken

2014

Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures

Yunsup Lee, Vinod Grover, Ronny Krashinsky, Mark Stephenson, Steve Keckler, Krste Asanovic

International Symposium on Microarchitecture (MICRO)

Scaling Irregular Applications through Data Aggregation and Software Multithreading

Alessandro Morari, Antonino Tumeo, Daniel Chavarria-Miranda, Oreste Villa, Mateo Valero

International Parallel and Distributed Processing Symposium (IPDPS)

2013

NOVA: A Functional Language for Data Parallelism

Alex Collins, Dominik Grewe, Vinod Grover, Sean Lee, Adriana Susnea

Convergence and Scalarization for Data-Parallel Architectures

Yunsup Lee, Ronny Krashinsky, Vinod Grover, Steve Keckler, Krste Asanovic

International Symposium on Code Generation and Optimization (CGO)

2012

Policy-based Tuning for Performance Portability and Library Co-optimization

Duane Merrill, Michael Garland, Andrew Grimshaw

Proc. Innovative Parallel Computing

2011

Processing Device Arrays with C++ Metaprogramming

Jonathan Cohen

GPU Computing Gems, Jade Edition, Edited by Wen-mei W. Hwu

Copperhead: Compiling an Embedded Data Parallel Language

Bryan Catanzaro, Michael Garland, Kurt Keutzer

16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (…

2010

Programming Massively Parallel Processors: A Hands-on Approach

David Kirk, Wen-mei Hwu

Morgan Kaufmann

2008

Parallel Computing Experiences with CUDA

Michael Garland, Scott Le Grand, John Nickolls, Joshua Anderson, Jim Hardwick, Scott Morton, Everett Phillips, Yao Zhang, Vasily Volkov

Scalable Parallel Programming with CUDA

John Nickolls, Ian Buck, Michael Garland, Kevin Skadron