Artificial Intelligence Computing Leadership from NVIDIA
Publications
Our publications provide insight into some of our leading-edge research.
51 results found
Programming Languages, Systems and Tools
2025
Composing Distributed Computations Through Task and Kernel Fusion
Rohan Yadav, Shiv Sundrum, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad
Automatic Tracing in Task-Based Runtime Systems
Rohan Yadav, Michael Bauer, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad
2023
Legate Sparse: Distributed Sparse Computing in Python
Rohan Yadav, Wonchan Lee, Melih Elibol, Taylor Patti, Manolis Papadakis, Michael Garland, Alex Aiken, Fredrik Kjolstad, Michael Bauer
cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications
Mohamed Tarek Ibn Ziad, Sana Damani, Aamer Jaleel, Stephen W. Keckler, Mark Stephenson
PLDI
Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence
Michael Bauer, Elliott Slaughter, Sean Treichler, Wonchan Lee, Michael Garland, Alex Aiken
Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows
Vijay Kandiah, Daniel Lustig, Oreste Villa, David Nellans, Nikos Hardavellas
2022
Demystifying Map Space Exploration for NPUs
Sheng-Chun Kao, Angshuman Parashar, Po-An Tsai, Tushar Krishna
Slang Shading Language Advances
Yong He, Petrik Clarberg, Theresa Foley
Research Advances Toward Real-Time Path Tracing
Petrik Clarberg, Simon Kallweit, Craig Kolb, Pawel Kozlowski, Yong He, Lifan Wu, Edward Liu
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators
Prasanth Chatarasi, Hyoukjun Kwon, Angshuman Parashar, Michael Pellauer, Tushar Krishna, Vivek Sarkar
2021
Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators
Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna
Cooperative Profile Guided Optimization
Mark Stephenson, Ram Rangan, Steve Keckler
Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
Harini Muthukrishnan, David Nellans, Daniel Lustig, Jeffrey Fessler, Thomas Wenisch
PGZ: Automatic Zero-Value Code Specialization
Mark Stephenson, Ram Rangan
Scaling Implicit Parallelism via Dynamic Control Replication
Michael Bauer, Wonchan Lee, Elliott Slaughter, Zhihao Jia, Mario Di Renzo, Manolis Papadakis, Galen Shipman, Patrick McCormick, Michael Garland, Alex Aiken
Hardware Abstractions for Targeting EDDO Architectures with the Polyhedral Model
Angshuman Parashar, Prasanth Chatarasi, Po-An Tsai
2020
Locality-Centric Data and Threadblock Management for Massive GPUs
Mahmoud Khairy, Vadim Nikiforov, David Nellans, Timothy G. Rogers
A Programmable Approach to Neural Network Compression
Vinu Joseph, Ganesh L. Gopalakrishnan, Saurav Muralidharan, Michael Garland, Animesh Garg
Zeroploit: Exploiting Zero Valued Operands in Interactive Gaming Applications
Ram Rangan, Mark Stephenson, Aditya Ukarande, Shyam Murthy, Virat Agarwal, Marc Blackstein
There’s Plenty of Room at the Top: What Will Drive Computer Performance after Moore’s Law?
Charles E. Leiserson, Neil C. Thompson, Joel Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, Tao B. Schardl
Speculative Reconvergence for Improved SIMT Efficiency
Sana Damani, Daniel Johnson, Mark Stephenson, Eddie Yan, Olivier Giroux, Michael McKeown, Steve Keckler
2019
Legate NumPy: Accelerated and Distributed Array Computing
Michael Bauer, Michael Garland
NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs
Oreste Villa, Mark Stephenson, David Nellans, Steve Keckler
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance
Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick McCormick, Alex Aiken
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Steve Keckler, Joel Emer
Throughput-oriented GPU memory allocation
Isaac Gelado, Michael Garland
2018
Optimizing Software-Directed Instruction Replication for GPU Error Detection
Abdulrahman Mahmoud, Siva Hari, Michael B. Sullivan, Timothy Tsai, Steve Keckler
Dynamic Tracing: Memoization of Task Graphs for Dynamic Task-based Runtimes
Wonchan Lee, Elliott Slaughter, Michael Bauer, Sean Treichler, Todd Warszawski, Michael Garland, Alex Aiken
Slang: Language Mechanisms for Extensible Real-time Shading Systems
Yong He, Theresa Foley, Kayvon Fatahalian
Isometry: A Path-Based Distributed Data Transfer System
Zhihao Jia, Sean Treichler, Galen Shipman, Patrick McCormick, Alex Aiken
Scalable Collectives for Distributed Asynchronous Many-Task Runtimes
Matthew Whitlock, Hemanth Kolla, Sean Treichler, Philippe Pebay, Janine C. Bennett
BabelFlow: An Embedded Domain Specific Language for Parallel Analysis and Visualization
Steve Petruzza, Sean Treichler, Valerio Pascucci, Peer-Timo Bremer