Publications
Our publications provide insight into some of our leading-edge research.
22 results found
High Performance Computing
2025
FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale
Boris Bonev, Thorsten Kurth, Ankur Mahesh, Mauro Bisson, Jean Kossaifi, Karthik Kashinath, Anima Anandkumar, William D. Collins, Mike Pritchard, Alex Keller
Task-Based Tensor Computations on Modern GPUs
Rohan Yadav, Michael Garland, Alex Aiken, Michael Bauer
PLDI
Beyond the Buzz: A Pragmatic Take on Inference Disaggregation
Tiyasa Mitra, Ritika Borkar, Nidhi Bhatia, Ramon Matas, Shivam Raj, Dheevatsa Mudigere, Ritchie Zhao, Maximilian Golub, Arpan Dutta, Sailaja Madduri, Dharmesh Jani, Brian Pharris, Bita Darvish Rouhani
SLIM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
Mohammad Mozaffari, Amir Yazdanbakhsh, Maryam Mehri Dehnavi
ICML
Adaptive Algebraic Reuse of Reordering in Cholesky Factorizations with Dynamic Sparsity Patterns
Behrooz Zarebavani, Danny Kaufman, David Levin, Maryam Mehri Dehnavi
SIGGRAPH
Composing Distributed Computations Through Task and Kernel Fusion
Rohan Yadav, Shiv Sundrum, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad
Automatic Tracing in Task-Based Runtime Systems
Rohan Yadav, Michael Bauer, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad
2017
Integrating External Resources with a Task-Based Programming Model
Zhihao Jia, Sean Treichler, Galen Shipman, Michael Bauer, Noah Watkins, Carlos Maltzahn, Patrick McCormick, Alex Aiken
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda, Maxim Naumov, Michael Garland
Near-eye Light Field Holographic Rendering with Spherical Waves for Wide Field of View Interactive 3D Computer Graphics
Liang Shi, Fu-Chung Huang, Ward Lopes, Wojciech Matusik, David Luebke
A Novel Shard-Based Approach for Asynchronous Many-Task Models for In Situ Analysis
Philippe P. Pébaÿ, Giulio Borghesi, Hemanth Kolla, Janine C. Bennett, Sean Treichler
Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions
Elliott Slaughter, Wonchan Lee, Sean Treichler, Wen Zhang, Michael Bauer, Galen Shipman, Patrick McCormick, Alex Aiken
Low Communication FMM-Accelerated FFT on GPUs
Cris Cecka
Parallel Jaccard and Related Graph Clustering Techniques
Alexandre Fender, Nahid Emad, Serge Petiton, Joe Eaton, Maxim Naumov
Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems
Mike O'Connor, Niladrish Chatterjee, Donghyuk Lee, John Wilson, Aditya Agrawal, Steve Keckler, William Dally
Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form
Maxim Naumov
Exploiting Budan-Fourier and Vincent’s Theorems for Ray Tracing 3D Bézier Curves
Alexander Reshetov
Parallel Modularity Clustering
Alexandre Fender, Nahid Emad, Serge Petiton, Maxim Naumov
Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors
Benjamin Klenk, Holger Fröning, Hans Eberle, Larry Dennison
Best Paper Award
The Iray Light Transport Simulation and Rendering System
Alex Keller, Carsten Wächter, Matthias Raab, Daniel Seibert, Dietger van Antwerpen, Johann Korndörfer, Lutz Kettner
SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation
Siva Hari, Timothy Tsai, Mark Stephenson, Steve Keckler, Joel Emer
Parallel Depth-First Search for Directed Acyclic Graphs
Maxim Naumov, Alysson Vrielink, Michael Garland