## High Performance Computing

### Associated Publications

### 2026

[ScheduleStream: Temporal Planning with Samplers for GPU-Accelerated Multi-Arm Task and Motion Planning & Scheduling](/publication/2026-06_schedulestream-temporal-planning-samplers-gpu-accelerated-multi-arm-task-and)

[Caelan Garrett](/person/caelan-garrett), [Fabio Ramos](/person/fabio-ramos)


[IEEE International Conference on Robotics & Automation (ICRA)](https://arxiv.org/abs/2511.04758)


### 2025 

[Augmenting Simulated Noisy Quantum Data Collection by Orders of Magnitude Using Pre-Trajectory Sampling with Batched Execution](/publication/2025-11_augmenting-simulated-noisy-quantum-data-collection-orders-magnitude-using-pre)

[Taylor Patti](/person/taylor-patti), Thien Nguyen, Justin Lietz, Alex McCaskey, [Brucek Khailany](/person/brucek-khailany)


[arXiv](https://arxiv.org/abs/2504.16297)


[Huge ensembles – Part 2: Properties of a huge ensemble of hindcasts generated with spherical Fourier neural operators](/publication/2025-09_huge-ensembles-part-2-properties-huge-ensemble-hindcasts-generated-spherical)

Ankur Mahesh, William D. Collins, [Boris Bonev](/person/boris-bonev), [Noah Brenowitz](/person/noah-brenowitz), Yair Cohen, Peter Harrington, Karthik Kashinath, Thorsten Kurth, Joshua North, Travis O'Brien, [Mike Pritchard](/person/mike-pritchard), David Pruitt, Mark Risser, Shashank Subramanian, Jared Willard


[Geoscientific Model Development (GMD)](https://gmd.copernicus.org/articles/18/5605/2025/)


[Huge ensembles – Part 1: Design of ensemble weather forecasts using spherical Fourier neural operators](/publication/2025-09_huge-ensembles-part-1-design-ensemble-weather-forecasts-using-spherical-fourier)

Ankur Mahesh, William D. Collins, [Boris Bonev](/person/boris-bonev), [Noah Brenowitz](/person/noah-brenowitz), Yair Cohen, Joshua Elms, Peter Harrington, Karthik Kashinath, Thorsten Kurth, Joshua North, Travis O'Brien, [Mike Pritchard](/person/mike-pritchard), David Pruitt, Mark Risser, Shashank Subramanian, Jared Willard


[Geoscientific Model Development (GMD)](https://gmd.copernicus.org/articles/18/5575/2025/)


[FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale](/publication/2025-07_fourcastnet-3-geometric-approach-probabilistic-machine-learning-weather)

[Boris Bonev](/person/boris-bonev), Thorsten Kurth, Ankur Mahesh, Mauro Bisson, [Jean Kossaifi](/person/jean-kossaifi), Karthik Kashinath, Anima Anandkumar, William D. Collins, [Mike Pritchard](/person/mike-pritchard), [Alex Keller](/person/alex-keller)


[Task-Based Tensor Computations on Modern GPUs](/publication/2025-06_task-based-tensor-computations-modern-gpus)

Rohan Yadav, [Michael Garland](/person/michael-garland), Alex Aiken, [Michael Bauer](/person/mike-bauer)


[PLDI](https://pldi25.sigplan.org/)


[Beyond the Buzz: A Pragmatic Take on Inference Disaggregation](/publication/2025-06_beyond-buzz-pragmatic-take-inference-disaggregation)

Tiyasa Mitra, Ritika Borkar, Nidhi Bhatia, Ramon Matas, Shivam Raj, Dheevatsa Mudigere, Ritchie Zhao, Maximilian Golub, Arpan Dutta, Sailaja Madduri, Dharmesh Jani, Brian Pharris, Bita Darvish Rouhani 


[arXiv](https://arxiv.org/abs/2506.05508)


[SLIM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression](/publication/2025-06_slim-one-shot-quantization-and-sparsity-low-rank-approximation-llm-weight)

Mohammad Mozaffari, Amir Yazdanbakhsh, [Maryam Mehri Dehnavi](/person/maryam-mehri-dehnavi)


[ICML 2025](https://icml.cc/virtual/2025/poster/46479)


[Adaptive Algebraic Reuse of Reordering in Cholesky Factorizations with Dynamic Sparsity Patterns](/publication/2025-06_adaptive-algebraic-reuse-reordering-cholesky-factorizations-dynamic-sparsity)

Behrooz Zarebavani, Danny Kaufman, David Levin, [Maryam Mehri Dehnavi](/person/maryam-mehri-dehnavi)


[SIGGRAPH 2025](https://s2025.siggraph.org/)


[Automatic Tracing in Task-Based Runtime Systems](/publication/2025-03_automatic-tracing-task-based-runtime-systems)

Rohan Yadav, [Michael Bauer](/person/mike-bauer), David Broman, [Michael Garland](/person/michael-garland), Alex Aiken, Fredrik Kjolstad


[ASPLOS](https://www.asplos-conference.org/asplos2025/)


[Composing Distributed Computations Through Task and Kernel Fusion](/publication/2025-03_composing-distributed-computations-through-task-and-kernel-fusion)

Rohan Yadav, Shiv Sundrum, Wonchan Lee, [Michael Garland](/person/michael-garland), [Michael Bauer](/person/mike-bauer), Alex Aiken, Fredrik Kjolstad


[ASPLOS](https://www.asplos-conference.org/asplos2025/)


### 2024 

[Differentiable GPU-Parallelized Task and Motion Planning](/publication/2024-11_differentiable-gpu-parallelized-task-and-motion-planning)

William Shen, [Caelan Garrett](/person/caelan-garrett), Nishanth Kumar, [Ankit Goyal](/person/ankit-goyal), [Tucker Hermans](/person/tucker-hermans), Leslie Pack Kaelbling, Tomás Lozano-Pérez, [Fabio Ramos](/person/fabio-ramos)


[Robotics: Science and Systems (RSS)](https://www.roboticsproceedings.org/rss21/p050.html)


### 2023 

[Legate Sparse: Distributed Sparse Computing in Python](/publication/2023-11_legate-sparse-distributed-sparse-computing-python)

Rohan Yadav, Wonchan Lee, [Melih Elibol](/person/melih-elibol), [Taylor Patti](/person/taylor-patti), Manolis Papadakis, [Michael Garland](/person/michael-garland), Alex Aiken, Fredrik Kjolstad, [Michael Bauer](/person/mike-bauer)


[Supercomputing](https://sc23.supercomputing.org/presentation/?id=pap119&sess=sess172)


[Neuralangelo: High-Fidelity Neural Surface Reconstruction](/publication/2023-06_neuralangelo-high-fidelity-neural-surface-reconstruction)

[Max Zhaoshuo Li](/person/max-zhaoshuo-li), [Thomas Müller](/person/thomas-muller), Alex Evans, Russell H. Taylor, Mathias Unberath, [Ming-Yu Liu](/person/ming-yu-liu), [Chen-Hsuan Lin](/person/chen-hsuan-lin)


[CVPR 2023](https://cvpr2023.thecvf.com/)


The Best Inventions of 2023, TIME Magazine


[Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence](/publication/2023-02_visibility-algorithms-dynamic-dependence-analysis-and-distributed-coherence)

[Michael Bauer](/person/mike-bauer), Elliott Slaughter, Sean Treichler, Wonchan Lee, [Michael Garland](/person/michael-garland), Alex Aiken


[PPoPP](https://conf.researchr.org/home/ppopp-2023)


[Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows](/publication/2023-02_parsimony-enabling-simdvector-programming-standard-compiler-flows)

Vijay Kandiah, [Daniel Lustig](/person/daniel-lustig), Oreste Villa, [David Nellans](/person/david-nellans), Nikos Hardavellas


[International Symposium on Code Generation and Optimization](https://dl.acm.org/doi/10.1145/3579990.3580019)


[Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs](/publication/2023-01_implementing-reinforcement-learning-datacenter-congestion-control-nvidia-nics)

Benjamin Fuhrer, Yuval Shpigelman, [Chen Tessler](/person/chen-tessler), [Shie Mannor](/person/shie-mannor), [Gal Chechik](/person/gal-chechik), Eitan Zahavy, [Gal Dalal](/person/gal-dalal)


[CCGrid 2023](https://arxiv.org/abs/2207.02295)


### 2022 

[Towards Precision-Aware Fault Tolerance Approaches for Mixed-Precision Applications](/publication/2022-11_towards-precision-aware-fault-tolerance-approaches-mixed-precision-applications)

Bo Fang, [Siva Hari](/person/siva-hari), Timothy Tsai, Xinyi Li, Ganesh Gopalakrishnan, Ignacio Laguna, Kevin Barker, Ang Li


[Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS)](https://ieeexplore.ieee.org/document/10024043)


[Variable Bitrate Neural Fields](/vbnf)

Towaki Takikawa, Alex Evans, [Jonathan Tremblay](/person/jonathan-tremblay), [Thomas Müller](/person/thomas-muller), Morgan McGuire, Alec Jacobson, Sanja Fidler


[ACM SIGGRAPH 2022 Conference Proceedings](https://s2022.siggraph.org/)


[Instant Neural Graphics Primitives with a Multiresolution Hash Encoding](/publication/2022-07_instant-neural-graphics-primitives-multiresolution-hash-encoding)

[Thomas Müller](/person/thomas-muller), Alex Evans, Christoph Schied, [Alex Keller](/person/alex-keller)


[ACM Transactions on Graphics (SIGGRAPH 2022)](https://s2022.siggraph.org)


Best Technical Paper, SIGGRAPH 2022; The Best Inventions of 2022, TIME Magazine


[Ray/Ribbon Intersections](/publication/2022-07_rayribbon-intersections)

[Alexander Reshetov](/person/alexander-reshetov)


[Proc. ACM Comput. Graph. Interact. Tech., Vol. 5, No. 3, July 2022](https://dl.acm.org/journal/pacmcgit)


[GATSPI: GPU Accelerated Gate-Level Simulation for Power Improvement](/publication/2022-03_gatspi-gpu-accelerated-gate-level-simulation-power-improvement)

[Yanqing Zhang](/person/yanqing-zhang), Mark Haoxing Ren, Akshay Sridharan, [Brucek Khailany](/person/brucek-khailany)


[2022 Design Automation Conference](https://www.dac.com)


### 2021 

[GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management](/publication/2021-10_gps-global-publish-subscribe-model-multi-gpu-memory-management)

[Harini Muthukrishnan](/person/harini-muthukrishnan), [Daniel Lustig](/person/daniel-lustig), [David Nellans](/person/david-nellans), Thomas Wenisch


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3466752.3480088)


Best Paper nominee, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal in GPUs](/publication/2021-08_emogi-efficient-memory-access-out-memory-graph-traversal-gpus)

Seung Won Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-mei Hwu


[Proceedings of the VLDB Endowment (VLDB)](https://dl.acm.org/doi/10.14778/3425879.3425883)


[Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture](/publication/2021-08_large-graph-convolutional-network-training-gpu-oriented-data-communication)

Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, [Wen-mei Hwu](/person/wen-mei-hwu)


[Proceedings of the VLDB Endowment (VLDB)](https://dl.acm.org/doi/10.14778/3476249.3476264)


[Suraksha: A Quantitative AV Safety Evaluation Framework to Analyze Safety Implications of Perception Design Choices](/publication/2021-06_suraksha-quantitative-av-safety-evaluation-framework-analyze-safety)

Hengyu Zhao, [Siva Hari](/person/siva-hari), Timothy Tsai, [Michael B. Sullivan](/person/mike-sullivan), [Steve Keckler](/person/stephen-keckler), Jishen Zhao


[Workshop on Safety and Security of Intelligent Vehicles (SSIV)](https://ieeexplore.ieee.org/document/9502467)


[Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers](/publication/2021-06_efficient-multi-gpu-shared-memory-automatic-optimization-fine-grained-transfers)

[Harini Muthukrishnan](/person/harini-muthukrishnan), [David Nellans](/person/david-nellans), [Daniel Lustig](/person/daniel-lustig), Jeffrey Fessler, Thomas Wenisch


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/9499752)


[Demystifying GPU Reliability: Comparing and Combining Beam Experiments, Fault Simulation, and Profiling](/publication/2021-05_demystifying-gpu-reliability-comparing-and-combining-beam-experiments-fault)

Fernando Fernandes dos Santos, [Siva Hari](/person/siva-hari), Pedro Martins Basso, Luigi Carro, Paolo Rech


[IEEE International Parallel & Distributed Processing Symposium (IPDPS)](https://ieeexplore.ieee.org/document/9460470)


[Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures](/publication/2021-03_learning-sparse-matrix-row-permutations-efficient-spmm-gpu-architectures)

Atefeh Mehrabi, [Donghyuk Lee](/person/donghyuk-lee), [Niladrish Chatterjee](/person/niladrish-chatterjee), Danial J. Sorin, Benjamin C. Lee, [Mike O'Connor](/person/mike-o-connor)


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/9408181)


[Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture](/publication/2021-03_large-graph-convolutional-network-training-gpu-oriented-data-communication)

Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, [Wen-mei Hwu](/person/wen-mei-hwu)


[arXiv](https://arxiv.org/abs/2103.03330)


[Scaling Implicit Parallelism via Dynamic Control Replication](/publication/2021-02_scaling-implicit-parallelism-dynamic-control-replication)

[Michael Bauer](/person/mike-bauer), Wonchan Lee, Elliott Slaughter, Zhihao Jia, Mario Di Renzo, Manolis Papadakis, Galen Shipman, Patrick McCormick, [Michael Garland](/person/michael-garland), Alex Aiken


[Principles and Practices of Parallel Programming (PPoPP)](https://ppopp21.sigplan.org/)


### 2020 

[Accelerating Reinforcement Learning through GPU Atari Emulation](/publication/2020-12_accelerating-reinforcement-learning-through-gpu-atari-emulation)

[Iuri Frosio](/person/iuri-frosio), [Steven Dalton](/person/steven-dalton)


[Advances in Neural Information Processing Systems 33 (NeurIPS 2020)](https://nips.cc/Conferences/2020)


[Locality-Centric Data and Threadblock Management for Massive GPUs](/publication/2020-10_locality-centric-data-and-threadblock-management-massive-gpus)

Mahmoud Khairy, Vadim Nikiforov, [David Nellans](/person/david-nellans), Timothy G. Rogers


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/9251964)


[EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs](/publication/2020-06_emogi-efficient-memory-access-out-memory-graph-traversal-gpus)

Seung Won Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-mei Hwu


[arXiv](https://arxiv.org/abs/2006.06890)


[Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs](/publication/2020-06_buddy-compression-enabling-larger-memory-deep-learning-and-hpc-workloads-gpus)

Esha Choukse, [Michael B. Sullivan](/person/mike-sullivan), [Mike O'Connor](/person/mike-o-connor), Mattan Erez, Jeff Pool, [David Nellans](/person/david-nellans), [Steve Keckler](/person/stephen-keckler)


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/9138915)


[An In-Network Architecture for Accelerating Shared-Memory Multiprocessor Collectives](/publication/2020-05_network-architecture-accelerating-shared-memory-multiprocessor-collectives)

[Benjamin Klenk](/person/ben-klenk), [Ted Jiang](/person/ted-jiang), Greg Thorson, [Larry Dennison](/person/larry-dennison)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1109/ISCA45697.2020.00085)


[NWChem: Past, Present, and Future](/publication/2020-05_nwchem-past-present-and-future)

Edoardo Aprà, et al. (co-authors include Oreste Villa)


[The Journal of Chemical Physics](https://aip.scitation.org/doi/pdf/10.1063/5.0004997)


### 2019 

[Near-Memory Data Transformation for Efficient Sparse Matrix Multi-Vector Multiplication](/publication/2019-11_near-memory-data-transformation-efficient-sparse-matrix-multi-vector)

Daichi Fujiki, [Niladrish Chatterjee](/person/niladrish-chatterjee), [Donghyuk Lee](/person/donghyuk-lee), [Mike O'Connor](/person/mike-o-connor)


[International Conference for High-Performance Computing, Networking, Storage, a…](https://dl.acm.org/doi/10.1145/3295500.3356154)


[Highly-scalable, Physics-informed GANs for Learning Solutions of Stochastic PDEs](/publication/2019-10_highly-scalable-physics-informed-gans-learning-solutions-stochastic-pdes)

Liu Yang, Sean Treichler, Thorsten Kurth, Keno Fischer, David Barajas-Solano, Josh Romero, Valentin Churavy, Alexandre Tartakovsky, Michael Houston, Prabhat, George Karniadakis


[arXiv](https://arxiv.org/abs/1910.13444)


[Exascale Deep Learning for Scientific Inverse Problems](/publication/2019-09_exascale-deep-learning-scientific-inverse-problems)

Nouamane Laanait, Joshua Romero, Junqi Yin, M. Todd Young, Sean Treichler, Vitalii Starchenko, Albina Borisevich, Alex Sergeev, Michael Matheson


[arXiv](https://arxiv.org/abs/1909.11150)


[Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance](/publication/2019-08_task-bench-parameterized-benchmark-evaluating-parallel-runtime-performance)

Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick McCormick, Alex Aiken


[arXiv](https://arxiv.org/abs/1908.05790)


[Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training](/publication/2019-07_optimizing-multi-gpu-parallelization-strategies-deep-learning-training)

Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, [Yaosheng Fu](/person/yaosheng-fu), Victor Zhang, Szymon Migacz, [David Nellans](/person/david-nellans), Puneet Gupta 


[arXiv](https://arxiv.org/abs/1907.13257)


[GPU-Accelerated Atari Emulation for Reinforcement Learning](/publication/2019-07_gpu-accelerated-atari-emulation-reinforcement-learning)

[Steven Dalton](/person/steven-dalton), [Iuri Frosio](/person/iuri-frosio), [Michael Garland](/person/michael-garland)


[arXiv](https://arxiv.org/abs/1907.08467)


[GPU Snapshot: Checkpoint Offloading for GPU-Dense Systems](/publication/2019-06_gpu-snapshot-checkpoint-offloading-gpu-dense-systems)

Kyushick Lee, [Michael B. Sullivan](/person/mike-sullivan), [Siva Hari](/person/siva-hari), Timothy Tsai, [Steve Keckler](/person/stephen-keckler), Mattan Erez


[International Conference on Supercomputing](https://dl.acm.org/doi/10.1145/3330345.3330361)


[On the Trend of Resilience for GPU-Dense Systems](/publication/2019-06_trend-resilience-gpu-dense-systems)

Kyushick Lee, [Michael B. Sullivan](/person/mike-sullivan), [Siva Hari](/person/siva-hari), Timothy Tsai, [Steve Keckler](/person/stephen-keckler), Mattan Erez


[International Conference on Dependable Systems and Networks, Supplemental (DSN-…](https://ieeexplore.ieee.org/document/8805794)


Best of SELSE (Workshop on Silicon Errors in Logic - System Effects)


[NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation](/publication/2019-05_nvgaze-anatomically-informed-dataset-low-latency-near-eye-gaze-estimation)

[Joohwan Kim](/person/joohwan-kim), [Michael Stengel](/person/michael-stengel), Alexander Majercik, [Shalini De Mello](/person/shalini-de-mello), David Dunn, [Samuli Laine](/person/samuli-laine), Morgan McGuire, [David Luebke](/person/david-luebke)


ACM Conference on Human Factors in Computing Systems (CHI) 2019


[Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs](/publication/2019-04_buddy-compression-enabling-larger-memory-deep-learning-and-hpc-workloads-gpus)

Esha Choukse, [Michael B. Sullivan](/person/mike-sullivan), [Mike O'Connor](/person/mike-o-connor), Mattan Erez, Jeff Pool, [David Nellans](/person/david-nellans), Stephen W. Keckler


[arXiv](https://arxiv.org/abs/1903.02596)


[DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis](/publication/2019-04_delta-gpu-performance-model-deep-learning-applications-depth-memory-system)

Sangkug Lym, [Donghyuk Lee](/person/donghyuk-lee), [Mike O'Connor](/person/mike-o-connor), [Niladrish Chatterjee](/person/niladrish-chatterjee), Mattan Erez


[arXiv](https://arxiv.org/abs/1904.01691)


[A Fast and Robust Method for Avoiding Self-Intersection](/publication/2019-03_fast-and-robust-method-avoiding-self-intersection)

Carsten Wächter, [Nikolaus Binder](/person/nikolaus-binder)


[Ray Tracing Gems](http://www.realtimerendering.com/raytracinggems/)


[Massively Parallel Path Space Filtering](/publication/2019-02_massively-parallel-path-space-filtering)

[Nikolaus Binder](/person/nikolaus-binder), Sascha Fricke, [Alex Keller](/person/alex-keller)


[arXiv](https://arxiv.org/abs/1902.05942?context=cs)


[Metaoptimization on a Distributed System for Deep Reinforcement Learning](/publication/2019-02_metaoptimization-distributed-system-deep-reinforcement-learning)

Greg Heinrich, [Iuri Frosio](/person/iuri-frosio)


[Massively Parallel Construction of Radix Tree Forests for the Efficient Sampling of Discrete Probability Distributions](/publication/2019-01_massively-parallel-construction-radix-tree-forests-efficient-sampling-discrete)

[Nikolaus Binder](/person/nikolaus-binder), [Alex Keller](/person/alex-keller)


[arXiv](https://arxiv.org/abs/1901.05423)


### 2018 

[Dynamic Tracing: Memoization of Task Graphs for Dynamic Task-based Runtimes](/publication/2018-11_dynamic-tracing-memoization-task-graphs-dynamic-task-based-runtimes)

Wonchan Lee, Elliott Slaughter, [Michael Bauer](/person/mike-bauer), Sean Treichler, Todd Warszawski, [Michael Garland](/person/michael-garland), Alex Aiken


[International Conference for High Performance Computing and Communications (SC'…](https://dl.acm.org/doi/10.5555/3291656.3291702)


[Evaluating and Accelerating High-Fidelity Error Injection for HPC](/publication/2018-11_evaluating-and-accelerating-high-fidelity-error-injection-hpc)

Chun-Kai Chang, Sangkug Lym, Nicholas Kelly, [Michael B. Sullivan](/person/mike-sullivan), Mattan Erez


[The International Conference on High Performance Computing, Networking, Storage…](https://ieeexplore.ieee.org/abstract/document/8665790)


[Exascale Deep Learning for Climate Analytics](/publication/2018-11_exascale-deep-learning-climate-analytics)

Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston


[International Conference for High Performance Computing and Communications (SC'…](https://dl.acm.org/doi/10.5555/3291656.3291724)


[Exploiting Idle Resources in a High-Radix Switch for Supplemental Storage](/publication/2018-11_exploiting-idle-resources-high-radix-switch-supplemental-storage)

[Matthias Blumrich](/person/matthias-blumrich), [Ted Jiang](/person/ted-jiang), [Larry Dennison](/person/larry-dennison)


[Proceedings of the International Conference for High Performance Computing, Net…](https://dl.acm.org/citation.cfm?id=3291662)


[Fast, High Precision Ray/Fiber Intersection using Tight, Disjoint Bounding Volumes](/publication/2018-11_fast-high-precision-rayfiber-intersection-using-tight-disjoint-bounding-volumes)

[Nikolaus Binder](/person/nikolaus-binder), [Alex Keller](/person/alex-keller)


[arXiv](https://arxiv.org/abs/1811.03374)


[Massively Parallel Stackless Ray Tracing of Catmull-Clark Subdivision Surfaces](/publication/2018-11_massively-parallel-stackless-ray-tracing-catmull-clark-subdivision-surfaces)

[Nikolaus Binder](/person/nikolaus-binder), [Alex Keller](/person/alex-keller)


[arXiv](https://arxiv.org/abs/1811.03510)


[Exascale Deep Learning for Climate Analytics](/publication/2018-10_exascale-deep-learning-climate-analytics)

Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston


[arXiv](https://arxiv.org/abs/1810.01993)


[CRUM: Checkpoint-Restart Support for CUDA's Unified Memory](/publication/2018-09_crum-checkpoint-restart-support-cuda-s-unified-memory)

Rohan Garg, Apoorve Mohan, [Michael B. Sullivan](/person/mike-sullivan), Gene Cooperman


[The International Conference on Cluster Computing (IEEE CLUSTER)](https://ieeexplore.ieee.org/abstract/document/8514890)


[Phantom Ray-Hair Intersector](/publication/2018-08_phantom-ray-hair-intersector)

[Alexander Reshetov](/person/alexander-reshetov), [David Luebke](/person/david-luebke)


[Proceedings of the ACM on Computer Graphics and Interactive Techniques](https://dl.acm.org/citation.cfm?id=3233307)


[Hamartia: A Fast and Accurate Error Injection Framework](/publication/2018-06_hamartia-fast-and-accurate-error-injection-framework)

Chun-Kai Chang, Sangkug Lym, Nicholas Kelly, [Michael B. Sullivan](/person/mike-sullivan), Mattan Erez


[The International Conference on Dependable Systems and Networks Workshops (DSN-…](https://ieeexplore.ieee.org/abstract/document/8416231)


[Isometry: A Path-Based Distributed Data Transfer System](/publication/2018-06_isometry-path-based-distributed-data-transfer-system)

Zhihao Jia, Sean Treichler, Galen Shipman, Patrick McCormick, Alex Aiken


[International Conference on Supercomputing (ICS)](https://dl.acm.org/doi/abs/10.1145/3205289.3205301)


[Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training](/publication/2018-06_structurally-sparsified-backward-propagation-faster-long-short-term-memory)

Maohua Zhu, [Jason Clemons](/person/jason-clemons), Jeff Pool, Minsoo Rhu, [Steve Keckler](/person/stephen-keckler), Yuan Xie


[arXiv](https://arxiv.org/abs/1806.00512)


[Scalable Collectives for Distributed Asynchronous Many-Task Runtimes](/publication/2018-05_scalable-collectives-distributed-asynchronous-many-task-runtimes)

Matthew Whitlock, Hemanth Kolla, Sean Treichler, Philippe Pebay, Janine C. Bennett


[International Parallel and Distributed Processing Symposium (IPDPS) - workshops](https://ieeexplore.ieee.org/document/8425445)


[BabelFlow: An Embedded Domain Specific Language for Parallel Analysis and Visualization](/publication/2018-05_babelflow-embedded-domain-specific-language-parallel-analysis-and-visualization)

Steve Petruzza, Sean Treichler, Valerio Pascucci, Peer-Timo Bremer


[International Parallel and Distributed Processing Symposium (IPDPS)](https://ieeexplore.ieee.org/document/8425200)


### 2017 

[Integrating External Resources with a Task-Based Programming Model](/publication/2017-12_integrating-external-resources-task-based-programming-model)

Zhihao Jia, Sean Treichler, Galen Shipman, [Michael Bauer](/person/mike-bauer), Noah Watkins, Carlos Maltzahn, Patrick McCormick, Alex Aiken


[International Conference on High Performance Computing (HiPC)](https://ieeexplore.ieee.org/document/8287762)


[AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks](/publication/2017-12_adabatch-adaptive-batch-sizes-training-deep-neural-networks)

Aditya Devarakonda, Maxim Naumov, [Michael Garland](/person/michael-garland)


[arXiv:1712.02029 \[cs.LG\]](https://arxiv.org/abs/1712.02029)


[Near-eye Light Field Holographic Rendering with Spherical Waves for Wide Field of View Interactive 3D Computer Graphics](/publication/2017-11_near-eye-light-field-holographic-rendering-spherical-waves-wide-field-view)

Liang Shi, Fu-Chung Huang, [Ward Lopes](/person/ward-lopes), Wojciech Matusik, [David Luebke](/person/david-luebke)


[ACM SIGGRAPH ASIA 2017](https://sa2017.siggraph.org/)


[A Novel Shard-Based Approach for Asynchronous Many-Task Models for In Situ Analysis](/publication/2017-11_novel-shard-based-approach-asynchronous-many-task-models-situ-analysis)

Philippe P. Pébaÿ, Giulio Borghesi, Hemanth Kolla, Janine C. Bennett, Sean Treichler


[Workshop on In Situ Infrastructures on Enabling Extreme-Scale Analysis and Visu…](https://dl.acm.org/doi/10.1145/3144769.3144775)


[Low Communication FMM-Accelerated FFT on GPUs](/publication/2017-11_low-communication-fmm-accelerated-fft-gpus)

[Cris Cecka](/person/cris-cecka)


[The International Conference for High Performance Computing, Networking, Storag…](https://sc17.supercomputing.org/)


[Parallel Jaccard and Related Graph Clustering Techniques](/publication/2017-11_parallel-jaccard-and-related-graph-clustering-techniques)

Alexandre Fender, Nahid Emad, Serge Petiton, Joe Eaton, Maxim Naumov


[Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for L…](https://dl.acm.org/citation.cfm?id=3148231)


[Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions](/publication/2017-11_control-replication-compiling-implicit-parallelism-efficient-spmd-logical)

Elliott Slaughter, Wonchan Lee, Sean Treichler, Wen Zhang, [Michael Bauer](/person/mike-bauer), Galen Shipman, Patrick McCormick, Alex Aiken


[International Conference for High Performance Computing and Communications (SC…](https://dl.acm.org/doi/10.1145/3126908.3126949)


[Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems](/publication/2017-10_fine-grained-dram-energy-efficient-dram-extreme-bandwidth-systems)

[Mike O'Connor](/person/mike-o-connor), [Niladrish Chatterjee](/person/niladrish-chatterjee), [Donghyuk Lee](/person/donghyuk-lee), [John Wilson](/person/john-wilson), Aditya Agrawal, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/citation.cfm?id=3124545)


[Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form](/publication/2017-09_feedforward-and-recurrent-neural-networks-backward-propagation-and-hessian)

Maxim Naumov


[arXiv:1709.06080 \[cs.LG\]](https://arxiv.org/abs/1709.06080)


[Exploiting Budan-Fourier and Vincent’s Theorems for Ray Tracing 3D Bézier Curves](/publication/2017-07_exploiting-budan-fourier-and-vincents-theorems-ray-tracing-3d-bezier-curves)

[Alexander Reshetov](/person/alexander-reshetov)


[High-Performance Graphics 2017](http://www.highperformancegraphics.org)


[Parallel Modularity Clustering](/publication/2017-06_parallel-modularity-clustering)

Alexandre Fender, Nahid Emad, Serge Petiton, Maxim Naumov


[Procedia Computer Science (Elsevier)](https://doi.org/10.1016/j.procs.2017.05.198)


[Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors](/publication/2017-06_relaxations-high-performance-message-passing-massively-parallel-simt-processors)

Benjamin Klenk, Holger Fröning, [Hans Eberle](/person/hans-eberle), [Larry Dennison](/person/larry-dennison)


[32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)](http://www.ipdps.org)


Best Paper Award


[The Iray Light Transport Simulation and Rendering System](/publication/2017-05_iray-light-transport-simulation-and-rendering-system)

[Alex Keller](/person/alex-keller), Carsten Wächter, Matthias Raab, Daniel Seibert, Dietger van Antwerpen, Johann Korndörfer, Lutz Kettner


[arXiv](https://arxiv.org/abs/1705.01263)


[SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation](/publication/2017-04_sassifi-architecture-level-fault-injection-tool-gpu-application-resilience)

[Siva Hari](/person/siva-hari), Timothy Tsai, [Mark Stephenson](/person/mark-stephenson), [Steve Keckler](/person/stephen-keckler), [Joel Emer](/person/joel-emer)


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/7975296)


[Parallel Depth-First Search for Directed Acyclic Graphs](/publication/2017-03_parallel-depth-first-search-directed-acyclic-graphs)

Maxim Naumov, Alysson Vrielink, [Michael Garland](/person/michael-garland)


Technical Report NVR-2017-001


### 2016 

[Tensor Contractions with Extended BLAS Kernels on CPU and GPU](/publication/2016-12_tensor-contractions-extended-blas-kernels-cpu-and-gpu)

Yang Shi, U. N. Niranjan, Animashree Anandkumar, [Cris Cecka](/person/cris-cecka)


[2016 IEEE 23rd International Conference on High Performance Computing (HiPC)](http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7837798)


[Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency](/publication/2016-10_approxilyzer-towards-systematic-framework-instruction-level-approximate)

Radha Venkatagiri, Abdulrahman Mahmoud, [Siva Hari](/person/siva-hari), Sarita Adve


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783745)


[vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design](/publication/2016-10_vdnn-virtualized-deep-neural-networks-scalable-memory-efficient-neural-network)

Minsoo Rhu, Natalia Gimelshein, [Jason Clemons](/person/jason-clemons), Arslan Zulfiqar, [Steve Keckler](/person/stephen-keckler)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.5555/3195638.3195660)


[All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory](/publication/2016-06_all-inclusive-ecc-thorough-end-end-protection-reliable-computer-memory)

Jungrae Kim, [Michael B. Sullivan](/person/mike-sullivan), Sangkug Lym, Mattan Erez


[The International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3007787.3001203)


[S-Step and Communication-Avoiding Iterative Methods](/publication/2016-04_s-step-and-communication-avoiding-iterative-methods)

Maxim Naumov


Technical Report NVR-2016-003


[Towards High Performance Paged Memory for GPUs](/publication/2016-03_towards-high-performance-paged-memory-gpus)

Tianhao Zheng, [David Nellans](/person/david-nellans), Arslan Zulfiqar, [Mark Stephenson](/person/mark-stephenson), [Steve Keckler](/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/7446077)


[A Case for Toggle-Aware Compression for GPU Systems](/publication/2016-03_case-toggle-aware-compression-gpu-systems)

Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, [Steve Keckler](/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](http://ieeexplore.ieee.org/document/7446064/)


[Selective GPU Caches to Eliminate CPU-GPU HW Cache Coherence](/index.php/publication/2016-03_selective-gpu-caches-eliminate-cpu-gpu-hw-cache-coherence)

Neha Agarwal, [David Nellans](/index.php/person/david-nellans), Eiman Ebrahimi, Thomas F. Wenisch, John Danskin, [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/7446089)


[Parallel Spectral Graph Partitioning](/publication/2016-03_parallel-spectral-graph-partitioning)

Maxim Naumov, Timothy Moon


Technical Report NVR-2016-001


### 2015 

[Network Endpoint Congestion Control for Fine-Grained Communication](/index.php/publication/2015-11_network-endpoint-congestion-control-fine-grained-communication)

[Ted Jiang](/index.php/person/ted-jiang), [Larry Dennison](/index.php/person/larry-dennison), [William Dally](/index.php/person/william-dally)


[SC15](http://dl.acm.org/citation.cfm?id=2807600)


[The Light Field Stereoscope](/index.php/publication/2015-07_light-field-stereoscope)

Fu-Chung Huang, [David Luebke](/index.php/person/david-luebke), Gordon Wetzstein


[ACM SIGGRAPH 2015 Emerging Technologies](http://dl.acm.org/citation.cfm?id=2792493)


[Parallel Graph Coloring with Applications to the Incomplete-LU Factorization on the GPU](/publication/2015-05_parallel-graph-coloring-applications-incomplete-lu-factorization-gpu)

Maxim Naumov, Patrice Castonguay, Jonathan Cohen


Technical Report NVR-2015-001


[In-Memory Graph Databases for Web-Scale Data](/publication/2015-03_memory-graph-databases-web-scale-data)

Vito Giovanni Castellana, Alessandro Morari, Jesse Weaver, Antonino Tumeo, David Haglin, Oreste Villa, John Feo


[IEEE Computer](https://ieeexplore.ieee.org/document/7063171)


### 2014 

[Scaling the Power Wall: A Path to Exascale](/publication/2014-11_scaling-power-wall-path-exascale)

Oreste Villa, Daniel Johnson, [Mike O'Connor](/person/mike-o-connor), Evgeny Bolotin, [David Nellans](/person/david-nellans), Justin Luitjens, Nikolai Sakharnykh, Peng Wang, Paulius Micikevicius, Anthony Scudiero, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)


[SC '14](http://ieeexplore.ieee.org/abstract/document/7013055/)


### 2012 

[Preconditioned Block-Iterative Methods on GPUs](/publication/2012-12_preconditioned-block-iterative-methods-gpus)

Maxim Naumov


[Proceedings in Applied Mathematics and Mechanics](http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1617-7061)


[Efficient Parallel Merge Sort for Fixed and Variable Length Keys](/publication/2012-05_efficient-parallel-merge-sort-fixed-and-variable-length-keys)

Andrew Davidson, David Tarjan, [Michael Garland](/person/michael-garland), John Owens


[Proc. Innovative Parallel Computing](http://innovativeparallel.org/)


[Incomplete-LU and Cholesky Factorization in the Preconditioned Iterative Methods on the GPU](/index.php/publication/2012-05_incomplete-lu-and-cholesky-factorization-preconditioned-iterative-methods-gpu)

Maxim Naumov


Technical Report NVR-2012-003


[Scalable GPU Graph Traversal](/publication/2012-02_scalable-gpu-graph-traversal)

[Duane Merrill](/person/duane-merrill%2520iii), [Michael Garland](/person/michael-garland), Andrew Grimshaw


[17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (…](http://dynopt.org/ppopp-2012/)


### 2011 

[Allocation-oriented Algorithm Design with Application to GPU Computing, Ph.D. Dissertation](/publication/2011-12_allocation-oriented-algorithm-design-application-gpu-computing-phd-dissertation)

[Duane Merrill](/person/duane-merrill%2520iii)


[Department of Computer Science, University of Virginia](http://www.cs.virginia.edu)


[Thrust: A Productivity-Oriented Library for CUDA](/publication/2011-10_thrust-productivity-oriented-library-cuda)

Nathan Bell, [Jared Hoberock](/person/jared-hoberock)


[GPU Computing Gems, Jade Edition, Edited by Wen-mei W. Hwu](http://mkp.com/news/3405)


[High Performance and Scalable GPU Graph Traversal](/publication/2011-08_high-performance-and-scalable-gpu-graph-traversal)

Duane Merrill, [Michael Garland](/person/michael-garland), Andrew Grimshaw


[Technical Report CS-2011-05, Department of Computer Science, University of Virg…](http://www.cs.virginia.edu)


[Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU](/publication/2011-06_parallel-solution-sparse-triangular-linear-systems-preconditioned-iterative)

Maxim Naumov


Technical Report NVR-2011-001


### 2010 

[Scalable Fluid Simulation using Anisotropic Turbulence Particles](/publication/2010-12_scalable-fluid-simulation-using-anisotropic-turbulence-particles)

Tobias Pfaff, Nils Thuerey, Jonathan Cohen, Sarah Tariq, Markus Gross


[ACM Transactions on Graphics (SIGGRAPH Asia 2010)](http://www.siggraph.org/asia2010/)


[Sparse Matrix-Vector Multiplication on Multicore and Accelerators](/publication/2010-12_sparse-matrix-vector-multiplication-multicore-and-accelerators)

Sam Williams, Nathan Bell, Jee Whan Choi, [Michael Garland](/person/michael-garland), Leonid Oliker, Richard Vuduc


[Scientific Computing on Multicore and Accelerators](http://www.crcpress.com/product/isbn/9781439825365)


[Interactive Fluid-Particle Simulation using Translating Eulerian Grids](/publication/2010-02_interactive-fluid-particle-simulation-using-translating-eulerian-grids)

Jonathan Cohen, Sarah Tariq, Simon Green


[Interactive 3D Graphics and Games (I3D) 2010](http://graphics.cs.williams.edu/i3d10/)


### 2009 

[Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors](/index.php/publication/2009-11_implementing-sparse-matrix-vector-multiplication-throughput-oriented-processors)

Nathan Bell, [Michael Garland](/index.php/person/michael-garland)


[Proc. Supercomputing '09](http://sc09.supercomputing.org/)


[A Fast Double Precision CFD Code Using CUDA](/publication/2009-05_fast-double-precision-cfd-code-using-cuda)

Jonathan Cohen, M. Jeroen Molemaker


[Proceedings of Parallel Computational Fluid Dynamics 2009](http://www.parcfd.org/2009/)


### 2008 

[Low Viscosity Flow Simulations for Animation](/publication/2008-07_low-viscosity-flow-simulations-animation)

M. Jeroen Molemaker, Jonathan Cohen, Sanjit Patel, Jun-Yong Noh


[Symposium on Computer Animation (SCA) 2008](http://gv2.cs.tcd.ie/sca08/)


### 2005 

[A Survey of General-Purpose Computation on Graphics Hardware](/publication/2005-08_survey-general-purpose-computation-graphics-hardware)

John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron Lefohn, Timothy J. Purcell


[Eurographics 2005, State of the Art Reports](http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=844)


 ### Researchers

 
[Aamer Jaleel](/person/aamer-jaleel)


[Abdul Aldossary](/person/abdul-aldossary)


[Alán Aspuru-Guzik](/person/alan-aspuru-guzik)


[Benjamin Klenk](/person/ben-klenk)


[Boris Bonev](/person/boris-bonev)


[Brent Keeth](/person/brent-keeth)


[Charles Loop](/person/charles-loop)


[Chris Wyman](/person/chris-wyman)


[Cris Cecka](/person/cris-cecka)


[David Nellans](/person/david-nellans)


[Dennis Abts](/person/dennis-abts)


[Donghyuk Lee](/person/donghyuk-lee)


[Guillermo Marcus](/person/guillermo-marcus)


[Hans Eberle](/person/hans-eberle)


[Haoyu Yang](/person/haoyu-yang)


[Harini Muthukrishnan](/person/harini-muthukrishnan)


[Isaac Gelado](/person/isaac-gelado)


[Iuri Frosio](/person/iuri-frosio)


[Jaideep Pathak](/person/jaideep-pathak)


[Jerome Gonthier](/person/jerome-gonthier)


[Matthias Blumrich](/person/matthias-blumrich)


[Michael Garland](/person/michael-garland)


[Michael Bauer](/person/mike-bauer)


[Mike O'Connor](/person/mike-o-connor)


[Nicolai Oswald](/person/nicolai-oswald)


[Nikolaus Binder](/person/nikolaus-binder)


[Niladrish Chatterjee](/person/niladrish-chatterjee)


[Samuli Laine](/index.php/person/samuli-laine)


[Saurav Muralidharan](/index.php/person/saurav-muralidharan)


[Sebastian Cammerer](/index.php/person/sebastian-cammerer)


[Siva Hari](/person/siva-hari)


[Steve Keckler](/person/stephen-keckler)


[Steven Dalton](/person/steven-dalton)


[Taylor Patti](/person/taylor-patti)


[Ted Jiang](/person/ted-jiang)


[Vinu Joseph](/index.php/person/vinu-joseph)


[Wen-mei Hwu](/person/wen-mei-hwu)


[William Dally](/person/william-dally)


[Yaosheng Fu](/person/yaosheng-fu)


[Zachary Susskind](/person/zachary-susskind)