## Computer Architecture

### Associated Publications

 
### 2026 

[Fast AI-Based Pre-Decoders for Surface Codes](/index.php/publication/2026-04_fast-ai-based-pre-decoders-surface-codes)

Christopher Chamberland, Jan Olle, Muyuan Li, Scott Thornton, Igor Baratta


[Hunting CUDA Bugs at Scale with cuFuzz](/publication/2026-03_hunting-cuda-bugs-scale-cufuzz)

[Mohamed Tarek Ibn Ziad](/person/mohamed-tarek-ibn-ziad), [Christos Kozyrakis](/person/christos-kozyrakis)


[International Conference on Object-Oriented Programming Systems, Languages, and…](https://doi.org/10.1145/3798231)


[Alpha-Vision: A Real-Time Always-on Vision Processor with 787µs Face Detection Latency in <5mW](/index.php/publication/2026-02_alpha-vision-real-time-always-vision-processor-787ms-face-detection-latency)

[Ben Keller](/index.php/person/ben-keller), [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Steve Dai](/index.php/person/steve-dai), [Jason Clemons](/index.php/person/jason-clemons), [Matt Fojtik](/index.php/person/matt-fojtik), [Muya Chang](/index.php/person/muya-chang), Thierry Tambe, [Nathaniel Pinckney](/index.php/person/nathaniel-pinckney), [Stephen Tell](/index.php/person/stephen-tell), [Qijing Jenny Huang](/index.php/person/qijing-jenny-huang), [Shalini De Mello](/index.php/person/shalini-de-mello), [Brucek Khailany](/index.php/person/brucek-khailany)


[ISSCC 2026](https://www.isscc.org/)


### 2025 

[GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting](/publication/2025-06_gaurast-enhancing-gpu-triangle-rasterizers-accelerate-3d-gaussian-splatting)

Sixu Li, [Ben Keller](/person/ben-keller), Yingyan Celine Lin, [Brucek Khailany](/person/brucek-khailany)


[Design Automation Conference (DAC)](https://arxiv.org/abs/2503.16681)


### 2023 

[Unity ECC: Unified Memory Protection Against Bit and Chip Errors](/index.php/publication/2023-11_unity-ecc-unified-memory-protection-against-bit-and-chip-errors)

Dongwhee Kim, Jaeyoon Lee, Wonyeong Jung, [Michael B. Sullivan](/index.php/person/mike-sullivan), Jungrae Kim


[International Conference for High Performance Computing, Networking, Storage an…](https://dl.acm.org/doi/abs/10.1145/3581784.3607081)


[VaPr: Variable-Precision Tensors to Accelerate Robot Motion Planning](/publication/2023-10_vapr-variable-precision-tensors-accelerate-robot-motion-planning)

Yu-Shun Hsiao, [Siva Hari](/person/siva-hari), [Balakumar Sundaralingam](/person/balakumar-sundaralingam), Jason Yik, Thierry Tambe, [Charbel Sakr](/person/charbel-sakr), [Steve Keckler](/person/stephen-keckler), Vijay Janapa Reddi


[IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)](https://ieee-iros.org/)


[Efficient Transformer Inference with Statically Structured Sparse Attention](/publication/2023-07_efficient-transformer-inference-statically-structured-sparse-attention)

[Steve Dai](/person/steve-dai), Hasan Genc, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany)


[2023 60th ACM/IEEE Design Automation Conference (DAC)](https://ieeexplore.ieee.org/xpl/conhome/10247654/proceeding)


[Implicit Memory Tagging: No-Overhead Memory Safety Using Alias-Free Tagged ECC](/publication/2023-06_implicit-memory-tagging-no-overhead-memory-safety-using-alias-free-tagged-ecc)

[Michael B. Sullivan](/person/mike-sullivan), [Mohamed Tarek Ibn Ziad](/person/mohamed-tarek-ibn-ziad), [Aamer Jaleel](/person/aamer-jaleel), [Stephen W. Keckler](/person/stephen-keckler)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/abs/10.1145/3579371.3589102)


[CuRobo: Parallelized Collision-Free Robot Motion Generation](/publication/2023-05_curobo-parallelized-collision-free-robot-motion-generation)

[Balakumar Sundaralingam](/person/balakumar-sundaralingam), [Siva Hari](/person/siva-hari), Adam Fishman, [Caelan Garrett](/person/caelan-garrett), Karl Van Wyk, [Valts Blukis](/person/valts-blukis), Alexander Millane, Helen Oleynikova, Ankur Handa, [Fabio Ramos](/person/fabio-ramos), Nathan Ratliff, Dieter Fox


[IEEE International Conference on Robotics and Automation (ICRA)](https://www.icra2023.org/)


[Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows](/publication/2023-02_parsimony-enabling-simdvector-programming-standard-compiler-flows)

Vijay Kandiah, [Daniel Lustig](/person/daniel-lustig), Oreste Villa, [David Nellans](/person/david-nellans), Nikos Hardavellas


[International Symposium on Code Generation and Optimization](https://dl.acm.org/doi/10.1145/3579990.3580019)


[A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm](/publication/2023-01_956-topsw-deep-learning-inference-accelerator-vector-scaled-4-bit-quantization)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), [Charbel Sakr](/person/charbel-sakr), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany)


[Journal of Solid-State Circuits](https://ieeexplore.ieee.org/document/10019275)


### 2022 

[HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression](/index.php/publication/2022-12_heat-hardware-efficient-automatic-tensor-decomposition-transformer-compression)

Jiaqi Gu, [Ben Keller](/index.php/person/ben-keller), [Jean Kossaifi](/index.php/person/jean-kossaifi), Anima Anandkumar, [Brucek Khailany](/index.php/person/brucek-khailany), David Z. Pan


[Workshop on ML for Systems at NeurIPS](http://mlforsystems.org)


Spotlight Paper


[LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update](/publication/2022-12_lns-madam-low-precision-training-logarithmic-number-system-using-multiplicative)

Jiawei Zhao, [Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), Mustafa Ali, [Ming-Yu Liu](/person/ming-yu-liu), [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally), Anima Anandkumar


[IEEE Transactions on Computers (Volume: 71, Issue: 12, 01 December 2022)](https://www.computer.org/csdl/journal/tc)


[Towards Precision-Aware Fault Tolerance Approaches for Mixed-Precision Applications](/publication/2022-11_towards-precision-aware-fault-tolerance-approaches-mixed-precision-applications)

Bo Fang, [Siva Hari](/person/siva-hari), Timothy Tsai, Xinyi Li, Ganesh Gopalakrishnan, Ignacio Laguna, Kevin Barker, Ang Li


[Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS)](https://ieeexplore.ieee.org/document/10024043)


[The Implications of Page Size Management on Graph Analytics](/index.php/publication/2022-11_implications-page-size-management-graph-analytics)

Aninda Manocha, [Zi Yan](/index.php/person/zi-yan), Esin Tureci, Juan Luis Aragón, [David Nellans](/index.php/person/david-nellans), Margaret Martonosi


[International Symposium on Workload Characterization (IISWC)](https://ieeexplore.ieee.org/document/9975438)


[Demystifying Map Space Exploration for NPUs](/publication/2022-11_demystifying-map-space-exploration-npus)

Sheng-Chun Kao, [Angshuman Parashar](/person/angshuman-parashar), [Po-An Tsai](/person/po-an-tsai), Tushar Krishna


[International Symposium on Workload Characterization (IISWC)](https://ieeexplore.ieee.org/document/9975389)


[Sparseloop: An Analytical Approach to Sparse Tensor Accelerator Modeling](/index.php/publication/2022-10_sparseloop-analytical-approach-sparse-tensor-accelerator-modeling)

Yannan Nellie Wu, [Po-An Tsai](/index.php/person/po-an-tsai), [Angshuman Parashar](/index.php/person/angshuman-parashar), Vivienne Sze, [Joel Emer](/index.php/person/joel-emer)


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/9923807)


Distinguished Artifact award


[SEC-BADAEC: An Efficient ECC With No Vacancy for Strong Memory Protection](/publication/2022-08_sec-badaec-efficient-ecc-no-vacancy-strong-memory-protection)

Yuseok Song, Sangjae Park, [Michael B. Sullivan](/person/mike-sullivan), Jungrae Kim


[IEEE Access](https://ieeexplore.ieee.org/abstract/document/9866743)


[Self Adaptive Reconfigurable Arrays (SARA): Learning Flexible GEMM Accelerator Configuration and Mapping-space using ML](/publication/2022-08_self-adaptive-reconfigurable-arrays-sara-learning-flexible-gemm-accelerator)

Ananda Samajdar, Eric Qin, [Michael Pellauer](/person/michael-pellauer), Tushar Krishna


[Design Automation Conference (DAC)](https://dl.acm.org/doi/abs/10.1145/3489517.3530506)


[Zhuyi: Perception Processing Rate Estimation for Safety in Autonomous Vehicles](/publication/2022-07_zhuyi-perception-processing-rate-estimation-safety-autonomous-vehicles)

Yu-Shun Hsiao, [Siva Hari](/person/siva-hari), Michał Filipiuk, Timothy Tsai, [Michael B. Sullivan](/person/mike-sullivan), Vijay Janapa Reddi, Vasu Singh, [Steve Keckler](/person/stephen-keckler)


[Design Automation Conference (DAC)](https://dl.acm.org/doi/10.1145/3489517.3530445)


[Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization](/index.php/publication/2022-06_ruby-improving-hardware-efficiency-tensor-algebra-accelerators-through)

Mark Horeni, Pooria Taheri, [Po-An Tsai](/index.php/person/po-an-tsai), [Angshuman Parashar](/index.php/person/angshuman-parashar), [Joel Emer](/index.php/person/joel-emer), Siddharth Joshi


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/9804679)


[Exploiting Temporal Data Diversity for Detecting Safety-critical Faults in AV Compute Systems](/index.php/publication/2022-06_exploiting-temporal-data-diversity-detecting-safety-critical-faults-av-compute)

Saurabh Jha, Shengkun Cui, Timothy Tsai, [Siva Hari](/index.php/person/siva-hari), [Michael B. Sullivan](/index.php/person/mike-sullivan), Zbigniew T. Kalbarczyk, [Steve Keckler](/index.php/person/stephen-keckler), Ravishankar K. Iyer


[International Conference on Dependable Systems and Networks (DSN)](https://ieeexplore.ieee.org/document/9833576)


[Mixed-Proxy Extensions for the NVIDIA PTX Memory Consistency Model](/index.php/publication/2022-06_mixed-proxy-extensions-nvidia-ptx-memory-consistency-model)

[Daniel Lustig](/index.php/person/daniel-lustig), Simon Cooksey, Olivier Giroux


[International Symposium on Computer Architecture (ISCA), Industry Track](https://dl.acm.org/doi/10.1145/3470496.3533045)


IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[SIMD^2: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM](/index.php/publication/2022-06_simd2-generalized-matrix-instruction-set-accelerating-tensor-computation-beyond)

Yunan Zhang, [Po-An Tsai](/index.php/person/po-an-tsai), Hung-Wei Tseng


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3470496.3527411)


[A Formalism of DNN Accelerator Flexibility](/publication/2022-06_formalism-dnn-accelerator-flexibility)

Sheng-Chun Kao, Hyoukjun Kwon, [Michael Pellauer](/person/michael-pellauer), [Angshuman Parashar](/person/angshuman-parashar), Tushar Krishna


[SIGMETRICS](https://dl.acm.org/doi/abs/10.1145/3530907)


[Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design](/publication/2022-05_learning-continuous-and-reconstructible-latent-space-hardware-accelerator)

[Qijing Jenny Huang](/person/qijing-jenny-huang), Charles Hong, John Wawrzynek, Mahesh Subedar, Yakun Sophia Shao


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/9804604)


[Zhuyi: Perception Processing Rate Estimation for Safety in Autonomous Vehicles](/publication/2022-05_zhuyi-perception-processing-rate-estimation-safety-autonomous-vehicles)

Yu-Shun Hsiao, [Siva Hari](/person/siva-hari), Michał Filipiuk, Timothy Tsai, [Michael B. Sullivan](/person/mike-sullivan), Vijay Janapa Reddi, Vasu Singh, [Steve Keckler](/person/stephen-keckler)


[arXiv](https://arxiv.org/abs/2205.03347)


[Saving PAM4 Bus Energy with SMOREs: Sparse Multi-level Opportunistic Restricted Encodings](/publication/2022-04_saving-pam4-bus-energy-smores-sparse-multi-level-opportunistic-restricted)

[Mike O'Connor](/person/mike-o-connor), [Donghyuk Lee](/person/donghyuk-lee), [Niladrish Chatterjee](/person/niladrish-chatterjee), [Michael B. Sullivan](/person/mike-sullivan), [Steve Keckler](/person/stephen-keckler)


[International Symposium on High-Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9773229)


[Improving Locality of Irregular Updates with Hardware Assisted Propagation Blocking](/index.php/publication/2022-04_improving-locality-irregular-updates-hardware-assisted-propagation-blocking)

[Vignesh Balaji](/index.php/person/vignesh-balaji), Brandon Lucia


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9773262)


Best Paper nominee


[Characterizing and Mitigating Soft Errors in GPU DRAM](/index.php/publication/2022-03_characterizing-and-mitigating-soft-errors-gpu-dram)

[Michael B. Sullivan](/index.php/person/mike-sullivan), Nirmal R. Saxena, [Mike O'Connor](/index.php/person/mike-o-connor), [Donghyuk Lee](/index.php/person/donghyuk-lee), Paul Racunas, Saurabh Hukerikar, Timothy Tsai, [Siva Kumar Sastry Hari](/index.php/person/siva-hari), [Stephen W. Keckler](/index.php/person/stephen-keckler)


[IEEE Micro (Issue: Top Picks of the 2021 Computer Architecture Conferences)](https://ieeexplore.ieee.org/document/9744333)


[DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators](/publication/2022-03_digamma-domain-aware-genetic-algorithm-hw-mapping-co-optimization-dnn)

Sheng-Chun Kao, [Michael Pellauer](/person/michael-pellauer), [Angshuman Parashar](/person/angshuman-parashar), Tushar Krishna


[Design, Automation & Test in Europe (DATE)](https://dl.acm.org/doi/abs/10.5555/3539845.3539906)


[Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators](/publication/2022-03_marvel-data-centric-approach-mapping-deep-learning-operators-spatial)

Prasanth Chatarasi, Hyoukjun Kwon, [Angshuman Parashar](/person/angshuman-parashar), [Michael Pellauer](/person/michael-pellauer), Tushar Krishna, Vivek Sarkar


[Transactions on Architecture and Code Optimization (TACO)](https://dl.acm.org/doi/full/10.1145/3485137)


[DAGguise: Mitigating Memory Timing Side Channels](/publication/2022-02_dagguise-mitigating-memory-timing-side-channels)

Peter W. Deutsch, Yuheng Yang, Thomas Bourgeat, Jules Drean, [Joel Emer](/person/joel-emer), Mengjia Yan


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3503222.3507747)


[GPU Subwarp Interleaving](/index.php/publication/2022-01_gpu-subwarp-interleaving)

Sana Damani, [Mark Stephenson](/index.php/person/mark-stephenson), Ram Rangan, Daniel Johnson, Rishkul Kulkarni, [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on High-Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9773183)


[Accelerators](/publication/2022-01_accelerators)

[Steve Keckler](/person/stephen-keckler), Dejan Milojicic


[IEEE Computer](https://ieeexplore.ieee.org/document/9681667)


### 2021 

[GPU Domain Specialization via Composable On-Package Architecture](/publication/2021-12_gpu-domain-specialization-composable-package-architecture)

[Yaosheng Fu](/person/yaosheng-fu), Evgeny Bolotin, [Niladrish Chatterjee](/person/niladrish-chatterjee), [David Nellans](/person/david-nellans), [Steve Keckler](/person/stephen-keckler)


[ACM Transactions on Architecture and Code Optimization (TACO)](https://dl.acm.org/doi/full/10.1145/3484505)


[Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers](/publication/2021-12_softermax-hardwaresoftware-co-design-efficient-softmax-transformers)

Jacob R. Stevens, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Brucek Khailany](/person/brucek-khailany), Anand Raghunathan


[Design Automation Conference (DAC) 2021](https://www.dac.com/)


[Evolution of the Graphics Processing Unit (GPU)](/publication/2021-12_evolution-graphics-processing-unit-gpu)

[William Dally](/person/william-dally), [Steve Keckler](/person/stephen-keckler), David B. Kirk


[IEEE Micro Special Issue of the 50th Anniversary of the Microprocessor](https://ieeexplore.ieee.org/document/9623445)


[Optimizing Selective Protection for CNN Resilience](/publication/2021-10_optimizing-selective-protection-cnn-resilience)

Abdulrahman Mahmoud, [Siva Hari](/person/siva-hari), Christopher W. Fletcher, Sarita V. Adve, [Charbel Sakr](/person/charbel-sakr), Naresh Shanbhag, [Pavlo Molchanov](/person/pavlo-molchanov), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Steve Keckler](/person/stephen-keckler)


[International Symposium on Software Reliability Engineering (ISSRE)](https://ieeexplore.ieee.org/document/9700317)


[Suraksha: A Framework to Analyze the Safety Implications of Perception Design Choices in AVs](/publication/2021-10_suraksha-framework-analyze-safety-implications-perception-design-choices-avs)

Hengyu Zhao, [Siva Hari](/person/siva-hari), Timothy Tsai, [Michael B. Sullivan](/person/mike-sullivan), [Steve Keckler](/person/stephen-keckler), Jishen Zhao


[International Symposium on Software Reliability Engineering (ISSRE)](https://ieeexplore.ieee.org/abstract/document/9700341)


[GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management](/index.php/publication/2021-10_gps-global-publish-subscribe-model-multi-gpu-memory-management)

[Harini Muthukrishnan](/index.php/person/harini-muthukrishnan), [Daniel Lustig](/index.php/person/daniel-lustig), [David Nellans](/index.php/person/david-nellans), Thomas Wenisch


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3466752.3480088)


Best Paper nominee, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[Characterizing and Mitigating Soft Errors in GPU DRAM](/publication/2021-10_characterizing-and-mitigating-soft-errors-gpu-dram-0)

[Michael B. Sullivan](/person/mike-sullivan), Nirmal Saxena, [Mike O'Connor](/person/mike-o-connor), [Donghyuk Lee](/person/donghyuk-lee), Paul Racunas, Saurabh Hukerikar, Timothy Tsai, [Siva Hari](/person/siva-hari), [Steve Keckler](/person/stephen-keckler)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3466752.3480111)


IEEE Micro Top Picks in Computer Architecture


[Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators](/publication/2021-09_union-unified-hw-sw-co-design-ecosystem-mlir-evaluating-tensor-operations)

Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, [Angshuman Parashar](/person/angshuman-parashar), [Po-An Tsai](/person/po-an-tsai), Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna


[Parallel Architectures and Compilation Techniques (PACT)](https://ieeexplore.ieee.org/document/9563040)


[Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture](/index.php/publication/2021-08_large-graph-convolutional-network-training-gpu-oriented-data-communication)

Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, [Wen-mei Hwu](/index.php/person/wen-mei-hwu)


[Proceedings of the VLDB Endowment (VLDB)](https://dl.acm.org/doi/10.14778/3476249.3476264)


[EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal in GPUs](/index.php/publication/2021-08_emogi-efficient-memory-access-out-memory-graph-traversal-gpus)

Seung Won Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-mei Hwu


[Proceedings of the VLDB Endowment (VLDB)](https://dl.acm.org/doi/10.14778/3425879.3425883)


[NVBitFI: Dynamic Fault Injection for GPUs](/publication/2021-06_nvbitfi-dynamic-fault-injection-gpus)

Timothy Tsai, [Siva Hari](/person/siva-hari), [Michael B. Sullivan](/person/mike-sullivan), Oreste Villa, [Steve Keckler](/person/stephen-keckler)


[International Conference on Dependable Systems and Networks (DSN)](https://ieeexplore.ieee.org/abstract/document/9505068)


[SpZip: Architectural Support for Effective Data Compression in Irregular Applications](/publication/2021-06_spzip-architectural-support-effective-data-compression-irregular-applications)

Yifan Yang, [Joel Emer](/person/joel-emer), Daniel Sanchez


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/9499902)


[Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers](/index.php/publication/2021-06_efficient-multi-gpu-shared-memory-automatic-optimization-fine-grained-transfers)

[Harini Muthukrishnan](/index.php/person/harini-muthukrishnan), [David Nellans](/index.php/person/david-nellans), [Daniel Lustig](/index.php/person/daniel-lustig), Jeffrey Fessler, Thomas Wenisch


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/9499752)


[Simba: scaling deep-learning inference with chiplet-based architecture](/publication/2021-05_simba-scaling-deep-learning-inference-chiplet-based-architecture)

Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)


[Communications of the ACM](https://dl.acm.org/doi/10.1145/3460227)


ACM Research Highlight


[Demystifying GPU Reliability: Comparing and Combining Beam Experiments, Fault Simulation, and Profiling](/index.php/publication/2021-05_demystifying-gpu-reliability-comparing-and-combining-beam-experiments-fault)

Fernando Fernandes dos Santos, [Siva Hari](/index.php/person/siva-hari), Pedro Martins Basso, Luigi Carro, Paolo Rech


[IEEE International Parallel & Distributed Processing Symposium (IPDPS)](https://ieeexplore.ieee.org/document/9460470)


[Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators](/index.php/publication/2021-04_sparseloop-analytical-energy-focused-design-space-exploration-methodology)

Yannan Nellie Wu, [Po-An Tsai](/index.php/person/po-an-tsai), [Angshuman Parashar](/index.php/person/angshuman-parashar), Vivienne Sze, [Joel Emer](/index.php/person/joel-emer)


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/9408213)


[GAMMA: Exploiting Gustavson’s Algorithm to Accelerate Sparse Matrix Multiplication](/index.php/publication/2021-04_gamma-exploiting-gustavson-s-algorithm-accelerate-sparse-matrix-multiplication)

Guowei Zhang, Nithya Attaluri, [Joel Emer](/index.php/person/joel-emer), Daniel Sanchez


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3445814.3446702)


[Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search](/index.php/publication/2021-04_mind-mappings-enabling-efficient-algorithm-accelerator-mapping-space-search)

Kartik Hegde, [Po-An Tsai](/index.php/person/po-an-tsai), Sitao Huang, Vikas Chandra, [Angshuman Parashar](/index.php/person/angshuman-parashar), Christopher W. Fletcher


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3445814.3446762)


[GPU Domain Specialization via Composable On-Package Architecture](/index.php/publication/2021-04_gpu-domain-specialization-composable-package-architecture)

[Yaosheng Fu](/index.php/person/yaosheng-fu), Evgeny Bolotin, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), [David Nellans](/index.php/person/david-nellans), [Steve Keckler](/index.php/person/stephen-keckler)


[arXiv](https://arxiv.org/abs/2104.02188)


[VS-QUANT: Per-Vector Scaled Quantization for Accurate Low-Precision Neural Network Inference](/publication/2021-04_vs-quant-vector-scaled-quantization-accurate-low-precision-neural-network)

[Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Mark Haoxing Ren, [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany)


[MLSys 2021](https://mlsys.org/)


[Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures](/index.php/publication/2021-03_learning-sparse-matrix-row-permutations-efficient-spmm-gpu-architectures)

Atefeh Mehrabi, [Donghyuk Lee](/index.php/person/donghyuk-lee), [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), Daniel J. Sorin, Benjamin C. Lee, [Mike O'Connor](/index.php/person/mike-o-connor)


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/9408181)


[Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture](/publication/2021-03_large-graph-convolutional-network-training-gpu-oriented-data-communication)

Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, [Wen-mei Hwu](/person/wen-mei-hwu)


[arXiv](https://arxiv.org/abs/2103.03330)


[Making Convolutions Resilient via Algorithm-Based Error Detection Techniques](/publication/2021-03_making-convolutions-resilient-algorithm-based-error-detection-techniques)

[Siva Hari](/person/siva-hari), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Steve Keckler](/person/stephen-keckler)


[IEEE Transactions on Dependable and Secure Computing (TDSC)](https://ieeexplore.ieee.org/document/9366780)


[PGZ: Automatic Zero-Value Code Specialization](/publication/2021-03_pgz-automatic-zero-value-code-specialization)

[Mark Stephenson](/person/mark-stephenson), Ram Rangan


[International Conference on Compiler Construction (CC)](https://dl.acm.org/doi/10.1145/3446804.3446845)


[Reduced Precision DWC: An Efficient Hardening Strategy for Mixed-Precision Architectures](/index.php/publication/2021-03_reduced-precision-dwc-efficient-hardening-strategy-mixed-precision)

Fernando F. dos Santos, Marcelo Brandalero, [Michael B. Sullivan](/index.php/person/mike-sullivan), Pedro M. Basso, Michael Hubner, Luigi Carro, Paolo Rech


[IEEE Transactions on Computers](https://ieeexplore.ieee.org/abstract/document/9354571)


[Heterogeneous Dataflow Accelerators for Multi-DNN Workloads](/publication/2021-02_heterogeneous-dataflow-accelerators-multi-dnn-workloads)

Hyoukjun Kwon, Liangzhen Lai, [Michael Pellauer](/person/michael-pellauer), Tushar Krishna, Yu-Hsin Chen, Vikas Chandra


[International Symposium on High-Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9407116)


[P-OPT: Practical Optimal Cache Replacement for Graph Analytics](/publication/2021-02_p-opt-practical-optimal-cache-replacement-graph-analytics)

Vignesh Balaji, [Neal Crago](/person/neal-crago), [Aamer Jaleel](/person/aamer-jaleel), Brandon Lucia


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9407090)


Best Paper nominee


[Need for Speed: Experiences Building a Trustworthy System-Level GPU Simulator](/publication/2021-02_need-speed-experiences-building-trustworthy-system-level-gpu-simulator)

Oreste Villa, [Daniel Lustig](/person/daniel-lustig), [Zi Yan](/person/zi-yan), Evgeny Bolotin, [Yaosheng Fu](/person/yaosheng-fu), [Niladrish Chatterjee](/person/niladrish-chatterjee), [Ted Jiang](/person/ted-jiang), [David Nellans](/person/david-nellans)


[International Symposium on High Performance Computer Architecture (HPCA)](https://doi.org/10.1109/HPCA51647.2021.00077)


[SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference](/publication/2021-02_snap-efficient-sparse-neural-acceleration-processor-unstructured-sparse-deep)

Jie-Fang Zhang, Ching-En Lee, Chester Liu, Yakun Sophia Shao, [Steve Keckler](/person/stephen-keckler), Zhengya Zhang


[IEEE Journal of Solid-State Circuits (JSSC)](https://ieeexplore.ieee.org/document/9310233)


[Hardware Abstractions for Targeting EDDO Architectures with the Polyhedral Model](/index.php/publication/2021-01_hardware-abstractions-targeting-eddo-architectures-polyhedral-model)

[Angshuman Parashar](/index.php/person/angshuman-parashar), Prasanth Chatarasi, [Po-An Tsai](/index.php/person/po-an-tsai)


[International Workshop on Polyhedral Compilation Techniques (IMPACT)](https://acohen.gitlabpages.inria.fr/impact/impact2021/)


[Flexion: A Quantitative Metric for Flexibility in DNN Accelerators](/publication/2021-01_flexion-quantitative-metric-flexibility-dnn-accelerators)

Hyoukjun Kwon, [Michael Pellauer](/person/michael-pellauer), [Angshuman Parashar](/person/angshuman-parashar), Tushar Krishna


[IEEE Computer Architecture Letters (CAL)](https://ieeexplore.ieee.org/document/9293373)


### 2020 

[The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems](/index.php/publication/2020-12_architectural-implications-distributed-reinforcement-learning-cpu-gpu-systems)

Ahmet Inci, Evgeny Bolotin, [Yaosheng Fu](/index.php/person/yaosheng-fu), [Gal Dalal](/index.php/person/gal-dalal), [Shie Mannor](/index.php/person/shie-mannor), [David Nellans](/index.php/person/david-nellans), Diana Marculescu


[Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC2)](https://www.emc2-ai.org/virtual-20)


[GPU-Trident: Efficient Modeling of Error Propagation in GPU Programs](/publication/2020-11_gpu-trident-efficient-modeling-error-propagation-gpu-programs)

Abdul Rehman Anwer, Guanpeng Li, Karthik Pattabiraman, [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Siva Hari](/person/siva-hari)


[The International Conference for High Performance Computing, Networking, Storag…](https://ieeexplore.ieee.org/abstract/document/9355257)


[Locality-Centric Data and Threadblock Management for Massive GPUs](/publication/2020-10_locality-centric-data-and-threadblock-management-massive-gpus)

Mahmoud Khairy, Vadim Nikiforov, [David Nellans](/person/david-nellans), Timothy G. Rogers


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/9251964)


[CaSA: End-to-end Quantitative Security Analysis of Randomly Mapped Caches](/index.php/publication/2020-10_casa-end-end-quantitative-security-analysis-randomly-mapped-caches)

Thomas Bourgeat, Jules Drean, Yuheng Yang, Lillian Tsai, [Joel Emer](/index.php/person/joel-emer), Mengjia Yan


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/9251961)


[HarDNN: Fine-Grained Vulnerability Evaluation and Protection for Convolutional Neural Networks](/publication/2020-09_hardnn-fine-grained-vulnerability-evaluation-and-protection-convolutional)

Abdulrahman Mahmoud, [Siva Hari](/person/siva-hari), Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, [Pavlo Molchanov](/person/pavlo-molchanov), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Steve Keckler](/person/stephen-keckler)


[SRC TECHCON](https://src.secure-platform.com/a/page/techcon)


[How to Evaluate Deep Neural Network Processors: TOPS/W (Alone) Considered Harmful](/index.php/publication/2020-08_how-evaluate-deep-neural-network-processors-topsw-alone-considered-harmful)

Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, [Joel Emer](/index.php/person/joel-emer)


[IEEE Solid-State Circuits Magazine](https://ieeexplore.ieee.org/document/9177369)


[Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks](/publication/2020-07_planaria-dynamic-architecture-fission-spatial-multi-tenant-acceleration-deep)

Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra Reddy Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, Hadi Esmaeilzadeh


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/9251939)


[PyTorchFI: A Runtime Perturbation Tool for DNNs](/publication/2020-06_pytorchfi-runtime-perturbation-tool-dnns)

Abdulrahman Mahmoud, Neeraj Aggarwal, Alex Nobbe, Jose Rodrigo Sanchez Vicarte, Sarita V. Adve, Christopher W. Fletcher, [Iuri Frosio](/person/iuri-frosio), [Siva Hari](/person/siva-hari)


[Workshop on Dependable and Secure Machine Learning](https://ieeexplore.ieee.org/document/9151812)


[EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs](/index.php/publication/2020-06_emogi-efficient-memory-access-out-memory-graph-traversal-gpus)

Seung Won Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-mei Hwu


[arXiv](https://arxiv.org/abs/2006.06890)


[Making Convolutions Resilient via Algorithm-Based Error Detection Techniques](/publication/2020-06_making-convolutions-resilient-algorithm-based-error-detection-techniques)

[Siva Hari](/person/siva-hari), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Steve Keckler](/person/stephen-keckler)


[arXiv](https://arxiv.org/abs/2006.04984)


[There’s Plenty of Room at the Top: What Will Drive Computer Performance after Moore’s Law?](/index.php/publication/2020-06_there-s-plenty-room-top-what-will-drive-computer-performance-after-moore-s-law)

Charles E. Leiserson, Neil C. Thompson, [Joel Emer](/index.php/person/joel-emer), Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, Tao B. Schardl


[Science](https://www.science.org/doi/10.1126/science.aam9744)


[Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs](/publication/2020-06_buddy-compression-enabling-larger-memory-deep-learning-and-hpc-workloads-gpus)

Esha Choukse, [Michael B. Sullivan](/person/mike-sullivan), [Mike O'Connor](/person/mike-o-connor), Mattan Erez, Jeff Pool, [David Nellans](/person/david-nellans), [Steve Keckler](/person/stephen-keckler)


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/9138915)


[An In-Network Architecture for Accelerating Shared-Memory Multiprocessor Collectives](/publication/2020-05_network-architecture-accelerating-shared-memory-multiprocessor-collectives)

[Benjamin Klenk](/person/ben-klenk), [Ted Jiang](/person/ted-jiang), Greg Thorson, [Larry Dennison](/person/larry-dennison)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1109/ISCA45697.2020.00085)


[Estimating Silent Data Corruption Rates Using a Two-Level Model](/publication/2020-04_estimating-silent-data-corruption-rates-using-two-level-model)

[Siva Hari](/person/siva-hari), Paolo Rech, Timothy Tsai, [Mark Stephenson](/person/mark-stephenson), Arslan Zulfiqar, [Michael B. Sullivan](/person/mike-sullivan), Philip Shirvani, Paul Racunas, [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler)


[arXiv](https://arxiv.org/abs/2005.01445)


[MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings](/publication/2020-04_maestro-data-centric-approach-understand-reuse-performance-and-hardware-cost)

Hyoukjun Kwon, Prasanth Chatarasi, Vivek Sarkar, Tushar Krishna, [Michael Pellauer](/person/michael-pellauer), [Angshuman Parashar](/person/angshuman-parashar)


[IEEE Micro (Issue: Top Picks of the 2019 Computer Architecture Conferences)](https://ieeexplore.ieee.org/document/9076333)


[BYOC: A "Bring Your Own Core" Framework for Heterogeneous-ISA Research](/publication/2020-03_byoc-bring-your-own-core-framework-heterogeneous-isa-research)

Jonathan Balkind, Katie Lim, Michael Schaffner, Fei Gao, Grigory Chirkov, Ang Li, Alexey Lavrov, Tri M. Nguyen, [Yaosheng Fu](/person/yaosheng-fu), Florian Zaruba, Kunal Gulati, Luca Benini, David Wentzlaff


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3373376.3378479)


[Safecracker: Leaking Secrets through Compressed Caches](/publication/2020-03_safecracker-leaking-secrets-through-compressed-caches)

[Po-An Tsai](/person/po-an-tsai), Andres Sanchez, Christopher W. Fletcher, Daniel Sanchez


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3373376.3378453)


IEEE Micro Top Picks in Computer Architecture


[RealityCheck: Bringing Modularity, Hierarchy, and Abstraction to Automated Microarchitectural Memory Consistency Verification](/index.php/publication/2020-03_realitycheck-bringing-modularity-hierarchy-and-abstraction-automated)

Yatin A. Manerkar, [Daniel Lustig](/index.php/person/daniel-lustig), Margaret Martonosi


[arXiv](https://arxiv.org/abs/2003.04892)


[Feature Map Vulnerability Evaluation in CNNs](/publication/2020-03_feature-map-vulnerability-evaluation-cnns)

Abdulrahman Mahmoud, [Siva Hari](/person/siva-hari), Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, [Pavlo Molchanov](/person/pavlo-molchanov), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Steve Keckler](/person/stephen-keckler)


[Workshop on Secure and Resilient Autonomy](http://sara-workshop.org/)


[HarDNN: Feature Map Vulnerability Evaluation in CNNs](/publication/2020-02_hardnn-feature-map-vulnerability-evaluation-cnns)

Abdulrahman Mahmoud, [Siva Hari](/person/siva-hari), Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, [Pavlo Molchanov](/person/pavlo-molchanov), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Steve Keckler](/person/stephen-keckler)


[arXiv](https://arxiv.org/abs/2002.09786)


[Speculative Reconvergence for Improved SIMT Efficiency](/publication/2020-02_speculative-reconvergence-improved-simt-efficiency)

Sana Damani, Daniel Johnson, [Mark Stephenson](/person/mark-stephenson), Eddie Yan, Olivier Giroux, Michael McKeown, [Steve Keckler](/person/stephen-keckler)


[International Symposium on Code Generation and Optimization](https://dl.acm.org/doi/10.1145/3368826.3377911)


[HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems](/index.php/publication/2020-02_hmg-extending-cache-coherence-protocols-across-modern-hierarchical-multi-gpu)

Xiaowei Ren, [Daniel Lustig](/index.php/person/daniel-lustig), Evgeny Bolotin, [Aamer Jaleel](/index.php/person/aamer-jaleel), Oreste Villa, [David Nellans](/index.php/person/david-nellans)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/9065597)


### 2019 

[Near-Memory Data Transformation for Efficient Sparse Matrix Multi-Vector Multiplication](/index.php/publication/2019-11_near-memory-data-transformation-efficient-sparse-matrix-multi-vector)

Daichi Fujiki, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), [Donghyuk Lee](/index.php/person/donghyuk-lee), [Mike O'Connor](/index.php/person/mike-o-connor)


[International Conference for High-Performance Computing, Networking, Storage, a…](https://dl.acm.org/doi/10.1145/3295500.3356154)


[MAGNet: A Modular Accelerator Generator for Neural Networks](/publication/2019-11_magnet-modular-accelerator-generator-neural-networks)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, Miaorong Wang, [Jason Clemons](/person/jason-clemons), [Steve Dai](/person/steve-dai), [Matt Fojtik](/person/matt-fojtik), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Yanqing Zhang](/person/yanqing-zhang), [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)


[International Conference On Computer Aided Design (ICCAD)](https://ieeexplore.ieee.org/document/8942127)


[Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs](/publication/2019-11_accelergy-architecture-level-energy-estimation-methodology-accelerator-designs)

Yannan Nellie Wu, [Joel Emer](/person/joel-emer), Vivienne Sze


[International Conference on Computer Aided Design (ICCAD)](https://ieeexplore.ieee.org/document/8942149)


[NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs](/index.php/publication/2019-10_nvbit-dynamic-binary-instrumentation-framework-nvidia-gpus)

Oreste Villa, [Mark Stephenson](/index.php/person/mark-stephenson), [David Nellans](/index.php/person/david-nellans), [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on Microarchitecture (MICRO)](https://doi.org/10.1145/3352460.3358307)


[Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture](/publication/2019-10_simba-scaling-deep-learning-inference-multi-chip-module-based-architecture)

Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3352460.3358302)


Best Paper award, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[ExTensor: An Accelerator for Sparse Tensor Algebra](/publication/2019-10_extensor-accelerator-sparse-tensor-algebra)

Kartik Hegde, Hadi Asghari-Moghaddam, [Michael Pellauer](/person/michael-pellauer), [Neal Crago](/person/neal-crago), [Aamer Jaleel](/person/aamer-jaleel), Edgar Solomonik, [Joel Emer](/person/joel-emer), Christopher W. Fletcher


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3352460.3358275)


IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach](/index.php/publication/2019-10_understanding-reuse-performance-and-hardware-cost-dnn-dataflows-data-centric)

Hyoukjun Kwon, Prasanth Chatarasi, [Michael Pellauer](/index.php/person/michael-pellauer), [Angshuman Parashar](/index.php/person/angshuman-parashar), Vivek Sarkar, Tushar Krishna


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3352460.3358252)


IEEE Micro Top Picks in Computer Architecture


[A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology](/publication/2019-08_011-pjop-032-128-tops-scalable-multi-chip-module-based-deep-neural-network)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, [Brian Zimmer](/person/brian-zimmer), [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)


[Hot Chips: A Symposium on High Performance Chips](http://www.hotchips.org/)


[Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training](/publication/2019-08_optimizing-multi-gpu-parallelization-strategies-deep-learning-training)

Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, [Yaosheng Fu](/person/yaosheng-fu), Victor Zhang, Szymon Migacz, [David Nellans](/person/david-nellans), Puneet Gupta


[IEEE MICRO: Special Edition on Machine Learning Acceleration](https://ieeexplore.ieee.org/document/8805338)


[Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training](/publication/2019-07_optimizing-multi-gpu-parallelization-strategies-deep-learning-training)

Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, [Yaosheng Fu](/person/yaosheng-fu), Victor Zhang, Szymon Migacz, [David Nellans](/person/david-nellans), Puneet Gupta


[arXiv](https://arxiv.org/abs/1907.13257)


[GPU Snapshot: Checkpoint Offloading for GPU-Dense Systems](/index.php/publication/2019-06_gpu-snapshot-checkpoint-offloading-gpu-dense-systems)

Kyushick Lee, [Michael B. Sullivan](/index.php/person/mike-sullivan), [Siva Hari](/index.php/person/siva-hari), Timothy Tsai, [Steve Keckler](/index.php/person/stephen-keckler), Mattan Erez


[International Conference on Supercomputing](https://dl.acm.org/doi/10.1145/3330345.3330361)


[On the Trend of Resilience for GPU-Dense Systems](/publication/2019-06_trend-resilience-gpu-dense-systems)

Kyushick Lee, [Michael B. Sullivan](/person/mike-sullivan), [Siva Hari](/person/siva-hari), Timothy Tsai, [Steve Keckler](/person/stephen-keckler), Mattan Erez


[International Conference on Dependable Systems and Networks, Supplemental (DSN-…](https://ieeexplore.ieee.org/document/8805794)


Best of SELSE (Workshop on Silicon Errors in Logic - System Effects)


[Translation Ranger: Operating System Support for Contiguity-Aware TLBs](/index.php/publication/2019-06_translation-ranger-operating-system-support-contiguity-aware-tlbs)

[Zi Yan](/index.php/person/zi-yan), [Daniel Lustig](/index.php/person/daniel-lustig), [David Nellans](/index.php/person/david-nellans), Abhishek Bhattacharjee


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3307650.3322223)


[Adaptive Memory-Side Last-Level GPU Caching](/index.php/publication/2019-06_adaptive-memory-side-last-level-gpu-caching)

Xia Zhao, Almutaz Adileh, Zhibin Yu, Zhiying Wang, [Aamer Jaleel](/index.php/person/aamer-jaleel), Lieven Eeckhout


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3307650.3322235)


[SNAP: A 1.67 – 21.55 TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS](/index.php/publication/2019-06_snap-167-2155-topsw-sparse-neural-acceleration-processor-unstructured-sparse)

Jie-Fang Zhang, Ching-En Lee, Chester Liu, Yakun Sophia Shao, [Steve Keckler](/index.php/person/stephen-keckler), Zhengya Zhang


[Symposia on VLSI Technology and Circuits (VLSI)](https://ieeexplore.ieee.org/document/8778193)


[Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs](/publication/2019-04_buddy-compression-enabling-larger-memory-deep-learning-and-hpc-workloads-gpus)

Esha Choukse, [Michael B. Sullivan](/person/mike-sullivan), [Mike O'Connor](/person/mike-o-connor), Mattan Erez, Jeff Pool, [David Nellans](/person/david-nellans), Stephen W. Keckler


[arXiv](https://arxiv.org/abs/1903.02596)


[Security Verification through Automatic Hardware-Aware Exploit Synthesis: The CheckMate Approach](/publication/2019-04_security-verification-through-automatic-hardware-aware-exploit-synthesis)

Caroline Trippel, [Daniel Lustig](/person/daniel-lustig), Margaret Martonosi


[IEEE Micro (Issue: Top Picks of the 2018 Computer Architecture Conferences)](https://ieeexplore.ieee.org/document/8686197)


[Nimble Page Management for Tiered Memory Systems](/index.php/publication/2019-04_nimble-page-management-tiered-memory-systems)

[Zi Yan](/index.php/person/zi-yan), [Daniel Lustig](/index.php/person/daniel-lustig), [David Nellans](/index.php/person/david-nellans), Abhishek Bhattacharjee


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3297858.3304024)


[A Formal Analysis of the NVIDIA PTX Memory Consistency Model](/index.php/publication/2019-04_formal-analysis-nvidia-ptx-memory-consistency-model)

[Daniel Lustig](/index.php/person/daniel-lustig), Sameer Sahasrabuddhe, Olivier Giroux


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3297858.3304043)


IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration](/index.php/publication/2019-04_buffets-efficient-and-composable-storage-idiom-explicit-decoupled-data)

[Michael Pellauer](/index.php/person/michael-pellauer), Yakun Sophia Shao, [Jason Clemons](/index.php/person/jason-clemons), [Neal Crago](/index.php/person/neal-crago), Kartik Hegde, [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Steve Keckler](/index.php/person/stephen-keckler), Christopher W. Fletcher, [Joel Emer](/index.php/person/joel-emer)


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3297858.3304025)


IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis](/publication/2019-04_delta-gpu-performance-model-deep-learning-applications-depth-memory-system)

Sangkug Lym, [Donghyuk Lee](/person/donghyuk-lee), [Mike O'Connor](/person/mike-o-connor), [Niladrish Chatterjee](/person/niladrish-chatterjee), Mattan Erez


[arXiv](https://arxiv.org/abs/1904.01691)


[On the Trend of Resilience for GPU-Dense Systems](/publication/2019-03_trend-resilience-gpu-dense-systems)

Kyushick Lee, [Michael B. Sullivan](/person/mike-sullivan), [Siva Hari](/person/siva-hari), Timothy Tsai, [Steve Keckler](/person/stephen-keckler), Mattan Erez


[IEEE Workshop on Silicon Errors in Logic – System Effects (SELSE)](https://selse.org/2019-archive/)


Award paper


[Towards Analytically Evaluating the Error Resilience of GPU Programs](/publication/2019-03_towards-analytically-evaluating-error-resilience-gpu-programs)

Abdul Rehman Anwer, Guanpeng Li, Karthik Pattabiraman, [Siva Hari](/person/siva-hari), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai


[IEEE Workshop on Silicon Errors in Logic – System Effects (SELSE)](https://selse.org/2019-archive/)


[DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis](/publication/2019-03_delta-gpu-performance-model-deep-learning-applications-depth-memory-system)

Sangkug Lym, [Donghyuk Lee](/person/donghyuk-lee), [Niladrish Chatterjee](/person/niladrish-chatterjee), [Mike O'Connor](/person/mike-o-connor), Mattan Erez


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/8695646)


[Timeloop: A Systematic Approach to DNN Accelerator Evaluation](/publication/2019-03_timeloop-systematic-approach-dnn-accelerator-evaluation)

[Angshuman Parashar](/person/angshuman-parashar), Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler), [Joel Emer](/person/joel-emer)


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/8695666)


[DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems](/index.php/publication/2019-03_ducati-high-performance-address-translation-extending-tlb-reach-gpu-accelerated)

[Aamer Jaleel](/index.php/person/aamer-jaleel), Eiman Ebrahimi, Sam Duncan


[ACM Transactions on Architecture and Code Optimization (TACO)](https://dl.acm.org/doi/abs/10.1145/3309710)


[Understanding the Future of Energy Efficiency in Multi-Module GPUs](/publication/2019-02_understanding-future-energy-efficiency-multi-module-gpus)

Akhil Arunkumar, Evgeny Bolotin, [David Nellans](/person/david-nellans), Carole-Jean Wu


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/8675192)


### 2018 

[Optimizing Software-Directed Instruction Replication for GPU Error Detection](/publication/2018-11_optimizing-software-directed-instruction-replication-gpu-error-detection)

Abdulrahman Mahmoud, [Siva Hari](/person/siva-hari), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, [Steve Keckler](/person/stephen-keckler)


[International Conference for High-Performance Computing, Networking, Storage a…](https://dl.acm.org/doi/10.5555/3291656.3291746)


[Exploiting Idle Resources in a High-Radix Switch for Supplemental Storage](/index.php/publication/2018-11_exploiting-idle-resources-high-radix-switch-supplemental-storage)

[Matthias Blumrich](/index.php/person/matthias-blumrich), [Ted Jiang](/index.php/person/ted-jiang), [Larry Dennison](/index.php/person/larry-dennison)


[Proceedings of the International Conference for High Performance Computing, Net…](https://dl.acm.org/citation.cfm?id=3291662)


[PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifications](/publication/2018-10_pipeproof-automated-memory-consistency-proofs-microarchitectural-specifications)

Yatin A. Manerkar, [Daniel Lustig](/person/daniel-lustig), Margaret Martonosi, Aarti Gupta


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8574586)


IEEE Micro Top Picks in Computer Architecture (Honorable Mention), Best Paper nominee


[SwapCodes: Error Codes for Hardware-Software Cooperative GPU Pipeline Error Detection](/publication/2018-10_swapcodes-error-codes-hardware-software-cooperative-gpu-pipeline-error)

[Michael B. Sullivan](/person/mike-sullivan), [Siva Hari](/person/siva-hari), [Brian Zimmer](/person/brian-zimmer), Timothy Tsai, [Stephen W. Keckler](/person/stephen-keckler)


[The International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8574584)


[Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems](/index.php/publication/2018-10_combining-hwsw-mechanisms-improve-numa-performance-multi-gpu-systems)

Vinson Young, [Aamer Jaleel](/index.php/person/aamer-jaleel), Evgeny Bolotin, Eiman Ebrahimi, [David Nellans](/index.php/person/david-nellans), Oreste Villa


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1109/MICRO.2018.00035)


[Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism](/index.php/publication/2018-10_harmonizing-speculative-and-non-speculative-execution-architectures-ordered)

Mark C. Jeffrey, Victor A. Ying, Suvinay Subramanian, Hyun Ryong Lee, [Joel Emer](/index.php/person/joel-emer), Daniel Sanchez


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8574543)


[DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors](/index.php/publication/2018-10_dawg-defense-against-cache-timing-attacks-speculative-execution-processors)

Vladimir Kiriansky, Ilia Lebedev, Saman Amarasinghe, Srinivas Devadas, [Joel Emer](/index.php/person/joel-emer)


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8574600)


[CheckMate: Automated Synthesis of Hardware Exploits and Security Litmus Tests](/publication/2018-10_checkmate-automated-synthesis-hardware-exploits-and-security-litmus-tests)

Caroline Trippel, [Daniel Lustig](/person/daniel-lustig), Margaret Martonosi


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8574598)


IEEE Micro Top Picks in Computer Architecture


[Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs](/index.php/publication/2018-10_exposing-memory-access-patterns-improve-instruction-and-memory-efficiency-gpus)

[Neal Crago](/index.php/person/neal-crago), [Mark Stephenson](/index.php/person/mark-stephenson), [Steve Keckler](/index.php/person/stephen-keckler)


[ACM Transactions on Architecture and Code Optimization (TACO)](https://doi.org/10.1145/3280851)


[Software-Directed Techniques for Improved GPU Register File Utilization](/publication/2018-09_software-directed-techniques-improved-gpu-register-file-utilization)

Dani Voitsechov, Arslan Zulfiqar, [Mark Stephenson](/person/mark-stephenson), Mark Gebhart, [Steve Keckler](/person/stephen-keckler)


[ACM Transactions on Architecture and Code Optimization (TACO)](https://dl.acm.org/doi/10.1145/3243905)


[What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study](/index.php/publication/2018-07_what-your-dram-power-models-are-not-telling-you-lessons-detailed-experimental)

Saugata Ghose, Abdullah Giray Yağlıkçı, Raghav Gupta, [Donghyuk Lee](/index.php/person/donghyuk-lee), Kais Kudrolli, William X. Liu, Hasan Hassan, Kevin K. Chang, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), Aditya Agrawal, [Mike O'Connor](/index.php/person/mike-o-connor), Onur Mutlu


[arXiv](https://arxiv.org/abs/1807.05102)


[Modeling Soft Error Propagation in Programs](/publication/2018-06_modeling-soft-error-propagation-programs)

Guanpeng Li, Karthik Pattabiraman, [Siva Hari](/person/siva-hari), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai


[International Conference on Dependable Systems and Networks (DSN)](https://ieeexplore.ieee.org/document/8416468)


[A Modular Digital VLSI Flow for High-Productivity SoC Design](/publication/2018-06_modular-digital-vlsi-flow-high-productivity-soc-design)

[Brucek Khailany](/person/brucek-khailany), Evgeni Krimer, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Jason Clemons](/person/jason-clemons), [Joel Emer](/person/joel-emer), [Matt Fojtik](/person/matt-fojtik), Alicia Klinefelter, [Michael Pellauer](/person/michael-pellauer), [Nathaniel Pinckney](/person/nathaniel-pinckney), Sophia Shao, Shreesha Srinath, Christopher Torng, Sam (Likun) Xi, [Yanqing Zhang](/person/yanqing-zhang), [Brian Zimmer](/person/brian-zimmer)


[Design Automation Conference (DAC)](https://dl.acm.org/doi/10.1145/3195970.3199846)


[What Your DRAM Power Models Aren’t Telling You: Lessons from a Detailed Experimental Study](/index.php/publication/2018-06_what-your-dram-power-models-aren-t-telling-you-lessons-detailed-experimental)

Saugata Ghose, Abdullah Giray Yağlıkçı, Raghav Gupta, [Donghyuk Lee](/index.php/person/donghyuk-lee), Kais Kudrolli, William X. Liu, Hasan Hassan, Kevin Chang, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), Aditya Agrawal, [Mike O'Connor](/index.php/person/mike-o-connor), Onur Mutlu


[ACM International Conference on Measurement and Analysis of Computer Systems (S…](https://dl.acm.org/doi/abs/10.1145/3224419)


[A Case for Richer Cross-layer Abstractions: Bridging the Semantic Gap to Enhance Memory Optimization](/index.php/publication/2018-06_case-richer-cross-layer-abstractions-bridging-semantic-gap-enhance-memory)

Nandita Vijaykumar, Abhilasha Jain, Diptesh Majumdar, Kevin Hsieh, Gennady Pekhimenko, Eiman Ebrahimi, Nastaran Hajinazar, Phillip B. Gibbons, Onur Mutlu


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1109/ISCA.2018.00027)


[UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition](/index.php/publication/2018-06_ucnn-exploiting-computational-reuse-deep-neural-networks-weight-repetition)

Kartik Hegde, Jiyong Yu, Rohit Agrawal, Mengjia Yan, [Michael Pellauer](/index.php/person/michael-pellauer), Christopher W. Fletcher


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1109/ISCA.2018.00062)


[The Locality Descriptor: A Holistic Cross-Layer Abstraction to Express Data Locality in GPUs](/index.php/publication/2018-06_locality-descriptor-holistic-cross-layer-abstraction-express-data-locality-gpus)

Nandita Vijaykumar, Eiman Ebrahimi, Kevin Hsieh, Phillip B. Gibbons, Onur Mutlu


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1109/ISCA.2018.00074)


[ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction](/index.php/publication/2018-06_accord-enabling-associativity-gigascale-dram-caches-coordinating-way-install)

Vinson Young, Chiachen Chou, [Aamer Jaleel](/index.php/person/aamer-jaleel), Moinuddin Qureshi


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/8416838)


[Full-Stack Memory Model Verification with TriCheck](/index.php/publication/2018-05_full-stack-memory-model-verification-tricheck)

Caroline Trippel, Yatin A. Manerkar, [Daniel Lustig](/index.php/person/daniel-lustig), [Michael Pellauer](/index.php/person/michael-pellauer), Margaret Martonosi


[IEEE Micro (Issue: Top Picks of the 2017 Computer Architecture Conferences)](https://ieeexplore.ieee.org/document/8357999)


[Voltron: Understanding and Exploiting the Voltage-Latency-Reliability Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency](/publication/2018-05_voltron-understanding-and-exploiting-voltage-latency-reliability-trade-offs)

Kevin K. Chang, Abdullah Giray Yağlıkçı, Saugata Ghose, Aditya Agrawal, [Niladrish Chatterjee](/person/niladrish-chatterjee), Abhijith Kashyap, [Donghyuk Lee](/person/donghyuk-lee), [Mike O'Connor](/person/mike-o-connor), Hasan Hassan, Onur Mutlu


[arXiv](https://arxiv.org/abs/1805.03175)


[DUO: Exposing On-chip Redundancy to Rank-Level ECC for High Reliability](/publication/2018-03_duo-exposing-chip-redundancy-rank-level-ecc-high-reliability)

Seong-Lyong Gong, Jungrae Kim, [Michael B. Sullivan](/person/mike-sullivan), Howard David, Mattan Erez


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/abstract/document/8327047)


[Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks](/publication/2018-02_compressing-dma-engine-leveraging-activation-sparsity-training-deep-neural)

Minsoo Rhu, [Mike O'Connor](/person/mike-o-connor), [Niladrish Chatterjee](/person/niladrish-chatterjee), Jeff Pool, Youngeun Kwon, [Steve Keckler](/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/8327000)


[Reducing Data Transfer Energy by Exploiting Similarity within a Data Transaction](/publication/2018-02_reducing-data-transfer-energy-exploiting-similarity-within-data-transaction)

[Donghyuk Lee](/person/donghyuk-lee), [Mike O'Connor](/person/mike-o-connor), [Niladrish Chatterjee](/person/niladrish-chatterjee)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/8326997)


Best Paper nominee


[Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks](/publication/2018-02_stitch-x-accelerator-architecture-exploiting-unstructured-sparsity-deep-neural)

Ching-En Lee, Yakun Sophia Shao, Jie-Fang Zhang, [Angshuman Parashar](/person/angshuman-parashar), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), Zhengya Zhang


[SysML Conference](https://mlsys.org/Conferences/2018/index.html#posters)


[MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols](/publication/2018-02_meltdownprime-and-spectreprime-automatically-synthesized-attacks-exploiting)

Caroline Trippel, [Daniel Lustig](/person/daniel-lustig), Margaret Martonosi


[arXiv](https://arxiv.org/abs/1802.03802)


### 2017 

[Toward Standardized Near-Data Processing with Unrestricted Data Placement for GPUs](/index.php/publication/2017-11_toward-standardized-near-data-processing-unrestricted-data-placement-gpus)

Gwangsun Kim, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), [Mike O'Connor](/index.php/person/mike-o-connor), Kevin Hsieh


[International Conference for High-Performance Computing, Networking, Storage, a…](https://dl.acm.org/citation.cfm?id=3126965)


[Understanding Error Propagation in Deep Learning Neural Network (DNN) Accelerators and Applications](/publication/2017-11_understanding-error-propagation-deep-learning-neural-network-dnn-accelerators)

Guanpeng Li, [Siva Hari](/person/siva-hari), [Michael B. Sullivan](/person/mike-sullivan), Timothy Tsai, Karthik Pattabiraman, [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler)


[The International Conference for High Performance Computing, Networking, Storag…](https://dl.acm.org/doi/10.1145/3126908.3126964)


[Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems](/index.php/publication/2017-10_fine-grained-dram-energy-efficient-dram-extreme-bandwidth-systems)

[Mike O'Connor](/index.php/person/mike-o-connor), [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), [Donghyuk Lee](/index.php/person/donghyuk-lee), [John Wilson](/index.php/person/john-wilson), Aditya Agrawal, [Steve Keckler](/index.php/person/stephen-keckler), [William Dally](/index.php/person/william-dally)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/citation.cfm?id=3124545)


[Detecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current Memory Content](/publication/2017-10_detecting-and-mitigating-data-dependent-dram-failures-exploiting-current-memory)

Samira Khan, Chris Wilkerson, Zhe Wang, Alaa R. Alameldeen, [Donghyuk Lee](/person/donghyuk-lee), Onur Mutlu


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8686523)


[Xylem: Enhancing Vertical Thermal Conduction in 3D Processor-Memory Stacks](/publication/2017-10_xylem-enhancing-vertical-thermal-conduction-3d-processor-memory-stacks)

Aditya Agrawal, Josep Torrellas, Sachin Idgunji


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8686607)


[Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology](/publication/2017-10_ambit-memory-accelerator-bulk-bitwise-operations-using-commodity-dram)

Vivek Seshadri, [Donghyuk Lee](/person/donghyuk-lee), Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, Todd C. Mowry


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/8686556)


[RTLCheck: Verifying Memory Consistency in RTL Designs](/publication/2017-10_rtlcheck-verifying-memory-consistency-rtl-designs)

Yatin A. Manerkar, [Daniel Lustig](/person/daniel-lustig), Margaret Martonosi, [Michael Pellauer](/person/michael-pellauer)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3123939.3124536)


IEEE Micro Top Picks in Computer Architecture (Honorable Mention)


[Beyond the Socket: NUMA-Aware GPUs](/index.php/publication/2017-10_beyond-socket-numa-aware-gpus)

Ugljesa Milic, Oreste Villa, Evgeny Bolotin, Akhil Arunkumar, Eiman Ebrahimi, [Aamer Jaleel](/index.php/person/aamer-jaleel), Alex Ramirez, [David Nellans](/index.php/person/david-nellans)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/citation.cfm?id=3124534)


[Weak Memory Models with Matching Axiomatic and Operational Definitions](/index.php/publication/2017-10_weak-memory-models-matching-axiomatic-and-operational-definitions)

Sizhuo Zhang, Muralidaran Vijayaraghavan, [Daniel Lustig](/index.php/person/daniel-lustig), Arvind


[arXiv](https://arxiv.org/abs/1710.04259)


[BATMAN: Maximizing Bandwidth Utilization of Hybrid Memory Systems](/publication/2017-10_batman-maximizing-bandwidth-utilization-hybrid-memory-systems)

Chiachen Chou, [Aamer Jaleel](/person/aamer-jaleel), Moinuddin Qureshi


[International Symposium on Memory Systems (MEMSYS)](https://dl.acm.org/doi/10.1145/3132402.3132404)


[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/index.php/publication/2017-06_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/index.php/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Brucek Khailany](/index.php/person/brucek-khailany), [Joel Emer](/index.php/person/joel-emer), [Steve Keckler](/index.php/person/stephen-keckler), [William Dally](/index.php/person/william-dally)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3079856.3080254)


[MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability](/publication/2017-06_mcm-gpu-multi-chip-module-gpus-continued-performance-scalability)

Akhil Arunkumar, Evgeny Bolotin, Benjamin Cho, Ugljesa Milic, Eiman Ebrahimi, Oreste Villa, [Aamer Jaleel](/person/aamer-jaleel), Carole-Jean Wu, [David Nellans](/person/david-nellans)


[International Symposium on Computer Architecture (ISCA)](https://doi.org/10.1145/3079856.3080231)


[Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism](/publication/2017-06_fractal-execution-model-fine-grain-nested-speculative-parallelism)

Suvinay Subramanian, Mark C. Jeffrey, Maleen Abeydeera, Hyun Ryong Lee, Victor A. Ying, [Joel Emer](/person/joel-emer), Daniel Sanchez


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/8192504)


[Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms](/publication/2017-06_understanding-reduced-voltage-operation-modern-dram-devices-experimental)

Kevin Chang, Abdullah Giray Yağlıkçı, Saugata Ghose, Aditya Agrawal, [Niladrish Chatterjee](/person/niladrish-chatterjee), Abhijith Kashyap, [Donghyuk Lee](/person/donghyuk-lee), [Mike O'Connor](/person/mike-o-connor), Hasan Hassan, Onur Mutlu


[ACM Conference on Measurement and Analysis of Computer Systems (SIGMETRICS 2017)](http://dl.acm.org/citation.cfm?id=3078590)


[Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms](/index.php/publication/2017-06_design-induced-latency-variation-modern-dram-chips-characterization-analysis)

[Donghyuk Lee](/index.php/person/donghyuk-lee), Samira Khan, Lavanya Subramanian, Saugata Ghose, Rachata Ausavarungnirun, Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu


[ACM Conference on Measurement and Analysis of Computer Systems (SIGMETRICS 2017)](http://dl.acm.org/citation.cfm?id=3084464)


[Understanding Reduced-Voltage Operation in Modern DRAM Chips: Characterization, Analysis, and Mechanisms](/index.php/publication/2017-05_understanding-reduced-voltage-operation-modern-dram-chips-characterization)

Kevin K. Chang, Abdullah Giray Yağlıkçı, Saugata Ghose, Aditya Agrawal, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), Abhijith Kashyap, [Donghyuk Lee](/index.php/person/donghyuk-lee), [Mike O'Connor](/index.php/person/mike-o-connor), Hasan Hassan, Onur Mutlu


[arXiv](https://arxiv.org/abs/1705.10292)


[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/publication/2017-05_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)


[arXiv](https://arxiv.org/abs/1708.04485)


[Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks](/publication/2017-05_compressing-dma-engine-leveraging-activation-sparsity-training-deep-neural)

Minsoo Rhu, [Mike O'Connor](/person/mike-o-connor), [Niladrish Chatterjee](/person/niladrish-chatterjee), Jeff Pool, [Stephen W. Keckler](/person/stephen-keckler)


[arXiv](https://arxiv.org/abs/1705.01626)


[SASSIFI: An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluation](/publication/2017-04_sassifi-architecture-level-fault-injection-tool-gpu-application-resilience)

[Siva Hari](/person/siva-hari), Timothy Tsai, [Mark Stephenson](/person/mark-stephenson), [Steve Keckler](/person/stephen-keckler), [Joel Emer](/person/joel-emer)


[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/7975296)


[Automated Synthesis of Comprehensive Memory Model Litmus Test Suites](/index.php/publication/2017-04_automated-synthesis-comprehensive-memory-model-litmus-test-suites)

[Daniel Lustig](/index.php/person/daniel-lustig), Andrew Wright, Alexandros Papakonstantinou, Olivier Giroux


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3037697.3037723)


[TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA](/publication/2017-04_tricheck-memory-model-verification-trisection-software-hardware-and-isa)

Caroline Trippel, Yatin A. Manerkar, [Daniel Lustig](/person/daniel-lustig), [Michael Pellauer](/person/michael-pellauer), Margaret Martonosi


[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3093336.3037719)


IEEE Micro Top Picks in Computer Architecture


[SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies](/publication/2017-02_softmc-flexible-and-practical-open-source-infrastructure-enabling-experimental)

Hasan Hassan, Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, [Donghyuk Lee](/person/donghyuk-lee), Oguz Ergin, Onur Mutlu


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/7920829)


[Architecting an Energy-Efficient DRAM System for GPUs](/publication/2017-02_architecting-energy-efficient-dram-system-gpus)

[Niladrish Chatterjee](/person/niladrish-chatterjee), [Mike O'Connor](/person/mike-o-connor), [Donghyuk Lee](/person/donghyuk-lee), Daniel Johnson, Minsoo Rhu, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)


[International Symposium on High Performance Computer Architecture (HPCA)](http://ieeexplore.ieee.org/document/7920815/)


### 2016 

[Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings](/publication/2016-11_counterexamples-and-proof-loophole-cc-power-and-armv7-trailing-sync-compiler)

Yatin A. Manerkar, Caroline Trippel, [Daniel Lustig](/person/daniel-lustig), [Michael Pellauer](/person/michael-pellauer), Margaret Martonosi


[arXiv](https://arxiv.org/abs/1611.01507)


[CANDY: Enabling Coherent DRAM Caches for Multi-Node Systems](/index.php/publication/2016-10_candy-enabling-coherent-dram-caches-multi-node-systems)

Chiachen Chou, [Aamer Jaleel](/index.php/person/aamer-jaleel), Moinuddin Qureshi


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.5555/3195638.3195680)


[Snatch: Opportunistically Reassigning Power Allocation between Processor and Memory in 3D Stacks](/publication/2016-10_snatch-opportunistically-reassigning-power-allocation-between-processor-and)

Dimitrios Skarlatos, Renji Thomas, Aditya Agrawal, Shibin Qin, Robert Pilawa-Podgurski, Ulya R. Karpuzcu, Radu Teodorescu, Nam Sung Kim, Josep Torrellas


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783757)


[A Patch Memory System for Image Processing and Computer Vision](/index.php/publication/2016-10_patch-memory-system-image-processing-and-computer-vision)

[Jason Clemons](/index.php/person/jason-clemons), Chih-Chi Cheng, [Iuri Frosio](/index.php/person/iuri-frosio), Daniel Johnson, [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783754)


[Approxilyzer: Towards A Systematic Framework for Instruction-Level Approximate Computing and its Application to Hardware Resiliency](/publication/2016-10_approxilyzer-towards-systematic-framework-instruction-level-approximate)

Radha Venkatagiri, Abdulrahman Mahmoud, [Siva Hari](/person/siva-hari), Sarita Adve


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783745)


[The Bunker Cache for Spatio-Value Approximation](/index.php/publication/2016-10_bunker-cache-spatio-value-approximation)

Joshua San Miguel, Jorge Albericio, [Aamer Jaleel](/index.php/person/aamer-jaleel), Natalie Enright Jerger


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783746)


[vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design](/publication/2016-10_vdnn-virtualized-deep-neural-networks-scalable-memory-efficient-neural-network)

Minsoo Rhu, Natalia Gimelshein, [Jason Clemons](/person/jason-clemons), Arslan Zulfiqar, [Steve Keckler](/person/stephen-keckler)


[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.5555/3195638.3195660)


[Data-Centric Execution of Speculative Parallel Programs](/publication/2016-10_data-centric-execution-speculative-parallel-programs)

Mark C. Jeffrey, Suvinay Subramanian, Maleen Abeydeera, [Joel Emer](/person/joel-emer), Daniel Sanchez


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783708)


[Co-Designing Accelerators and SoC Interfaces Using gem5-Aladdin](/publication/2016-10_co-designing-accelerators-and-soc-interfaces-using-gem5-aladdin)

Yakun Sophia Shao, Sam (Likun) Xi, Vijayalakshmi Srinivasan, Gu-Yeon Wei, David Brooks


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7783751)


[CLARA: Circular Linked-List Auto- and Self-Refresh Architecture](/index.php/publication/2016-10_clara-circular-linked-list-auto-and-self-refresh-architecture)

Aditya Agrawal, [Mike O'Connor](/index.php/person/mike-o-connor), Evgeny Bolotin, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), [Joel Emer](/index.php/person/joel-emer), [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on Memory Systems (MEMSYS'16)](https://dl.acm.org/doi/10.1145/2989081.2989084)


[Automatically Exploiting Implicit Pipeline Parallelism from Multiple Dependent Kernels for GPUs](/publication/2016-09_automatically-exploiting-implicit-pipeline-parallelism-multiple-dependent)

Gwangsun Kim, Jiyun Jeong, John Kim, [Mark Stephenson](/person/mark-stephenson)


[International Conference on Parallel Architectures and Compilation Techniques (PACT)](https://dl.acm.org/doi/proceedings/10.1145/2967938)


[TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA](/index.php/publication/2016-08_tricheck-memory-model-verification-trisection-software-hardware-and-isa)

Caroline Trippel, Yatin A. Manerkar, [Daniel Lustig](/index.php/person/daniel-lustig), [Michael Pellauer](/index.php/person/michael-pellauer), Margaret Martonosi


[arXiv](https://arxiv.org/abs/1608.07547)


[Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems](/index.php/publication/2016-06_transparent-offloading-and-mapping-tom-enabling-programmer-transparent-near)

Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), [Mike O'Connor](/index.php/person/mike-o-connor), Nandita Vijaykumar, Onur Mutlu, [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on Computer Architecture (ISCA)](http://ieeexplore.ieee.org/document/7551394/)


[Bit-Plane Compression: Transforming Data for Better Compression in Many-core Architectures](/publication/2016-06_bit-plane-compression-transforming-data-better-compression-many-core)

Jungrae Kim, [Michael B. Sullivan](/person/mike-sullivan), Esha Choukse, Mattan Erez


[The International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/abstract/document/7551404)


[Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks](/publication/2016-06_eyeriss-spatial-architecture-energy-efficient-dataflow-convolutional-neural)

Yu-Hsin Chen, [Joel Emer](/person/joel-emer), Vivienne Sze


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/7551407)


[EIE: Efficient Inference Engine on Compressed Deep Neural Network](/index.php/publication/2016-06_eie-efficient-inference-engine-compressed-deep-neural-network)

Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, [William Dally](/index.php/person/william-dally)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3007787.3001163)


[Accelerating Dependent Cache Misses with an Enhanced Memory Controller](/index.php/publication/2016-06_accelerating-dependent-cache-misses-enhanced-memory-controller)

Milad Hashemi, Khubaib, Eiman Ebrahimi, Onur Mutlu, Yale N. Patt


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/abs/10.1145/3007787.3001184)


[LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches](/publication/2016-06_lap-loop-block-aware-inclusion-properties-energy-efficient-asymmetric-last)

Hsiang-Yun Cheng, Jishen Zhao, Jack Sampson, Mary Jane Irwin, [Aamer Jaleel](/person/aamer-jaleel), Yu Lu, Yuan Xie


[International Symposium on Computer Architecture (ISCA)](https://ieeexplore.ieee.org/document/7551386)


[A Real-time Energy-Efficient Superpixel Hardware Accelerator for Mobile Computer Vision Applications](/index.php/publication/2016-06_real-time-energy-efficient-superpixel-hardware-accelerator-mobile-computer)

Injoon Hong, [Jason Clemons](/index.php/person/jason-clemons), [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Iuri Frosio](/index.php/person/iuri-frosio), [Brucek Khailany](/index.php/person/brucek-khailany), [Steve Keckler](/index.php/person/stephen-keckler)


[Design Automation Conference (DAC)](http://dl.acm.org/citation.cfm?id=2897974)


[All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory](/publication/2016-06_all-inclusive-ecc-thorough-end-end-protection-reliable-computer-memory)

Jungrae Kim, [Michael B. Sullivan](/person/mike-sullivan), Sangkug Lym, Mattan Erez


[The International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3007787.3001203)


[Selective GPU Caches to Eliminate CPU-GPU HW Cache Coherence](/index.php/publication/2016-03_selective-gpu-caches-eliminate-cpu-gpu-hw-cache-coherence)

Neha Agarwal, [David Nellans](/index.php/person/david-nellans), Eiman Ebrahimi, Thomas F. Wenisch, John Danskin, [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/7446089)


[A Case for Toggle-Aware Compression for GPU Systems](/publication/2016-03_case-toggle-aware-compression-gpu-systems)

Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, [Steve Keckler](/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](http://ieeexplore.ieee.org/document/7446064/)


[Towards High Performance Paged Memory for GPUs](/index.php/publication/2016-03_towards-high-performance-paged-memory-gpus)

Tianhao Zheng, [David Nellans](/index.php/person/david-nellans), Arslan Zulfiqar, [Mark Stephenson](/index.php/person/mark-stephenson), [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/7446077)


[An Analytical Model for Hardened Latch Selection and Exploration](/index.php/publication/2016-03_analytical-model-hardened-latch-selection-and-exploration)

[Michael B. Sullivan](/index.php/person/mike-sullivan), [Brian Zimmer](/index.php/person/brian-zimmer), [Siva Hari](/index.php/person/siva-hari), Timothy Tsai, [Steve Keckler](/index.php/person/stephen-keckler)


[Workshop on Silicon Errors in Logic - System Effects (SELSE)](http://www.selse.org/)


[vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design](/index.php/publication/2016-02_vdnn-virtualized-deep-neural-networks-scalable-memory-efficient-neural-network)

Minsoo Rhu, Natalia Gimelshein, [Jason Clemons](/index.php/person/jason-clemons), Arslan Zulfiqar, [Steve Keckler](/index.php/person/stephen-keckler)


[arXiv](https://arxiv.org/abs/1602.08124)


### 2015 

[CCICheck: Using μhb Graphs to Verify the Coherence-Consistency Interface](/index.php/publication/2015-12_ccicheck-using-mhb-graphs-verify-coherence-consistency-interface)

Yatin A. Manerkar, [Daniel Lustig](/index.php/person/daniel-lustig), [Michael Pellauer](/index.php/person/michael-pellauer), Margaret Martonosi


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/abstract/document/7856585)


[A Scalable Architecture for Ordered Parallelism](/publication/2015-12_scalable-architecture-ordered-parallelism)

Mark C. Jeffrey, Suvinay Subramanian, Cong Yang, [Joel Emer](/person/joel-emer), Daniel Sanchez


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7856601)


[A Fast and Accurate Analytical Technique to Compute the AVF of Sequential Bits in a Processor](/publication/2015-12_fast-and-accurate-analytical-technique-compute-avf-sequential-bits-processor)

Steve Raasch, Arijit Biswas, Jon Stephan, Paul Racunas, [Joel Emer](/person/joel-emer)


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/abstract/document/7856641)


[Exploiting Asymmetry in Booth-Encoded Multipliers for Reduced Energy Multiplication](/index.php/publication/2015-11_exploiting-asymmetry-booth-encoded-multipliers-reduced-energy-multiplication)

[Mike O'Connor](/index.php/person/mike-o-connor), Earl Swartzlander, Jr.


[49th Asilomar Conference on Signals, Systems, and Computers](http://ieeexplore.ieee.org/abstract/document/7421228/)


[Anatomy of GPU Memory System for Multi-Application Execution](/index.php/publication/2015-10_anatomy-gpu-memory-system-multi-application-execution)

Adwait Jog, Onur Kayiran, Tuba Kesten, Ashutosh Pattnaik, Evgeny Bolotin, [Niladrish Chatterjee](/index.php/person/niladrish-chatterjee), [Steve Keckler](/index.php/person/stephen-keckler), Mahmut T. Kandemir, Chita R. Das


[International Symposium on Memory Systems (MEMSYS)](http://dl.acm.org/citation.cfm?id=2818979)


[GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors](/publication/2015-10_gpu-computing-pipeline-inefficiencies-and-optimization-opportunities)

Joel Hestness, [Steve Keckler](/person/stephen-keckler), David A. Wood


[International Symposium on Workload Characterization (IISWC)](https://ieeexplore.ieee.org/document/7314150)


[Scavenger: Automating the Construction of Application-Optimized Memory Hierarchies](/publication/2015-09_scavenger-automating-construction-application-optimized-memory-hierarchies)

Hsin-Jung Yang, Kermin Fleming, Michael Adler, Felix Winterstein, [Joel Emer](/person/joel-emer)


[International Conference on Field Programmable Logic and Applications (FPL)](https://ieeexplore.ieee.org/abstract/document/7294018)


[Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures](/index.php/publication/2015-09_efficient-control-and-communication-paradigms-coarse-grained-spatial)

[Michael Pellauer](/index.php/person/michael-pellauer), [Angshuman Parashar](/index.php/person/angshuman-parashar), Michael Adler, Bushra Ahsan, Randy Almon, [Neal Crago](/index.php/person/neal-crago), Kermin Fleming, Mohit Gambhir, [Aamer Jaleel](/index.php/person/aamer-jaleel), Tushar Krishna, [Daniel Lustig](/index.php/person/daniel-lustig), Stephen Maresh, Vladimir Pavlov, Rachid Rayess, Antonia Zhai, [Joel Emer](/index.php/person/joel-emer)


[ACM Transactions on Computer Systems (TOCS)](https://dl.acm.org/doi/10.1145/2754930)


[MemcachedGPU: Scaling-up Scale-out Key-value Stores](/publication/2015-08_memcachedgpu-scaling-scale-out-key-value-stores)

Tayler Hetherington, [Mike O'Connor](/person/mike-o-connor), Tor Aamodt


[Sixth ACM Symposium on Cloud Computing (SoCC '15)](http://dl.acm.org/citation.cfm?id=2806836)


[Designing Efficient Heterogeneous Memory Architectures](/index.php/publication/2015-08_designing-efficient-heterogeneous-memory-architectures)

Evgeny Bolotin, [David Nellans](/index.php/person/david-nellans), Oreste Villa, [Mike O'Connor](/index.php/person/mike-o-connor), Alex Ramirez, [Steve Keckler](/index.php/person/stephen-keckler)


[IEEE Micro](https://ieeexplore.ieee.org/document/7155441)


[A Variable Warp Size Architecture](/publication/2015-06_variable-warp-size-architecture)

Timothy Rogers, Daniel Johnson, [Mike O'Connor](/person/mike-o-connor), [Steve Keckler](/person/stephen-keckler)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/2749469.2750410)


[Flexible Software Profiling of GPU Architectures](/publication/2015-06_flexible-software-profiling-gpu-architectures)

[Mark Stephenson](/person/mark-stephenson), [Siva Hari](/person/siva-hari), Yunsup Lee, Eiman Ebrahimi, Daniel Johnson, [David Nellans](/person/david-nellans), [Mike O'Connor](/person/mike-o-connor), [Steve Keckler](/person/stephen-keckler)


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/2749469.2750375)


[Locality-Driven Dynamic GPU Cache Bypassing](/index.php/publication/2015-06_locality-driven-dynamic-gpu-cache-bypassing)

Chao Li, Shuaiwen Leon Song, Hongwen Dai, [Siva Hari](/index.php/person/siva-hari), [Albert Sidelnik](/index.php/person/albert-sidelnik), Huiyang Zhou


[International Conference on Supercomputing (ICS)](https://dl.acm.org/doi/10.1145/2751205.2751237)


[Toggle-aware Compression for GPUs](/index.php/publication/2015-05_toggle-aware-compression-gpus)

Gennady Pekhimenko, Evgeny Bolotin, [Mike O'Connor](/index.php/person/mike-o-connor), Onur Mutlu, Todd C. Mowry, [Steve Keckler](/index.php/person/stephen-keckler)


[IEEE Computer Architecture Letters (Volume: 14, Issue: 2, July-Dec. 1 2015)](http://ieeexplore.ieee.org/document/7103282/)


[SASSIFI: Evaluating Resilience of GPU Applications](/publication/2015-03_sassifi-evaluating-resilience-gpu-applications)

[Siva Hari](/person/siva-hari), Timothy Tsai, [Mark Stephenson](/person/mark-stephenson), [Steve Keckler](/person/stephen-keckler), [Joel Emer](/person/joel-emer)


[Workshop on Silicon Errors in Logic - System Effects (SELSE-11)](https://selse.org/previous-workshops/2017-archive-2/2015-program/)


[In-Memory Graph Databases for Web-Scale Data](/index.php/publication/2015-03_memory-graph-databases-web-scale-data)

Vito Giovanni Castellana, Alessandro Morari, Jesse Weaver, Antonino Tumeo, David Haglin, Oreste Villa, John Feo


[IEEE Computer](https://ieeexplore.ieee.org/document/7063171)


[Page Placement Strategies for GPUs within Heterogeneous Memory Systems](/publication/2015-03_page-placement-strategies-gpus-within-heterogeneous-memory-systems)

Neha Agarwal, [David Nellans](/person/david-nellans), [Mark Stephenson](/person/mark-stephenson), [Mike O'Connor](/person/mike-o-connor), [Steve Keckler](/person/stephen-keckler)


[International Conference on Architectural Support for Programming Languages and…](http://dl.acm.org/citation.cfm?id=2694381)


[Unlocking Bandwidth for GPUs in CC-NUMA systems](/index.php/publication/2015-02_unlocking-bandwidth-gpus-cc-numa-systems)

Neha Agarwal, [David Nellans](/index.php/person/david-nellans), [Mike O'Connor](/index.php/person/mike-o-connor), [Steve Keckler](/index.php/person/stephen-keckler), Thomas Wenisch


[International Symposium on High Performance Computer Architecture (HPCA)](http://ieeexplore.ieee.org/document/7056046/)


[Priority-Based Cache Allocation in Throughput Processors](/index.php/publication/2015-02_priority-based-cache-allocation-throughput-processors)

Dong Li, Minsoo Rhu, Daniel Johnson, [Mike O'Connor](/index.php/person/mike-o-connor), Mattan Erez, Donald Fussell, [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on High Performance Computer Architecture (HPCA)](http://ieeexplore.ieee.org/document/7056024/)


[High Performing Cache Hierarchies for Server Workloads: Relaxing Inclusion to Capture the Latency Benefits of Exclusive Caches](/publication/2015-02_high-performing-cache-hierarchies-server-workloads-relaxing-inclusion-capture)

[Aamer Jaleel](/person/aamer-jaleel), Joseph Nuzman, Adrian Moga, Simon C. Steely Jr., [Joel Emer](/person/joel-emer)


[International Symposium on High Performance Computer Architecture (HPCA)](https://ieeexplore.ieee.org/document/7056045)


### 2014 

[Arbitrary Modulus Indexing](/index.php/publication/2014-12_arbitrary-modulus-indexing)

Jeffrey R. Diamond, Donald S. Fussell, [Steve Keckler](/index.php/person/stephen-keckler)


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7011384)


[Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures](/publication/2014-12_exploring-design-space-spmd-divergence-management-data-parallel-architectures)

Yunsup Lee, Vinod Grover, Ronny Krashinsky, [Mark Stephenson](/person/mark-stephenson), [Steve Keckler](/person/stephen-keckler), Krste Asanovic


[International Symposium on Microarchitecture (MICRO)](https://doi.org/10.1109/MICRO.2014.48)


[Scaling the Power Wall: A Path to Exascale](/index.php/publication/2014-11_scaling-power-wall-path-exascale)

Oreste Villa, Daniel Johnson, [Mike O'Connor](/index.php/person/mike-o-connor), Evgeny Bolotin, [David Nellans](/index.php/person/david-nellans), Justin Luitjens, Nikolai Sakharnykh, Peng Wang, Paulius Micikevicius, Anthony Scudiero, [Steve Keckler](/index.php/person/stephen-keckler), [William Dally](/index.php/person/william-dally)


[SC '14](http://ieeexplore.ieee.org/abstract/document/7013055/)


[A Comparative Analysis of Microarchitecture Effects on CPU and GPU Memory System Behavior](/index.php/publication/2014-10_comparative-analysis-microarchitecture-effects-cpu-and-gpu-memory-system)

Joel Hestness, [Steve Keckler](/index.php/person/stephen-keckler), David A. Wood


[International Symposium on Workload Characterization (IISWC)](https://ieeexplore.ieee.org/document/6983054)


[Scaling Irregular Applications through Data Aggregation and Software Multithreading](/publication/2014-05_scaling-irregular-applications-through-data-aggregation-and-software)

Alessandro Morari, Antonino Tumeo, Daniel Chavarria-Miranda, Oreste Villa, Mateo Valero


[International Parallel and Distributed Processing Symposium (IPDPS)](https://ieeexplore.ieee.org/document/6877341)


[Measuring the Radiation Reliability of SRAM Structures in GPUs Designed for HPC](/publication/2014-04_measuring-radiation-reliability-sram-structures-gpus-designed-hpc)

Paolo Rech, Luigi Carro, Nicholas Wang, Timothy Tsai, [Siva Hari](/person/siva-hari), [Steve Keckler](/person/stephen-keckler)


[Workshop on Silicon Errors in Logic - System Effects (SELSE-10)](https://selse.org)


[Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications](/index.php/publication/2014-03_application-aware-memory-system-fair-and-efficient-execution-concurrent-gpgpu)

Adwait Jog, Evgeny Bolotin, Zvika Guz, Mike Parker, [Steve Keckler](/index.php/person/stephen-keckler), Mahmut T. Kandemir, Chita R. Das


[Workshop on General Purpose Processing Using GPUs (GPGPU-7)](http://dl.acm.org/citation.cfm?id=2576780)


### 2013 

[21st Century Digital Design Tools](/index.php/publication/2013-05_21st-century-digital-design-tools)

[William Dally](/index.php/person/william-dally), Chris Malachowsky, [Steve Keckler](/index.php/person/stephen-keckler)


[Design Automation Conference (DAC)](https://ieeexplore.ieee.org/document/6560687)


[Convergence and Scalarization for Data-Parallel Architectures](/publication/2013-02_convergence-and-scalarization-data-parallel-architectures)

Yunsup Lee, Ronny Krashinsky, Vinod Grover, [Steve Keckler](/person/stephen-keckler), Krste Asanovic


[International Symposium on Code Generation and Optimization (CGO)](https://ieeexplore.ieee.org/document/6494995)


### 2012 

[Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor](/index.php/publication/2012-12_unifying-primary-cache-scratch-and-register-file-memories-throughput-processor)

Mark Gebhart, [Steve Keckler](/index.php/person/stephen-keckler), [Brucek Khailany](/index.php/person/brucek-khailany), Ronny Krashinsky, [William Dally](/index.php/person/william-dally)


[International Symposium on Microarchitecture (MICRO)](http://dl.acm.org/citation.cfm?id=2457489)


[A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors](/publication/2012-04_hierarchical-thread-scheduler-and-register-file-energy-efficient-throughput)

Mark Gebhart, Daniel R. Johnson, David Tarjan, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally), Erik Lindholm, Kevin Skadron


[ACM Transactions on Computer Systems (TOCS)](http://dl.acm.org/citation.cfm?id=2166882)


### 2011 

[A Compile-Time Managed Multi-Level Register File Hierarchy](/index.php/publication/2011-12_compile-time-managed-multi-level-register-file-hierarchy)

Mark Gebhart, [Steve Keckler](/index.php/person/stephen-keckler), [William Dally](/index.php/person/william-dally)


[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7851495)


[CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization](/index.php/publication/2011-11_cudadma-optimizing-gpu-memory-bandwidth-warp-specialization)

Michael Bauer, Henry Cook, [Brucek Khailany](/index.php/person/brucek-khailany)


[SC '11](https://dl.acm.org/doi/10.1145/2063384.2063400)


[GPUs and the Future of Parallel Computing](/index.php/publication/2011-09_gpus-and-future-parallel-computing)

[Steve Keckler](/index.php/person/stephen-keckler), [William Dally](/index.php/person/william-dally), [Brucek Khailany](/index.php/person/brucek-khailany), [Michael Garland](/index.php/person/michael-garland), David Glasco


[IEEE Micro](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6045685&tag=1)


[Energy-efficient Mechanisms for Managing Thread Context in Throughput Processors](/publication/2011-06_energy-efficient-mechanisms-managing-thread-context-throughput-processors)

Mark Gebhart, Daniel R. Johnson, David Tarjan, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally), Erik Lindholm, Kevin Skadron


[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/2000064.2000093)


### 2009 

[Increasing Memory Miss Tolerance for SIMD Cores](/publication/2009-11_increasing-memory-miss-tolerance-simd-cores)

David Tarjan, Jiayuan Meng, Kevin Skadron


[International Conference for High Performance Computing, Networking, Storage an…](https://dl.acm.org/doi/10.1145/1654059.1654082)


### 2007 

[The NVIDIA GeForce 8800 GPU](/index.php/publication/2007-05_nvidia-geforce-8800-gpu)

Erik Lindholm, Stuart Oberman


[2007 IEEE Hot Chips 19 Symposium](https://ieeexplore.ieee.org/abstract/document/7482490)


 ### Researchers

 
[Aamer Jaleel](/person/aamer-jaleel)


[Athinagoras Skiadopoulos](/person/athinagoras-skiadopoulos)


[Ben Keller](/person/ben-keller)


[Benjamin Klenk](/person/ben-klenk)


[Brucek Khailany](/person/brucek-khailany)


[Carlos Villavieja](/person/carlos-villavieja)


[Christos Kozyrakis](/index.php/person/christos-kozyrakis)


[Daniel Lustig](/person/daniel-lustig)


[David Nellans](/person/david-nellans)


[Dennis Abts](/index.php/person/dennis-abts)


[Edward Suh](/person/edward-suh)


[Guillermo Marcus](/person/guillermo-marcus)


[Hans Eberle](/index.php/person/hans-eberle)


[Harini Muthukrishnan](/index.php/person/harini-muthukrishnan)


[Hasan Nazim Genc](/person/hasan-nazim-genc)


[Isaac Gelado](/person/isaac-gelado)


[Josef Spjut](/index.php/person/josef-spjut)


[Juan Gomez Luna](/person/juan-gomez-luna)


[Mark Stephenson](/index.php/person/mark-stephenson)


[Matthias Blumrich](/index.php/person/matthias-blumrich)


[Matthijs Van keirsbilck](/person/matthijs-van-keirsbilck)


[Michael Davies](/person/michael-davies)


[Michael Pellauer](/person/michael-pellauer)


[Mike Sullivan](/person/mike-sullivan)


[Mohamed Tarek Ibn Ziad](/person/mohamed-tarek-ibn-ziad)


[Nathaniel Pinckney](/person/nathaniel-pinckney)


[Nicolai Oswald](/person/nicolai-oswald)


[Po-An Tsai](/person/po-an-tsai)


[Qijing Jenny Huang](/person/qijing-jenny-huang)


[Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan)


[Sana Damani](/person/sana-damani)


[Simon Cooksey](/person/simon-cooksey)


[Siva Hari](/person/siva-hari)


[Song Han](/person/song-han)


[Steve Keckler](/person/stephen-keckler)


[Vignesh Balaji](/index.php/person/vignesh-balaji)


[Vinu Joseph](/person/vinu-joseph)


[Wen-mei Hwu](/index.php/person/wen-mei-hwu)


[William Dally](/person/william-dally)


[Yaosheng Fu](/person/yaosheng-fu)


[Zachary Susskind](/person/zachary-susskind)