  Brucek Khailany  

  ![](/sites/default/files/person/KhailanyBio.jpg)

Brucek Khailany joined NVIDIA in 2009 and currently leads the ASIC & VLSI Research group. During his time at NVIDIA, he has contributed to projects within research and product groups on topics spanning computer architecture, unit micro-architecture, and ASIC and VLSI design techniques. Previously, Dr. Khailany was a Co-Founder and Principal Architect at Stream Processors, Inc. (SPI), where he led research and development activities related to highly parallel programmable processor architectures. At SPI, he helped lead the development of the industry's first commercially available stream processor architecture targeting signal and image processing applications. From 1997 to 2003 at Stanford University, he led the silicon implementation of the Imagine stream processor, a research chip that introduced the concepts of stream processing and efficient partitioned register organizations. He received his Ph.D. and Master's degrees in Electrical Engineering from Stanford University and B.S.E. degrees in Electrical Engineering and Computer Engineering from the University of Michigan.

   Research Area(s)

[Computer Architecture](/research-area/computer-architecture)

[Artificial Intelligence and Machine Learning](/research-area/machine-learning-artificial-intelligence)

 Main Field of Interest

[Circuits and VLSI Design](/research-area/circuits)

 Google Scholar

[https://scholar.google.com/citations?hl=en&user=c4-bwRcAAAAJ](https://scholar.google.com/citations?hl=en&user=c4-bwRcAAAAJ)

 ### Publications

### 2026 

[GalaxyDiT: Efficient Video Generation with Guidance Alignment and Adaptive Proxy in Diffusion Transformers](/index.php/publication/2026-07_galaxydit-efficient-video-generation-guidance-alignment-and-adaptive-proxy)

Zoey Song, [Steve Dai](/index.php/person/steve-dai), [Ben Keller](/index.php/person/ben-keller), [Brucek Khailany](/index.php/person/brucek-khailany)



[DAC 2026](https://dac.com/2026)









[Alpha-Vision: A Real-Time Always-on Vision Processor with 787µs Face Detection Latency in <5mW](/publication/2026-02_alpha-vision-real-time-always-vision-processor-787ms-face-detection-latency)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Muya Chang](/person/muya-chang), Thierry Tambe, [Nathaniel Pinckney](/person/nathaniel-pinckney), [Stephen Tell](/person/stephen-tell), [Qijing Jenny Huang](/person/qijing-jenny-huang), [Shalini De Mello](/person/shalini-de-mello), [Brucek Khailany](/person/brucek-khailany)



[ISSCC 2026](https://www.isscc.org/)









### 2025 

[Augmenting Simulated Noisy Quantum Data Collection by Orders of Magnitude Using Pre-Trajectory Sampling with Batched Execution](/index.php/publication/2025-11_augmenting-simulated-noisy-quantum-data-collection-orders-magnitude-using-pre)

[Taylor Patti](/index.php/person/taylor-patti), Thien Nguyen, Justin Lietz, Alex McCaskey, [Brucek Khailany](/index.php/person/brucek-khailany)



<https://arxiv.org/abs/2504.16297>









[GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting](/index.php/publication/2025-06_gaurast-enhancing-gpu-triangle-rasterizers-accelerate-3d-gaussian-splatting)

Sixu Li, [Ben Keller](/index.php/person/ben-keller), Yingyan Celine Lin, [Brucek Khailany](/index.php/person/brucek-khailany)



[Design Automation Conference (DAC)](https://arxiv.org/abs/2503.16681)









[Marco: Configurable Graph-Based Task Solving and Multi-AI Agents Framework for Hardware Design](/publication/2025-06_marco-configurable-graph-based-task-solving-and-multi-ai-agents-framework)

[Chia-Tung (Mark) Ho](/person/chia-tung-mark-ho), Jing Gong, [Yunsheng Bai](/person/yunsheng-bai), [Chenhui Deng](/person/chenhui-deng), Mark Haoxing Ren, [Brucek Khailany](/person/brucek-khailany)













### 2024 

[VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool](/index.php/publication/2024-08_verilogcoder-autonomous-verilog-coding-agents-graph-based-planning-and-abstract)

[Chia-Tung (Mark) Ho](/index.php/person/chia-tung-mark-ho), Mark Haoxing Ren, [Brucek Khailany](/index.php/person/brucek-khailany)



[arXiv](https://arxiv.org/abs/2408.08927)









[GL0AM: GPU Accelerated Gate Level Logic Simulator](/publication/2024-06_gl0am-gpu-accelerated-gate-level-logic-simulator)

[Yanqing Zhang](/person/yanqing-zhang), Mark Haoxing Ren, [Brucek Khailany](/person/brucek-khailany)













### 2023 

[ChipNeMo: Domain-Adapted LLMs for Chip Design](/publication/2023-10_chipnemo-domain-adapted-llms-chip-design)

[Mingjie Liu](/person/mingjie-liu), Teo Ene, Robert Kirby, Chris Cheng, [Nathaniel Pinckney](/person/nathaniel-pinckney), [Rongjian Liang](/person/rongjian-liang), Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, [Brucek Khailany](/person/brucek-khailany), George Kokai, Kishor Kunal, Xiaowei Li, Charley Lind, Hao Liu, Stuart Oberman, Sujeet Omar, Sreedhar Pratty, Jonathan Raman, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P Suthar, Varun Tej, [Walker Turner](/person/walker-turner), Kaizhe Xu, Mark Haoxing Ren













[VerilogEval: Evaluating Large Language Models for Verilog Code Generation](/publication/2023-09_verilogeval-evaluating-large-language-models-verilog-code-generation)

[Mingjie Liu](/person/mingjie-liu), [Nathaniel Pinckney](/person/nathaniel-pinckney), [Brucek Khailany](/person/brucek-khailany), Mark Haoxing Ren



[2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)](https://arxiv.org/abs/2309.07544)









[Late Breaking Results: Test Selection For RTL Coverage By Unsupervised Learning From Fast Functional Simulation](/publication/2023-07_late-breaking-results-test-selection-rtl-coverage-unsupervised-learning-fast)

[Rongjian Liang](/person/rongjian-liang), [Nathaniel Pinckney](/person/nathaniel-pinckney), Yuji Chai, Mark Haoxing Ren, [Brucek Khailany](/person/brucek-khailany)



[60th Design Automation Conference](https://www.dac.com/)









[Efficient Transformer Inference with Statically Structured Sparse Attention](/index.php/publication/2023-07_efficient-transformer-inference-statically-structured-sparse-attention)

[Steve Dai](/index.php/person/steve-dai), Hasan Genc, [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Brucek Khailany](/index.php/person/brucek-khailany)



[2023 60th ACM/IEEE Design Automation Conference (DAC)](https://ieeexplore.ieee.org/xpl/conhome/10247654/proceeding)









[NVCell 2: Routability-Driven Standard Cell Layout in Advanced Nodes with Lattice Graph Routability Model](/index.php/publication/2023-03_nvcell-2-routability-driven-standard-cell-layout-advanced-nodes-lattice-graph)

[Chia-Tung (Mark) Ho](/index.php/person/chia-tung-mark-ho), Alvin Ho, [Matt Fojtik](/index.php/person/matt-fojtik), Minsoo Kim, Shang Wei, Yaguang Li, [Brucek Khailany](/index.php/person/brucek-khailany), Mark Haoxing Ren



[International Symposium on Physical Design 2023](https://ispd.cc/ispd2023/index.php)









[AutoDMP: Automated DREAMPlace-based Macro Placement](/index.php/publication/2023-03_autodmp-automated-dreamplace-based-macro-placement)

Anthony Agnesina, Puranjay Rajvanshi, Tian Yang, Geraldo Pradipta, Austin Jiao, [Ben Keller](/index.php/person/ben-keller), [Brucek Khailany](/index.php/person/brucek-khailany), Mark Haoxing Ren



[International Symposium on Physical Design 2023](https://ispd.cc/ispd2023/index.php)









[A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm](/publication/2023-01_956-topsw-deep-learning-inference-accelerator-vector-scaled-4-bit-quantization)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), [Charbel Sakr](/person/charbel-sakr), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany)



[Journal of Solid-State Circuits](https://ieeexplore.ieee.org/document/10019275)









### 2022 

[HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression](/publication/2022-12_heat-hardware-efficient-automatic-tensor-decomposition-transformer-compression)

Jiaqi Gu, [Ben Keller](/person/ben-keller), [Jean Kossaifi](/person/jean-kossaifi), Anima Anandkumar, [Brucek Khailany](/person/brucek-khailany), David Z. Pan



[Workshop on ML for Systems at NeurIPS](http://mlforsystems.org)



Spotlight Paper





[An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design](/publication/2022-12_adversarial-active-sampling-based-data-augmentation-framework-manufacturable)

[Mingjie Liu](/person/mingjie-liu), [Haoyu Yang](/person/haoyu-yang), Zongyi Li, Kumara Sastry, Saumyadip Mukhopadhyay, Selim Dogru, Anima Anandkumar, David Z. Pan, [Brucek Khailany](/person/brucek-khailany), Mark Haoxing Ren



[Workshop on ML for Systems at NeurIPS](http://mlforsystems.org/)









[LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update](/publication/2022-12_lns-madam-low-precision-training-logarithmic-number-system-using-multiplicative)

Jiawei Zhao, [Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), Mustafa Ali, [Ming-Yu Liu](/person/ming-yu-liu), [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally), Anima Anandkumar



[IEEE Transactions on Computers (Volume: 71, Issue: 12, 01 December 2022)](https://www.computer.org/csdl/journal/tc)









[TransSizer: A Novel Transformer-Based Fast Gate Sizer](/index.php/publication/2022-10_transsizer-novel-transformer-based-fast-gate-sizer)

Siddhartha Nath, Geraldo Pradipta, Corey Hu, Tian Yang, [Brucek Khailany](/index.php/person/brucek-khailany), Mark Haoxing Ren



[2022 International Conference on Computer-Aided Design](https://iccad.com/)









[XT-PRAGGMA: Crosstalk Pessimism Reduction Accessible by GPU Gate-level Simulations and Machine Learning](/index.php/publication/2022-09_xt-praggma-crosstalk-pessimism-reduction-accessible-gpu-gate-level-simulations)

Vidya Chhabria, [Ben Keller](/index.php/person/ben-keller), [Yanqing Zhang](/index.php/person/yanqing-zhang), Sandeep Vollala, Sreedhar Patty, Mark Haoxing Ren, [Brucek Khailany](/index.php/person/brucek-khailany)



[MLCAD '22: Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD](https://mlcad-workshop.org/)









[From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus](/index.php/publication/2022-08_rtl-cuda-gpu-acceleration-flow-rtl-simulation-batch-stimulus)

Dian-Lun Lin, Mark Haoxing Ren, [Yanqing Zhang](/index.php/person/yanqing-zhang), [Brucek Khailany](/index.php/person/brucek-khailany), Tsung-Wei Huang



[51st International Conference on Parallel Processing (ICPP '22)](https://icpp22.gitlabpages.inria.fr/)









[Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training](/publication/2022-07_optimal-clipping-and-magnitude-aware-differentiation-improved-quantization)

[Charbel Sakr](/person/charbel-sakr), [Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally)



[2022 International Conference on Machine Learning (ICML)](https://arxiv.org/abs/2206.06501)









[A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm](/publication/2022-06_17-956-topsw-deep-learning-inference-accelerator-vector-scaled-4-bit)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany)



[2022 Symposium on VLSI Technology & Circuits Digest of Technical Papers](https://www.vlsisymposium.org)









[AutoCRAFT: Layout Automation for Custom Circuits in Advanced FinFET Technologies](/publication/2022-03_autocraft-layout-automation-custom-circuits-advanced-finfet-technologies)

Hao Chen, [Walker Turner](/person/walker-turner), [Sanquan Song](/person/sanquan-song), Keren Zhu, George Kokai, [Brian Zimmer](/person/brian-zimmer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), Mark Haoxing Ren



[International Symposium on Physical Design 2022](https://ispd.cc/ispd2022/slides/ispd2022.html)









[Generic Lithography Modeling with Dual-band Optics-Inspired Neural Networks](/publication/2022-03_generic-lithography-modeling-dual-band-optics-inspired-neural-networks)

[Haoyu Yang](/person/haoyu-yang), Zongyi Li, Kumara Sastry, Saumyadip Mukhopadhyay, [Mark Kilgard](/person/mark-kilgard), Anima Anandkumar, [Brucek Khailany](/person/brucek-khailany), Vivek Singh, Mark Haoxing Ren



[2022 Design Automation Conference](https://www.dac.com)









[GATSPI: GPU Accelerated Gate-Level Simulation for Power Improvement](/index.php/publication/2022-03_gatspi-gpu-accelerated-gate-level-simulation-power-improvement)

[Yanqing Zhang](/index.php/person/yanqing-zhang), Mark Haoxing Ren, Akshay Sridharan, [Brucek Khailany](/index.php/person/brucek-khailany)



[2022 Design Automation Conference](https://www.dac.com)









[Machine Learning and Algorithms: Let Us Team Up for EDA](/index.php/publication/2022-01_machine-learning-and-algorithms-let-us-team-eda)

Mark Haoxing Ren, [Brucek Khailany](/index.php/person/brucek-khailany), [Matt Fojtik](/index.php/person/matt-fojtik), [Yanqing Zhang](/index.php/person/yanqing-zhang)



[IEEE Design & Test](https://ieee-ceda.org/publication/ieee-designtest)









### 2021 

[NVCell: Standard Cell Layout in Advanced Technology Nodes with Reinforcement Learning](/index.php/publication/2021-12_nvcell-standard-cell-layout-advanced-technology-nodes-reinforcement-learning)

Mark Haoxing Ren, [Matt Fojtik](/index.php/person/matt-fojtik), [Brucek Khailany](/index.php/person/brucek-khailany)



Design Automation Conference (DAC) 2021 (Invited special session paper)









[Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers](/publication/2021-12_softermax-hardwaresoftware-co-design-efficient-softmax-transformers)

Jacob R. Stevens, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Brucek Khailany](/person/brucek-khailany), Anand Raghunathan



[Design Automation Conference (DAC) 2021](https://www.dac.com/)









[IPA: Floorplan-Aware SystemC Interconnect Performance Modeling and Generation for HLS-based SoCs](/index.php/publication/2021-11_ipa-floorplan-aware-systemc-interconnect-performance-modeling-and-generation)

[Nathaniel Pinckney](/index.php/person/nathaniel-pinckney), [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Ben Keller](/index.php/person/ben-keller), [Brucek Khailany](/index.php/person/brucek-khailany)



[IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’21)](https://iccad.com/)









[Simba: scaling deep-learning inference with chiplet-based architecture](/publication/2021-05_simba-scaling-deep-learning-inference-chiplet-based-architecture)

Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)



[Communications of the ACM](https://dl.acm.org/doi/10.1145/3460227)



ACM Research Highlight





[VS-QUANT: Per-Vector Scaled Quantization for Accurate Low-Precision Neural Network Inference](/publication/2021-04_vs-quant-vector-scaled-quantization-accurate-low-precision-neural-network)

[Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Mark Haoxing Ren, [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany)



[MLSys 2021](https://mlsys.org/)









[Verifying High-Level Latency-Insensitive Designs with Formal Model Checking](/publication/2021-02_verifying-high-level-latency-insensitive-designs-formal-model-checking)

[Steve Dai](/person/steve-dai), Alicia Klinefelter, Mark Haoxing Ren, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Ben Keller](/person/ben-keller), [Nathaniel Pinckney](/person/nathaniel-pinckney), [Brucek Khailany](/person/brucek-khailany)



[arXiv](https://arxiv.org/abs/2102.06326)









[MAVIREC: ML-Aided Vectored IR-Drop Estimation and Classification](/publication/2021-02_mavirec-ml-aided-vectored-ir-drop-estimation-and-classification)

Vidya A. Chhabria, [Yanqing Zhang](/person/yanqing-zhang), Mark Haoxing Ren, [Ben Keller](/person/ben-keller), [Brucek Khailany](/person/brucek-khailany), Sachin S. Sapatnekar



[2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)](https://www.date-conference.com/)









[Parasitic-Aware Analog Circuit Sizing with Graph Neural Networks and Bayesian Optimization](/publication/2021-02_parasitic-aware-analog-circuit-sizing-graph-neural-networks-and-bayesian)

Mingjie Liu, [Walker Turner](/person/walker-turner), George Kokai, David Z. Pan, [Brucek Khailany](/person/brucek-khailany), Mark Haoxing Ren



[2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)](https://www.date-conference.com/)









### 2020 

[NVCell: Generate Standard Cell Layout in Advanced Technology Nodes with Reinforcement Learning](/index.php/publication/2020-12_nvcell-generate-standard-cell-layout-advanced-technology-nodes-reinforcement)

Mark Haoxing Ren, [Matt Fojtik](/index.php/person/matt-fojtik), [Brucek Khailany](/index.php/person/brucek-khailany)



[Workshop on ML for Systems at NeurIPS](https://mlforsystems.org/assets/papers/neurips2020/nvcell_ren_2020.pdf)









[Opportunities for RTL and Gate Level Simulation using GPUs](/publication/2020-11_opportunities-rtl-and-gate-level-simulation-using-gpus)

[Yanqing Zhang](/person/yanqing-zhang), Mark Haoxing Ren, [Brucek Khailany](/person/brucek-khailany)



[IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’20)](https://iccad.com/images/programs/2020_ICCAD_ConferenceProgram.pdf)









[Accelerating Chip Design with Machine Learning](/index.php/publication/2020-09_accelerating-chip-design-machine-learning)

[Brucek Khailany](/index.php/person/brucek-khailany), Mark Haoxing Ren, [Steve Dai](/index.php/person/steve-dai), Saad Godil, [Ben Keller](/index.php/person/ben-keller), Robert Kirby, Alicia Klinefelter, [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Yanqing Zhang](/index.php/person/yanqing-zhang), Bryan Catanzaro, [William Dally](/index.php/person/william-dally)



[IEEE Micro](https://ieeexplore.ieee.org/document/9205654)









[GRANNITE: Graph Neural Network Inference for Transferable Power Estimation](/publication/2020-07_grannite-graph-neural-network-inference-transferable-power-estimation)

[Yanqing Zhang](/person/yanqing-zhang), Mark Haoxing Ren, [Brucek Khailany](/person/brucek-khailany)



[Design Automation Conference (DAC) 2020](https://www.dac.com/)









[DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement](/index.php/publication/2020-06_dreamplace-deep-learning-toolkit-enabled-gpu-acceleration-modern-vlsi-placement)

Yibo Lin, Zixuan Jiang, Jiaqi Gu, Wuxi Li, Shounak Dhar, Mark Haoxing Ren, [Brucek Khailany](/index.php/person/brucek-khailany), David Z. Pan



[IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (…](https://ieeexplore.ieee.org/document/9122053)



2021 IEEE Transactions on Computer-Aided Design Donald O. Pederson Best Paper Award





[ABCDPlace: Accelerated Batch-based Concurrent Detailed Placement on Multi-threaded CPUs and GPUs](/index.php/publication/2020-02_abcdplace-accelerated-batch-based-concurrent-detailed-placement-multi-threaded)

Yibo Lin, Wuxi Li, Jiaqi Gu, Mark Haoxing Ren, [Brucek Khailany](/index.php/person/brucek-khailany), David Z. Pan



[IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (…](https://ieeexplore.ieee.org/document/8982049)









[A 0.32–128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm](/publication/2020-01_032-128-tops-scalable-multi-chip-module-based-deep-neural-network-inference)

[Brian Zimmer](/person/brian-zimmer), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[IEEE Journal of Solid-State Circuits (JSSC)](https://ieeexplore.ieee.org/document/8959403)



JSSC 2020 Best Paper award





[FIST: A Feature-Importance Sampling and Tree-Based Method for Automatic Design Flow Parameter Tuning](/index.php/publication/2020-01_fist-feature-importance-sampling-and-tree-based-method-automatic-design-flow)

Zhiyao Xie, Guan-Qi Fang, Yu-Hung Huang, Mark Haoxing Ren, [Yanqing Zhang](/index.php/person/yanqing-zhang), [Brucek Khailany](/index.php/person/brucek-khailany), Shao-Yun Fang, Jiang Hu, Yiran Chen, Erick Carvajal Barboza



[ASP-DAC 2020](https://aspdac2020.github.io/aspdac20/welcome/index.html)









[PowerNet: Transferable Dynamic IR Drop Estimation via Maximum Convolutional Neural Network](/index.php/publication/2020-01_powernet-transferable-dynamic-ir-drop-estimation-maximum-convolutional-neural)

Zhiyao Xie, Mark Haoxing Ren, [Brucek Khailany](/index.php/person/brucek-khailany), Ye Sheng, Santosh Santosh, Jiang Hu, Yiran Chen



[ASP-DAC 2020](https://aspdac2020.github.io/aspdac20/welcome/index.html)









### 2019 

[MAGNet: A Modular Accelerator Generator for Neural Networks](/publication/2019-11_magnet-modular-accelerator-generator-neural-networks)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, Miaorong Wang, [Jason Clemons](/person/jason-clemons), [Steve Dai](/person/steve-dai), [Matt Fojtik](/person/matt-fojtik), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Yanqing Zhang](/person/yanqing-zhang), [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[International Conference On Computer Aided Design (ICCAD)](https://ieeexplore.ieee.org/document/8942127)









[Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture](/publication/2019-10_simba-scaling-deep-learning-inference-multi-chip-module-based-architecture)

Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)



[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3352460.3358302)



Best Paper award, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)





[A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology](/publication/2019-08_011-pjop-032-128-tops-scalable-multi-chip-module-based-deep-neural-network)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, [Brian Zimmer](/person/brian-zimmer), [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[Hot Chips: A Symposium on High Performance Chips](http://www.hotchips.org/)









[A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm](/publication/2019-06_011-pjop-032-128-tops-scalable-multi-chip-module-based-deep-neural-network)

[Brian Zimmer](/person/brian-zimmer), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[Symposium on VLSI Circuits](https://ieeexplore.ieee.org/document/8778056)









[DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement](/index.php/publication/2019-06_dreamplace-deep-learning-toolkit-enabled-gpu-acceleration-modern-vlsi-placement)

Yibo Lin, Shounak Dhar, Wuxi Li, Mark Haoxing Ren, [Brucek Khailany](/index.php/person/brucek-khailany), David Z. Pan



[Design Automation Conference (DAC) 2019](http://yibolin.com/publications/papers/PLACE_DAC2019_Lin.pdf)



DAC 2019 Best Paper Award





[PRIMAL: Power Inference using Machine Learning](/publication/2019-06_primal-power-inference-using-machine-learning)

Yuan Zhou, Mark Haoxing Ren, [Yanqing Zhang](/person/yanqing-zhang), [Ben Keller](/person/ben-keller), [Brucek Khailany](/person/brucek-khailany), Zhiru Zhang



[Design Automation Conference (DAC)](https://dac.com/)









[High Performance Graph Convolutional Networks with Applications in Testability Analysis](/publication/2019-06_high-performance-graph-convolutional-networks-applications-testability-analysis)

Yuzhe Ma, Mark Haoxing Ren, [Brucek Khailany](/person/brucek-khailany), Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan, Bei Yu



[Design Automation Conference (DAC)](https://dac.com/)









[Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference](/publication/2019-06_analogmixed-signal-hardware-error-modeling-deep-learning-inference)

Angad S. Rekhi, [Brian Zimmer](/person/brian-zimmer), [Nikola Nedovic](/person/nikola-nedovic), Ningxi Liu, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Miaorong Wang, [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray)



[Design Automation Conference (DAC)](https://dac.com/)









[A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET](/publication/2019-05_fine-grained-gals-soc-pausible-adaptive-clocking-16-nm-finfet)

[Matt Fojtik](/person/matt-fojtik), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), Tezaswi Raja, Kevin Zhou, [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany)



[ASYNC 2019](http://www.async2019.jp/)



ASYNC 2019 Best Paper Award





[Timeloop: A Systematic Approach to DNN Accelerator Evaluation](/publication/2019-03_timeloop-systematic-approach-dnn-accelerator-evaluation)

[Angshuman Parashar](/person/angshuman-parashar), Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler), [Joel Emer](/person/joel-emer)



[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/8695666)









### 2018 

[A Modular Digital VLSI Flow for High-Productivity SoC Design](/publication/2018-06_modular-digital-vlsi-flow-high-productivity-soc-design)

[Brucek Khailany](/person/brucek-khailany), Evgeni Krimer, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Jason Clemons](/person/jason-clemons), [Joel Emer](/person/joel-emer), [Matt Fojtik](/person/matt-fojtik), Alicia Klinefelter, [Michael Pellauer](/person/michael-pellauer), [Nathaniel Pinckney](/person/nathaniel-pinckney), Sophia Shao, Shreesha Srinath, Christopher Torng, Sam (Likun) Xi, [Yanqing Zhang](/person/yanqing-zhang), [Brian Zimmer](/person/brian-zimmer)



[Design Automation Conference (DAC)](https://dl.acm.org/doi/10.1145/3195970.3199846)









[Hardware-Enabled Artificial Intelligence](/publication/2018-06_hardware-enabled-artificial-intelligence)

[William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), John Poulton, [Brucek Khailany](/person/brucek-khailany), [John Wilson](/person/john-wilson), [Larry Dennison](/person/larry-dennison)



Symposia on VLSI Technology and Circuits









### 2017 

[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/publication/2017-06_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3079856.3080254)









[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/publication/2017-05_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[arXiv](https://arxiv.org/abs/1708.04485)









### 2016 

[A Real-time Energy-Efficient Superpixel Hardware Accelerator for Mobile Computer Vision Applications](/index.php/publication/2016-06_real-time-energy-efficient-superpixel-hardware-accelerator-mobile-computer)

Injoon Hong, [Jason Clemons](/index.php/person/jason-clemons), [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Iuri Frosio](/index.php/person/iuri-frosio), [Brucek Khailany](/index.php/person/brucek-khailany), [Steve Keckler](/index.php/person/stephen-keckler)



[Design Automation Conference (DAC)](http://dl.acm.org/citation.cfm?id=2897974)









[Modeling and Analysis of Power Supply Noise Tolerance with Fine-grained GALS Adaptive Clocks](/index.php/publication/2016-05_modeling-and-analysis-power-supply-noise-tolerance-fine-grained-gals-adaptive)

Divya Akella Kamakshi, [Matt Fojtik](/index.php/person/matt-fojtik), [Brucek Khailany](/index.php/person/brucek-khailany), [Sudhir Kudva](/index.php/person/sudhir-kudva), Yaping Zhou, Benton H. Calhoun



[ASYNC 2016](http://www.inf.pucrs.br/async2016/)



ASYNC 2016 Best Paper Award Nominee





### 2015 

[A Pausible Bisynchronous FIFO for GALS Systems](/publication/2015-05_pausible-bisynchronous-fifo-gals-systems)

Ben Keller, [Matt Fojtik](/person/matt-fojtik), [Brucek Khailany](/person/brucek-khailany)



[ASYNC 2015](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7152683&tag=1)









### 2012 

[Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor](/publication/2012-12_unifying-primary-cache-scratch-and-register-file-memories-throughput-processor)

Mark Gebhart, [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany), Ronny Krashinsky, [William Dally](/person/william-dally)



[International Symposium on Microarchitecture (MICRO)](http://dl.acm.org/citation.cfm?id=2457489)









### 2011 

[CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization](/index.php/publication/2011-11_cudadma-optimizing-gpu-memory-bandwidth-warp-specialization)

Michael Bauer, Henry Cook, [Brucek Khailany](/index.php/person/brucek-khailany)



[SC '11](https://dl.acm.org/doi/10.1145/2063384.2063400)









[GPUs and the Future of Parallel Computing](/publication/2011-09_gpus-and-future-parallel-computing)

[Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany), [Michael Garland](/person/michael-garland), David Glasco



[IEEE Micro](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6045685&tag=1)