  Rangharajan Venkatesan  

 



  ![](/sites/default/files/person/rangharajan-venkatesan.jpg)

  

 Rangharajan Venkatesan is a Senior Research Scientist in the ASIC & VLSI Research group at NVIDIA. He received the B.Tech. degree in Electronics and Communication Engineering from the Indian Institute of Technology, Roorkee, in 2009 and the Ph.D. degree in Electrical and Computer Engineering from Purdue University in August 2014. His research interests include variation-tolerant design methodologies, low-power SoC design, machine learning, spintronic memories, and approximate computing. During his Ph.D., he received Purdue’s Ross Fellowship for 2009–2010 and the Bilsland Dissertation Fellowship for 2013–2014. His work on spintronic memory design was recognized with the Best Paper Award at the International Symposium on Low Power Electronics and Design (ISLPED) in 2012 and a Best Paper nomination at Design, Automation and Test in Europe (DATE) in 2017.



   Research Area(s)

[Circuits and VLSI Design](/index.php/research-area/circuits)

[Computer Architecture](/index.php/research-area/computer-architecture)

 

 

  

 Google Scholar

[https://scholar.google.com/citations?user=ca0D8ngAAAAJ&hl=en](https://scholar.google.com/citations?user=ca0D8ngAAAAJ&hl=en)

 

  

 

 

 



 ### Publications

 

### 2026 

[Alpha-Vision: A Real-Time Always-on Vision Processor with 787µs Face Detection Latency in <5mW](/publication/2026-02_alpha-vision-real-time-always-vision-processor-787ms-face-detection-latency)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Muya Chang](/person/muya-chang), Thierry Tambe, [Nathaniel Pinckney](/person/nathaniel-pinckney), [Stephen Tell](/person/stephen-tell), [Qijing Jenny Huang](/person/qijing-jenny-huang), [Shalini De Mello](/person/shalini-de-mello), [Brucek Khailany](/person/brucek-khailany)



[ISSCC 2026](https://www.isscc.org/)









### 2023 

[Efficient Transformer Inference with Statically Structured Sparse Attention](/publication/2023-07_efficient-transformer-inference-statically-structured-sparse-attention)

[Steve Dai](/person/steve-dai), Hasan Genc, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany)



[2023 60th ACM/IEEE Design Automation Conference (DAC)](https://ieeexplore.ieee.org/xpl/conhome/10247654/proceeding)









[A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm](/publication/2023-01_956-topsw-deep-learning-inference-accelerator-vector-scaled-4-bit-quantization)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), [Charbel Sakr](/person/charbel-sakr), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany)



[Journal of Solid-State Circuits](https://ieeexplore.ieee.org/document/10019275)









### 2022 

[LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update](/publication/2022-12_lns-madam-low-precision-training-logarithmic-number-system-using-multiplicative)

Jiawei Zhao, [Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), Mustafa Ali, [Ming-Yu Liu](/person/ming-yu-liu), [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally), Anima Anandkumar



[IEEE Transactions on Computers (Volume: 71, Issue: 12, 01 December 2022)](https://www.computer.org/csdl/journal/tc)









[Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training](/publication/2022-07_optimal-clipping-and-magnitude-aware-differentiation-improved-quantization)

[Charbel Sakr](/person/charbel-sakr), [Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally)



[2022 International Conference on Machine Learning (ICML)](https://arxiv.org/abs/2206.06501)









[A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm](/publication/2022-06_17-956-topsw-deep-learning-inference-accelerator-vector-scaled-4-bit)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany)



[2022 Symposium on VLSI Technology & Circuits Digest of Technical Papers](https://www.vlsisymposium.org)









### 2021 

[Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers](/publication/2021-12_softermax-hardwaresoftware-co-design-efficient-softmax-transformers)

Jacob R. Stevens, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Brucek Khailany](/person/brucek-khailany), Anand Raghunathan



[Design Automation Conference (DAC) 2021](https://www.dac.com/)









[IPA: Floorplan-Aware SystemC Interconnect Performance Modeling and Generation for HLS-based SoCs](/index.php/publication/2021-11_ipa-floorplan-aware-systemc-interconnect-performance-modeling-and-generation)

[Nathaniel Pinckney](/index.php/person/nathaniel-pinckney), [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Ben Keller](/index.php/person/ben-keller), [Brucek Khailany](/index.php/person/brucek-khailany)



[IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’21)](https://iccad.com/)









[Simba: scaling deep-learning inference with chiplet-based architecture](/publication/2021-05_simba-scaling-deep-learning-inference-chiplet-based-architecture)

Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)



[Communications of the ACM](https://dl.acm.org/doi/10.1145/3460227)



ACM Research Highlight





[VS-QUANT: Per-Vector Scaled Quantization for Accurate Low-Precision Neural Network Inference](/publication/2021-04_vs-quant-vector-scaled-quantization-accurate-low-precision-neural-network)

[Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Mark Haoxing Ren, [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany)



[MLSys 2021](https://mlsys.org/)









[Fair and Comprehensive Benchmarking of Machine Learning Processing Chips](/publication/2021-03_fair-and-comprehensive-benchmarking-machine-learning-processing-chips)

Geoffrey W. Burr, SukHwan Lim, Boris Murmann, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Marian Verhelst



<https://ieeexplore.ieee.org/document/9367229>









[Verifying High-Level Latency-Insensitive Designs with Formal Model Checking](/publication/2021-02_verifying-high-level-latency-insensitive-designs-formal-model-checking)

[Steve Dai](/person/steve-dai), Alicia Klinefelter, Mark Haoxing Ren, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Ben Keller](/person/ben-keller), [Nathaniel Pinckney](/person/nathaniel-pinckney), [Brucek Khailany](/person/brucek-khailany)



[arXiv](https://arxiv.org/abs/2102.06326)









### 2020 

[Accelerating Chip Design with Machine Learning](/publication/2020-09_accelerating-chip-design-machine-learning)

[Brucek Khailany](/person/brucek-khailany), Mark Haoxing Ren, [Steve Dai](/person/steve-dai), Saad Godil, [Ben Keller](/person/ben-keller), Robert Kirby, Alicia Klinefelter, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Yanqing Zhang](/person/yanqing-zhang), Bryan Catanzaro, [William Dally](/person/william-dally)



[IEEE Micro](https://ieeexplore.ieee.org/document/9205654)









[A 0.32–128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm](/publication/2020-01_032-128-tops-scalable-multi-chip-module-based-deep-neural-network-inference)

[Brian Zimmer](/person/brian-zimmer), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[IEEE Journal of Solid-State Circuits (JSSC)](https://ieeexplore.ieee.org/document/8959403)



JSSC 2020 Best Paper award





### 2019 

[MAGNet: A Modular Accelerator Generator for Neural Networks](/publication/2019-11_magnet-modular-accelerator-generator-neural-networks)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, Miaorong Wang, [Jason Clemons](/person/jason-clemons), [Steve Dai](/person/steve-dai), [Matt Fojtik](/person/matt-fojtik), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Yanqing Zhang](/person/yanqing-zhang), [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[International Conference On Computer Aided Design (ICCAD)](https://ieeexplore.ieee.org/document/8942127)









[Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture](/publication/2019-10_simba-scaling-deep-learning-inference-multi-chip-module-based-architecture)

Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)



[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3352460.3358302)



Best Paper award, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)





[A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology](/publication/2019-08_011-pjop-032-128-tops-scalable-multi-chip-module-based-deep-neural-network)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, [Brian Zimmer](/person/brian-zimmer), [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[Hot Chips: A Symposium on High Performance Chips](http://www.hotchips.org/)









[A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm](/publication/2019-06_011-pjop-032-128-tops-scalable-multi-chip-module-based-deep-neural-network)

[Brian Zimmer](/person/brian-zimmer), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[Symposium on VLSI Circuits](https://ieeexplore.ieee.org/document/8778056)









[Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference](/publication/2019-06_analogmixed-signal-hardware-error-modeling-deep-learning-inference)

Angad S. Rekhi, [Brian Zimmer](/person/brian-zimmer), [Nikola Nedovic](/person/nikola-nedovic), Ningxi Liu, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Miaorong Wang, [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray)



[Design Automation Conference (DAC)](https://dac.com/)









[Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration](/publication/2019-04_buffets-efficient-and-composable-storage-idiom-explicit-decoupled-data)

[Michael Pellauer](/person/michael-pellauer), Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Neal Crago](/person/neal-crago), Kartik Hegde, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Keckler](/person/stephen-keckler), Christopher W. Fletcher, [Joel Emer](/person/joel-emer)



[International Conference on Architectural Support for Programming Languages and…](https://dl.acm.org/doi/10.1145/3297858.3304025)



IEEE Micro Top Picks in Computer Architecture (Honorable Mention)





[Timeloop: A Systematic Approach to DNN Accelerator Evaluation](/publication/2019-03_timeloop-systematic-approach-dnn-accelerator-evaluation)

[Angshuman Parashar](/person/angshuman-parashar), Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler), [Joel Emer](/person/joel-emer)



[International Symposium on Performance Analysis of Systems and Software (ISPASS)](https://ieeexplore.ieee.org/document/8695666)









### 2018 

[A Modular Digital VLSI Flow for High-Productivity SoC Design](/publication/2018-06_modular-digital-vlsi-flow-high-productivity-soc-design)

[Brucek Khailany](/person/brucek-khailany), Evgeni Krimer, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Jason Clemons](/person/jason-clemons), [Joel Emer](/person/joel-emer), [Matt Fojtik](/person/matt-fojtik), Alicia Klinefelter, [Michael Pellauer](/person/michael-pellauer), [Nathaniel Pinckney](/person/nathaniel-pinckney), Sophia Shao, Shreesha Srinath, Christopher Torng, Sam (Likun) Xi, [Yanqing Zhang](/person/yanqing-zhang), [Brian Zimmer](/person/brian-zimmer)



[Design Automation Conference (DAC)](https://dl.acm.org/doi/10.1145/3195970.3199846)









### 2017 

[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/publication/2017-06_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3079856.3080254)









[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/publication/2017-05_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[arXiv](https://arxiv.org/abs/1708.04485)









### 2016 

[A Real-time Energy-Efficient Superpixel Hardware Accelerator for Mobile Computer Vision Applications](/publication/2016-06_real-time-energy-efficient-superpixel-hardware-accelerator-mobile-computer)

Injoon Hong, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Iuri Frosio](/person/iuri-frosio), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)



[Design Automation Conference (DAC)](http://dl.acm.org/citation.cfm?id=2897974)