William Dally

Bill Dally joined NVIDIA in January 2009 as chief scientist, after spending 12 years at Stanford University, where he was chairman of the computer science department. Dally and his Stanford team developed the system architecture, network architecture, signaling, routing, and synchronization technology found in most large parallel computers today. From 1986 to 1997, Dally was at the Massachusetts Institute of Technology, where he and his team built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanism from programming models and demonstrated very low-overhead synchronization and communication mechanisms. From 1983 to 1986, he was at the California Institute of Technology (Caltech), where he designed the MOSSIM Simulation Engine and the Torus Routing Chip, which pioneered “wormhole” routing and virtual-channel flow control. He is a member of the National Academy of Engineering, a Fellow of the American Academy of Arts & Sciences, a Fellow of the IEEE and the ACM, and has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, and the ACM Maurice Wilkes Award. He has published over 250 papers, holds over 120 issued patents, and is an author of four textbooks. Dally received a bachelor's degree in Electrical Engineering from Virginia Tech, a master's in Electrical Engineering from Stanford University, and a Ph.D. in Computer Science from Caltech. He was a cofounder of Velio Communications and Stream Processors.



### Research Area(s)

[Circuits and VLSI Design](/index.php/research-area/circuits)

[Computer Architecture](/index.php/research-area/computer-architecture)

[High Performance Computing](/index.php/research-area/high-performance-computing)

[Artificial Intelligence and Machine Learning](/index.php/research-area/machine-learning-artificial-intelligence)

[Networking](/index.php/research-area/networking)

[Programming Languages, Systems and Tools](/index.php/research-area/programming-languages-systems)

### Google Scholar

[https://scholar.google.com/citations?user=YZHj-Y4AAAAJ&hl=en&oi=sra](https://scholar.google.com/citations?user=YZHj-Y4AAAAJ&hl=en&oi=sra)

### Publications

### 2024

[A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS](/publication/2024-04_0190-pjbit-252-gbswire-inverter-based-ac-coupled-transceiver-short-reach-die)

[Yoshinori Nishi](/person/yoshi-nishi), John W. Poulton, [Xi Chen](/person/xi-chen), [Sanquan Song](/person/sanquan-song), [Brian Zimmer](/person/brian-zimmer), [Walker Turner](/person/walker-turner), [Stephen Tell](/person/stephen-tell), [Nikola Nedovic](/person/nikola-nedovic), [John Wilson](/person/john-wilson), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray)



[IEEE Journal of Solid-State Circuits (JSSC) (Volume: 59, Issue: 4, April 2024)](https://ieeexplore.ieee.org/document/10185334)









### 2023 

[A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS](/publication/2023-06_0190-pjbit-252-gbswire-inverter-based-ac-coupled-transceiver-short-reach-die)

[Yoshinori Nishi](/person/yoshi-nishi), John W. Poulton, [Xi Chen](/person/xi-chen), [Sanquan Song](/person/sanquan-song), [Brian Zimmer](/person/brian-zimmer), [Walker Turner](/person/walker-turner), [Stephen Tell](/person/stephen-tell), [Nikola Nedovic](/person/nikola-nedovic), [John Wilson](/person/john-wilson), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray)



[2023 IEEE Symposium on VLSI Technology & Circuits](https://ieeexplore.ieee.org/abstract/document/10185334)









[A 0.297-pJ/Bit 50.4-Gb/s/Wire Inverter-Based Short-Reach Simultaneous Bi-Directional Transceiver for Die-to-Die Interface in 5-nm CMOS](/publication/2023-04_0297-pjbit-504-gbswire-inverter-based-short-reach-simultaneous-bi-directional)

[Yoshinori Nishi](/person/yoshi-nishi), John W. Poulton, [Walker Turner](/person/walker-turner), [Xi Chen](/person/xi-chen), [Sanquan Song](/person/sanquan-song), [Brian Zimmer](/person/brian-zimmer), [Stephen Tell](/person/stephen-tell), [Nikola Nedovic](/person/nikola-nedovic), [John Wilson](/person/john-wilson), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray)



[IEEE Journal of Solid-State Circuits (Volume: 58, Issue: 4, April 2023)](https://ieeexplore.ieee.org/document/10011563)









[A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm](/publication/2023-01_956-topsw-deep-learning-inference-accelerator-vector-scaled-4-bit-quantization)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), [Charbel Sakr](/person/charbel-sakr), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany)



[IEEE Journal of Solid-State Circuits](https://ieeexplore.ieee.org/document/10019275)









### 2022 

[LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update](/publication/2022-12_lns-madam-low-precision-training-logarithmic-number-system-using-multiplicative)

Jiawei Zhao, [Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), Mustafa Ali, [Ming-Yu Liu](/person/ming-yu-liu), [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally), Anima Anandkumar



[IEEE Transactions on Computers (Volume: 71, Issue: 12, 01 December 2022)](https://www.computer.org/csdl/journal/tc)









[Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training](/publication/2022-07_optimal-clipping-and-magnitude-aware-differentiation-improved-quantization)

[Charbel Sakr](/person/charbel-sakr), [Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Brucek Khailany](/person/brucek-khailany), [William Dally](/person/william-dally)



[2022 International Conference on Machine Learning (ICML)](https://arxiv.org/abs/2206.06501)









[A 0.297-pJ/bit 50.4-Gb/s/wire Inverter-Based Short-Reach Simultaneous Bidirectional Transceiver for Die-to-Die Interface in 5nm CMOS](/index.php/publication/2022-06_0297-pjbit-504-gbswire-inverter-based-short-reach-simultaneous-bidirectional)

[Yoshinori Nishi](/index.php/person/yoshi-nishi), John W. Poulton, [Xi Chen](/index.php/person/xi-chen), [Sanquan Song](/index.php/person/sanquan-song), [Brian Zimmer](/index.php/person/brian-zimmer), [Walker Turner](/index.php/person/walker-turner), [Stephen Tell](/index.php/person/stephen-tell), [Nikola Nedovic](/index.php/person/nikola-nedovic), [John Wilson](/index.php/person/john-wilson), [William Dally](/index.php/person/william-dally), [Tom Gray](/index.php/person/tom-gray)



[2022 IEEE Symposium on VLSI Technology & Circuits](https://archive.vlsisymposium.org/22web/about/)









[A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm](/publication/2022-06_17-956-topsw-deep-learning-inference-accelerator-vector-scaled-4-bit)

[Ben Keller](/person/ben-keller), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Steve Dai](/person/steve-dai), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany)



[2022 Symposium on VLSI Technology & Circuits Digest of Technical Papers](https://www.vlsisymposium.org)









### 2021 

[Evolution of the Graphics Processing Unit (GPU)](/publication/2021-12_evolution-graphics-processing-unit-gpu)

[William Dally](/person/william-dally), [Steve Keckler](/person/stephen-keckler), David B. Kirk



[IEEE Micro Special Issue on the 50th Anniversary of the Microprocessor](https://ieeexplore.ieee.org/document/9623445)









[Simba: scaling deep-learning inference with chiplet-based architecture](/publication/2021-05_simba-scaling-deep-learning-inference-chiplet-based-architecture)

Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)



[Communications of the ACM](https://dl.acm.org/doi/10.1145/3460227)



ACM Research Highlight





[VS-QUANT: Per-Vector Scaled Quantization for Accurate Low-Precision Neural Network Inference](/publication/2021-04_vs-quant-vector-scaled-quantization-accurate-low-precision-neural-network)

[Steve Dai](/person/steve-dai), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Mark Haoxing Ren, [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany)



[MLSys 2021](https://mlsys.org/)









### 2020 

[Accelerating Chip Design with Machine Learning](/publication/2020-09_accelerating-chip-design-machine-learning)

[Brucek Khailany](/person/brucek-khailany), Mark Haoxing Ren, [Steve Dai](/person/steve-dai), Saad Godil, [Ben Keller](/person/ben-keller), Robert Kirby, Alicia Klinefelter, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Yanqing Zhang](/person/yanqing-zhang), Bryan Catanzaro, [William Dally](/person/william-dally)



[IEEE Micro](https://ieeexplore.ieee.org/document/9205654)









[A 0.32–128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm](/publication/2020-01_032-128-tops-scalable-multi-chip-module-based-deep-neural-network-inference)

[Brian Zimmer](/person/brian-zimmer), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Yakun Sophia Shao, [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[IEEE Journal of Solid-State Circuits (JSSC)](https://ieeexplore.ieee.org/document/8959403)



JSSC 2020 Best Paper Award





### 2019 

[MAGNet: A Modular Accelerator Generator for Neural Networks](/publication/2019-11_magnet-modular-accelerator-generator-neural-networks)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, Miaorong Wang, [Jason Clemons](/person/jason-clemons), [Steve Dai](/person/steve-dai), [Matt Fojtik](/person/matt-fojtik), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Yanqing Zhang](/person/yanqing-zhang), [Brian Zimmer](/person/brian-zimmer), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[International Conference On Computer Aided Design (ICCAD)](https://ieeexplore.ieee.org/document/8942127)









[Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture](/publication/2019-10_simba-scaling-deep-learning-inference-multi-chip-module-based-architecture)

Sophia Shao, [Jason Clemons](/person/jason-clemons), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brian Zimmer](/person/brian-zimmer), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Brucek Khailany](/person/brucek-khailany), [Steve Keckler](/person/stephen-keckler)



[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/doi/10.1145/3352460.3358302)



Best Paper Award, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)





[A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology](/publication/2019-08_011-pjop-032-128-tops-scalable-multi-chip-module-based-deep-neural-network)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, [Brian Zimmer](/person/brian-zimmer), [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[Hot Chips: A Symposium on High Performance Chips](http://www.hotchips.org/)









[A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm](/publication/2019-06_011-pjop-032-128-tops-scalable-multi-chip-module-based-deep-neural-network)

[Brian Zimmer](/person/brian-zimmer), [Rangharajan Venkatesan](/person/rangharajan-venkatesan), Sophia Shao, [Jason Clemons](/person/jason-clemons), [Matt Fojtik](/person/matt-fojtik), [Ted Jiang](/person/ted-jiang), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), Priyanka Raina, [Stephen Tell](/person/stephen-tell), [Yanqing Zhang](/person/yanqing-zhang), [William Dally](/person/william-dally), [Joel Emer](/person/joel-emer), [Tom Gray](/person/tom-gray), [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany)



[Symposium on VLSI Circuits](https://ieeexplore.ieee.org/document/8778056)









[Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference](/index.php/publication/2019-06_analogmixed-signal-hardware-error-modeling-deep-learning-inference)

Angad S. Rekhi, [Brian Zimmer](/index.php/person/brian-zimmer), [Nikola Nedovic](/index.php/person/nikola-nedovic), Ningxi Liu, [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), Miaorong Wang, [Brucek Khailany](/index.php/person/brucek-khailany), [William Dally](/index.php/person/william-dally), [Tom Gray](/index.php/person/tom-gray)



[Design Automation Conference (DAC)](https://dac.com/)









[A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET](/publication/2019-05_fine-grained-gals-soc-pausible-adaptive-clocking-16-nm-finfet)

[Matt Fojtik](/person/matt-fojtik), [Ben Keller](/person/ben-keller), Alicia Klinefelter, [Nathaniel Pinckney](/person/nathaniel-pinckney), [Stephen Tell](/person/stephen-tell), [Brian Zimmer](/person/brian-zimmer), Tezaswi Raja, Kevin Zhou, [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany)



[ASYNC 2019](http://www.async2019.jp/)



ASYNC 2019 Best Paper Award





[A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator](/publication/2019-01_117-pjb-25-gbspin-ground-referenced-single-ended-serial-link-and-package)

John Poulton, [John Wilson](/person/john-wilson), [Walker Turner](/person/walker-turner), [Brian Zimmer](/person/brian-zimmer), [Xi Chen](/person/xi-chen), [Sudhir Kudva](/person/sudhir-kudva), [Sanquan Song](/person/sanquan-song), [Stephen Tell](/person/stephen-tell), [Nikola Nedovic](/person/nikola-nedovic), Wenxu Zhao, Sunil Sudhakaran, [Tom Gray](/person/tom-gray), [William Dally](/person/william-dally)



IEEE Journal of Solid-State Circuits









### 2018 

[Hardware-Enabled Artificial Intelligence](/index.php/publication/2018-06_hardware-enabled-artificial-intelligence)

[William Dally](/index.php/person/william-dally), [Tom Gray](/index.php/person/tom-gray), John Poulton, [Brucek Khailany](/index.php/person/brucek-khailany), [John Wilson](/index.php/person/john-wilson), [Larry Dennison](/index.php/person/larry-dennison)



Symposia on VLSI Technology and Circuits









[Ground-Referenced Signaling for Intra-Chip and Short-Reach Chip-to-Chip Interconnects](/publication/2018-04_ground-referenced-signaling-intra-chip-and-short-reach-chip-chip-interconnects)

[Walker Turner](/person/walker-turner), John Poulton, [John Wilson](/person/john-wilson), [Xi Chen](/person/xi-chen), [Stephen Tell](/person/stephen-tell), [Matt Fojtik](/person/matt-fojtik), [Trey Greer](/person/trey-greer), [Brian Zimmer](/person/brian-zimmer), [Sanquan Song](/person/sanquan-song), [Nikola Nedovic](/person/nikola-nedovic), [Sudhir Kudva](/person/sudhir-kudva), Sunil Sudhakaran, Rizwan Bashirullah, Wenxu Zhao, [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray)



Custom Integrated Circuits Conference









[A 1.17pJ/b 25Gb/s/pin Ground-Referenced Single Ended Serial Link for Off- and On-Package Communication in 16nm CMOS Using a Process- and Temperature-Adaptive Voltage Regulator](/publication/2018-02_117pjb-25gbspin-ground-referenced-single-ended-serial-link-and-package)

[John Wilson](/person/john-wilson), [Walker Turner](/person/walker-turner), John Poulton, [Brian Zimmer](/person/brian-zimmer), [Xi Chen](/person/xi-chen), [Sanquan Song](/person/sanquan-song), [Stephen Tell](/person/stephen-tell), [Nikola Nedovic](/person/nikola-nedovic), Wenxu Zhao, Sunil Sudhakaran, [Tom Gray](/person/tom-gray), [William Dally](/person/william-dally)



International Solid-State Circuits Conference (ISSCC)









### 2017 

[Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems](/publication/2017-10_fine-grained-dram-energy-efficient-dram-extreme-bandwidth-systems)

[Mike O'Connor](/person/mike-o-connor), [Niladrish Chatterjee](/person/niladrish-chatterjee), [Donghyuk Lee](/person/donghyuk-lee), [John Wilson](/person/john-wilson), Aditya Agrawal, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[International Symposium on Microarchitecture (MICRO)](https://dl.acm.org/citation.cfm?id=3124545)









[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/publication/2017-06_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/person/rangharajan-venkatesan), [Brucek Khailany](/person/brucek-khailany), [Joel Emer](/person/joel-emer), [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3079856.3080254)









[SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](/index.php/publication/2017-05_scnn-accelerator-compressed-sparse-convolutional-neural-networks)

[Angshuman Parashar](/index.php/person/angshuman-parashar), Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, [Rangharajan Venkatesan](/index.php/person/rangharajan-venkatesan), [Brucek Khailany](/index.php/person/brucek-khailany), [Joel Emer](/index.php/person/joel-emer), [Steve Keckler](/index.php/person/stephen-keckler), [William Dally](/index.php/person/william-dally)



[arXiv](https://arxiv.org/abs/1708.04485)









[Architecting an Energy-Efficient DRAM System for GPUs](/publication/2017-02_architecting-energy-efficient-dram-system-gpus)

[Niladrish Chatterjee](/person/niladrish-chatterjee), [Mike O'Connor](/person/mike-o-connor), [Donghyuk Lee](/person/donghyuk-lee), Daniel Johnson, Minsoo Rhu, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[International Symposium on High Performance Computer Architecture (HPCA)](http://ieeexplore.ieee.org/document/7920815/)









### 2016 

[EIE: Efficient Inference Engine on Compressed Deep Neural Network](/publication/2016-06_eie-efficient-inference-engine-compressed-deep-neural-network)

Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, [William Dally](/person/william-dally)



[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/3007787.3001163)









[Current parking regulator for zero droop/overshoot load transient response](/index.php/publication/2016-03_current-parking-regulator-zero-droopovershoot-load-transient-response)

[Sudhir Kudva](/index.php/person/sudhir-kudva), [William Dally](/index.php/person/william-dally), [Trey Greer](/index.php/person/trey-greer), [Tom Gray](/index.php/person/tom-gray)



Applied Power Electronics Conference and Exposition (APEC)









[A 6.5-to-23.3fJ/b/mm Balanced Charge-Recycling Bus in 16nm FinFET CMOS at 1.7-to-2.6Gb/s/wire with Clock Forwarding and Low-Crosstalk Contraflow Wiring](/publication/2016-02_65-233fjbmm-balanced-charge-recycling-bus-16nm-finfet-cmos-17-26gbswire-clock)

[John Wilson](/person/john-wilson), [Matt Fojtik](/person/matt-fojtik), John Poulton, [Xi Chen](/person/xi-chen), [Stephen Tell](/person/stephen-tell), [Trey Greer](/person/trey-greer), [Tom Gray](/person/tom-gray), [William Dally](/person/william-dally)



[International Solid-State Circuits Conference (ISSCC 2016)](http://ieeexplore.ieee.org/document/7417954/)









[A 28nm 2Mbit 6T SRAM with Highly Configurable Write Assist Implementation and Capacitor Based Sense Amplifier Input Offset Compensation](/publication/2016-02_28nm-2mbit-6t-sram-highly-configurable-write-assist-implementation-and)

Mahmut Sinangil, John Poulton, [Matt Fojtik](/person/matt-fojtik), [Trey Greer](/person/trey-greer), [Stephen Tell](/person/stephen-tell), Andy Gotterba, Jesse Wang, Jason Golbus, [William Dally](/person/william-dally), [Tom Gray](/person/tom-gray)



IEEE Journal of Solid-State Circuits









### 2015 

[Network Endpoint Congestion Control for Fine-Grained Communication](/index.php/publication/2015-11_network-endpoint-congestion-control-fine-grained-communication)

[Ted Jiang](/index.php/person/ted-jiang), [Larry Dennison](/index.php/person/larry-dennison), [William Dally](/index.php/person/william-dally)



[SC15](http://dl.acm.org/citation.cfm?id=2807600)









### 2014 

[Scaling the Power Wall: A Path to Exascale](/publication/2014-11_scaling-power-wall-path-exascale)

Oreste Villa, Daniel Johnson, [Mike O'Connor](/person/mike-o-connor), Evgeny Bolotin, [David Nellans](/person/david-nellans), Justin Luitjens, Nikolai Sakharnykh, Peng Wang, Paulius Micikevicius, Anthony Scudiero, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[SC '14](http://ieeexplore.ieee.org/abstract/document/7013055/)









### 2013 

[A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications](/publication/2013-12_054-pjb-20-gbs-ground-referenced-single-ended-short-reach-serial-link-28-nm)

John Poulton, [William Dally](/person/william-dally), [Xi Chen](/person/xi-chen), John Eyles, [Trey Greer](/person/trey-greer), [Stephen Tell](/person/stephen-tell), [John Wilson](/person/john-wilson), [Tom Gray](/person/tom-gray)



[IEEE Journal of Solid-State Circuits](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6601723)









[21st Century Digital Design Tools](/publication/2013-05_21st-century-digital-design-tools)

[William Dally](/person/william-dally), Chris Malachosky, [Steve Keckler](/person/stephen-keckler)



[Design Automation Conference (DAC)](https://ieeexplore.ieee.org/document/6560687)









[A 0.54pJ/b 20Gb/s Ground-Referenced Single-Ended Short-Haul Serial Link in 28nm CMOS for Advanced Packaging Applications](/publication/2013-02_054pjb-20gbs-ground-referenced-single-ended-short-haul-serial-link-28nm-cmos)

John Poulton, [William Dally](/person/william-dally), [Xi Chen](/person/xi-chen), John Eyles, [Trey Greer](/person/trey-greer), [Stephen Tell](/person/stephen-tell), [Tom Gray](/person/tom-gray)



[International Solid-State Circuits Conference (ISSCC)](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6487789)









### 2012 

[Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor](/publication/2012-12_unifying-primary-cache-scratch-and-register-file-memories-throughput-processor)

Mark Gebhart, [Steve Keckler](/person/stephen-keckler), [Brucek Khailany](/person/brucek-khailany), Ronny Krashinsky, [William Dally](/person/william-dally)



[International Symposium on Microarchitecture (MICRO)](http://dl.acm.org/citation.cfm?id=2457489)









[A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors](/publication/2012-04_hierarchical-thread-scheduler-and-register-file-energy-efficient-throughput)

Mark Gebhart, Daniel R. Johnson, David Tarjan, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally), Erik Lindholm, Kevin Skadron



[ACM Transactions on Computer Systems (TOCS)](http://dl.acm.org/citation.cfm?id=2166882)









### 2011 

[A Compile-Time Managed Multi-Level Register File Hierarchy](/publication/2011-12_compile-time-managed-multi-level-register-file-hierarchy)

Mark Gebhart, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally)



[International Symposium on Microarchitecture (MICRO)](https://ieeexplore.ieee.org/document/7851495)









[GPUs and the Future of Parallel Computing](/publication/2011-09_gpus-and-future-parallel-computing)

[Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally), [Brucek Khailany](/person/brucek-khailany), [Michael Garland](/person/michael-garland), David Glasco



[IEEE Micro](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6045685&tag=1)









[Energy-efficient Mechanisms for Managing Thread Context in Throughput Processors](/publication/2011-06_energy-efficient-mechanisms-managing-thread-context-throughput-processors)

Mark Gebhart, Daniel R. Johnson, David Tarjan, [Steve Keckler](/person/stephen-keckler), [William Dally](/person/william-dally), Erik Lindholm, Kevin Skadron



[International Symposium on Computer Architecture (ISCA)](https://dl.acm.org/doi/10.1145/2000064.2000093)









### 2010 

[The Even/Odd Synchronizer: A Fast, All-Digital Periodic Synchronizer](/publication/2010-05_evenodd-synchronizer-fast-all-digital-periodic-synchronizer)

[William Dally](/person/william-dally), [Stephen Tell](/person/stephen-tell)



[16th International Symposium on Asynchronous Circuits and Systems](https://ieeexplore.ieee.org/document/5476986)









### 2007 

[A 14-mW 6.25-Gb/s Transceiver in 90-nm CMOS](/publication/2007-12_14-mw-625-gbs-transceiver-90-nm-cmos)

John Poulton, Robert Palmer, Andy Fuller, [Trey Greer](/person/trey-greer), John Eyles, [William Dally](/person/william-dally), Mark Horowitz



[IEEE Journal of Solid-State Circuits](http://www.ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4)