A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm

This work presents a scalable deep neural network (DNN) accelerator consisting of 36 chips connected in a mesh network on a multi-chip-module (MCM) using ground-referenced signaling (GRS). While previous accelerators fabricated on a single monolithic die are limited to specific network sizes, the proposed architecture enables flexible scaling for efficient inference on a wide range of DNNs, from mobile to data center domains. The 16nm prototype achieves 1.29 TOPS/mm^2 and 0.11 pJ/op energy efficiency, with 4.01 TOPS peak performance for a 1-chip system, and 127.8 TOPS peak performance and 2615 images/s ResNet-50 inference for a 36-chip system.
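As a rough illustration of how the headline figures relate to one another, the short Python sketch below divides the 36-chip numbers quoted in the abstract by the chip count and compares the result with the 1-chip peak. It uses only the figures stated above. The function and constant names are illustrative, and the sketch assumes the quoted numbers can be compared directly, which may not hold if the 1-chip and 36-chip peaks were measured at different operating points.

```python
# Figures quoted in the abstract (no other data is assumed).
CHIPS = 36                      # chips on the multi-chip module
ONE_CHIP_PEAK_TOPS = 4.01       # peak performance, 1-chip system
SYSTEM_PEAK_TOPS = 127.8        # peak performance, 36-chip system
RESNET50_IMAGES_PER_S = 2615    # ResNet-50 inference, 36-chip system


def per_chip(value: float, chips: int) -> float:
    """Average share of a system-level figure contributed by each chip."""
    return value / chips


def scaling_ratio(system_tops: float, chips: int, one_chip_tops: float) -> float:
    """System peak relative to an ideal linear scale-up of the 1-chip peak.

    Assumption: both peaks are directly comparable; the abstract does not
    state the operating point for either figure.
    """
    return system_tops / (chips * one_chip_tops)


tops_per_chip = per_chip(SYSTEM_PEAK_TOPS, CHIPS)
images_per_chip = per_chip(RESNET50_IMAGES_PER_S, CHIPS)
ratio = scaling_ratio(SYSTEM_PEAK_TOPS, CHIPS, ONE_CHIP_PEAK_TOPS)

print(f"Implied peak per chip in the 36-chip system: {tops_per_chip:.2f} TOPS")
print(f"ResNet-50 throughput per chip: {images_per_chip:.1f} images/s")
print(f"36-chip peak vs. 36 x 1-chip peak: {ratio:.1%}")
```

Under these assumptions, the 36-chip peak corresponds to somewhat less per chip than the 1-chip peak, so the two figures should not be read as a strictly linear scale-up.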

Authors

Brian Zimmer

Rangharajan Venkatesan

Sophia Shao (NVIDIA)

Alicia Klinefelter (NVIDIA)

Nathaniel Pinckney

Priyanka Raina (Stanford University)

Publication Date

Sunday, June 9, 2019

Published in

Symposium on VLSI Circuits

Research Area

Artificial Intelligence and Machine Learning

Circuits and VLSI Design

External Links

IEEE Digital Library

Uploaded Files

Published manuscript (525.33 KB)

Copyright

This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org.