SNAP: A 1.67 – 21.55 TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS

A Sparse Neural Acceleration Processor (SNAP) is designed to exploit unstructured sparsity in deep neural networks (DNNs). SNAP uses parallel associative search to discover matching input pairs (nonzero weights and nonzero activations), maintaining an average hardware utilization of 75%. SNAP's two-level partial-sum reduction eliminates access contention and cuts writeback traffic by 22×. Through diagonal and row configurations of its PE arrays, SNAP supports any CONV or FC layer. A 2.4 mm² 16nm SNAP test chip is measured to achieve a peak effectual efficiency of 21.55 TOPS/W (16b) at 0.55 V and 260 MHz for CONV layers with 10% weight and activation density. Operating on pruned ResNet-50, SNAP achieves 90.98 fps at 0.80 V and 480 MHz, dissipating 348 mW.
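The pairing step described above can be illustrated with a small software analogy. This is a hedged sketch, not SNAP's design. It assumes a simple compressed format of parallel (channel index, value) lists, and the names `SparseVec`, `match_pairs`, and `sparse_dot` are hypothetical. Weight entries are looked up by content (their channel number) rather than by position, which stands in for an associative search. Only matched pairs, where both operands are nonzero, reach a multiply. The hardware performs these comparisons in parallel; the Python loop below runs them sequentially.

```python
from typing import List, Tuple

# Hypothetical compressed sparse vector: parallel lists of channel indices and
# nonzero values. This is an illustrative format, not SNAP's on-chip encoding.
SparseVec = Tuple[List[int], List[float]]


def match_pairs(weights: SparseVec, acts: SparseVec) -> List[Tuple[int, float, float]]:
    """Return (channel, weight, activation) for every channel where both are nonzero.

    Each weight's channel index is looked up by content rather than by position,
    a software stand-in for a content-addressable (associative) search.
    """
    w_idx, w_val = weights
    a_idx, a_val = acts
    # Build the lookup table: activation channel index -> position in compressed array.
    cam = {ch: pos for pos, ch in enumerate(a_idx)}
    pairs = []
    for ch, w in zip(w_idx, w_val):
        pos = cam.get(ch)  # a hit means an effectual (nonzero x nonzero) pair
        if pos is not None:
            pairs.append((ch, w, a_val[pos]))
    return pairs


def sparse_dot(weights: SparseVec, acts: SparseVec) -> float:
    """Dot product that multiplies only the matched (effectual) pairs."""
    return sum(w * a for _, w, a in match_pairs(weights, acts))


if __name__ == "__main__":
    w = ([0, 3, 7, 9], [0.5, -1.0, 2.0, 4.0])
    x = ([1, 3, 9, 12], [3.0, 2.0, 0.25, 5.0])
    print(match_pairs(w, x))  # [(3, -1.0, 2.0), (9, 4.0, 0.25)]
    print(sparse_dot(w, x))   # -1.0
```

In this example, only two of the four weight entries find a partner. The skipped multiplies are the ineffectual work that a sparse accelerator avoids.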

Authors

Jie-Fang Zhang (University of Michigan)

Ching-En Lee (University of Michigan)

Chester Liu (University of Michigan)

Yakun Sophia Shao (NVIDIA)

Steve Keckler (NVIDIA)

Zhengya Zhang (University of Michigan)

Publication Date

Sunday, June 9, 2019

Published in

Symposia on VLSI Technology and Circuits (VLSI)

Research Area

Computer Architecture

External Links

IEEE Digital Library

Uploaded Files

Published manuscript (1.25 MB)

Copyright

This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org.