SNAP: A 1.67 – 21.55 TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS


The Sparse Neural Acceleration Processor (SNAP) is designed to exploit unstructured sparsity in deep neural networks (DNNs). SNAP uses parallel associative search to discover input pairs, which keeps average hardware utilization at 75%. Its two-level partial-sum reduction eliminates access contention and cuts writeback traffic by 22×. Through diagonal and row configurations of its processing element (PE) arrays, SNAP supports any convolutional (CONV) or fully connected (FC) layer. A 2.4 mm² SNAP test chip in 16 nm CMOS achieves a measured peak effectual efficiency of 21.55 TOPS/W (16b) at 0.55 V and 260 MHz for CONV layers with 10% weight and activation density. Running a pruned ResNet-50, SNAP achieves 90.98 fps at 0.80 V and 480 MHz while dissipating 348 mW.
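
To make the idea of discovering input pairs concrete, the following is a minimal software sketch of index matching between two compressed sparse vectors. It is an illustration under stated assumptions, not the chip's actual datapath: the (channel index, value) compression format, the function names `compress`, `match_pairs`, and `sparse_dot`, and the use of a dictionary lookup in place of parallel hardware comparators are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumed format: each nonzero is stored as (channel index, value).
Compressed = List[Tuple[int, int]]


def compress(dense: List[int]) -> Compressed:
    """Keep only the nonzero entries, tagged with their channel index."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def match_pairs(weights: Compressed, activations: Compressed) -> List[Tuple[int, int]]:
    """Return (weight, activation) pairs whose channel indices match.

    The dictionary lookup stands in for an associative search: every weight
    index is compared against the stored activation indices.
    """
    act_by_index = {i: v for i, v in activations}
    return [(w, act_by_index[i]) for i, w in weights if i in act_by_index]


def sparse_dot(weights: Compressed, activations: Compressed) -> int:
    """Multiply and accumulate only the matched (effectual) pairs."""
    return sum(w * a for w, a in match_pairs(weights, activations))


w = compress([0, 3, 0, 0, 2, 0, 1, 0])
a = compress([5, 4, 0, 0, 0, 0, 6, 0])
pairs = match_pairs(w, a)
result = sparse_dot(w, a)
print(pairs, result)
```

Only the pairs in which both operands are nonzero reach the multiplier, so the work scales with the number of matches rather than with the dense vector length.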

Authors

Jie-Fang Zhang (University of Michigan)
Ching-En Lee (University of Michigan)
Chester Liu (University of Michigan)
Yakun Sophia Shao (NVIDIA)
Zhengya Zhang (University of Michigan)
