 
 # Simba: scaling deep-learning inference with chiplet-based architecture

  ![](/sites/default/files/styles/wide/public/publications/RC18_photo_1_0.jpg?itok=T9tRWhtD)

 Package-level integration using multi-chip modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically only contain a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication. This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep-learning inference, an application domain with large compute and on-chip storage requirements. To evaluate the approach, we architected, implemented, fabricated, and tested Simba, a 36-chiplet prototype MCM system for deep-learning inference. Each chiplet achieves 4 TOPS peak performance, and the 36-chiplet MCM package achieves up to 128 TOPS and up to 6.1 TOPS/W. The MCM is configurable to support a flexible mapping of DNN layers to the distributed compute and storage units. To mitigate inter-chiplet communication overheads, we introduce three tiling optimizations that improve data locality. These optimizations achieve up to 16% speedup compared to the baseline layer mapping. Our evaluation shows that Simba can process 1988 images/s running ResNet-50 with a batch size of one, delivering an inference latency of 0.50 ms.
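
 The reported ResNet-50 latency is consistent with the reported throughput if images are processed one at a time with no overlap between consecutive images. A minimal sketch of that arithmetic follows; the function name `latency_ms` and the no-overlap assumption are illustrative and are not taken from the paper.

```python
def latency_ms(images_per_second: float, batch_size: int = 1) -> float:
    """Per-batch latency in milliseconds.

    Assumption (not from the paper): batches run back to back with no
    overlap, so latency is simply batch_size / throughput.
    """
    if images_per_second <= 0:
        raise ValueError("throughput must be positive")
    return batch_size / images_per_second * 1000.0


# 1988 images/s at batch size one is the throughput quoted in the abstract.
resnet50_latency = latency_ms(1988)
print(f"{resnet50_latency:.2f} ms")  # prints "0.50 ms"
```

 With a batch size larger than one, the same relation would give the latency of a whole batch rather than of a single image.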


 ## Authors


Yakun Sophia Shao (UC Berkeley)

[Jason Clemons](/person/jason-clemons)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan)

[Brian Zimmer](/person/brian-zimmer)

[Matt Fojtik](/person/matt-fojtik)

[Ted Jiang](/person/ted-jiang)

[Ben Keller](/person/ben-keller)

Alicia Klinefelter (NVIDIA)

[Nathaniel Pinckney](/person/nathaniel-pinckney)

Priyanka Raina (Stanford)

[Stephen Tell](/person/stephen-tell)

[Yanqing Zhang](/person/yanqing-zhang)

[William Dally](/person/william-dally)

[Joel Emer](/person/joel-emer)

[Tom Gray](/person/tom-gray)

[Brucek Khailany](/person/brucek-khailany)

[Steve Keckler](/person/stephen-keckler)

 
 ## Publication Date


Monday, May 24, 2021

 
 ## Published in


[Communications of the ACM](https://dl.acm.org/doi/10.1145/3460227)

 
 ## Research Area


[Artificial Intelligence and Machine Learning](/research-area/machine-learning-artificial-intelligence)

[Circuits and VLSI Design](/research-area/circuits)

[Computer Architecture](/research-area/computer-architecture)

 
 ## External Links


[Published manuscript (ACM Digital Library)](https://dl.acm.org/doi/10.1145/3460227)

 
 ## Award


ACM Research Highlight

 
 ## Copyright


Copyright by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or <permissions@acm.org>. The definitive version of this paper can be found at ACM's Digital Library <http://www.acm.org/dl/>.