 # Efficient Transformer Inference with Statically Structured Sparse Attention

  ![](/sites/default/files/styles/wide/public/publications/Screenshot%202023-11-07%20191858.png?itok=6_fGh0Gg)

 Self-attention matrices of Transformers are often highly sparse because the relevant context of each token is typically limited to just a few other tokens in the sequence. To reduce the computational burden of self-attention on Transformer inference, we propose static, structured, sparse attention masks that split attention matrices into dense regions, skipping computations outside these regions while reducing computations inside these regions. To support the proposed mask structure, we design an entropy-aware finetuning algorithm to naturally encourage attention sparsity while maximizing task accuracy. Furthermore, we extend a typical dense deep learning accelerator to efficiently exploit our structured sparsity pattern. Compared to a dense baseline, we achieve a 56.6% reduction in energy consumption and a 58.9% performance improvement, with <1% accuracy loss and 2.6% area overhead.
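To make the idea of a static, structured mask concrete, here is a minimal sketch in plain Python. It assumes a simple block-diagonal mask in which each token attends only to tokens in its own fixed-size block; the abstract does not specify the paper's actual mask layout, and the names `block_sparse_attention`, `masked_dense_attention`, `attention_entropy`, and `block_size` are hypothetical. The sketch covers only the "skip computations outside the dense regions" part, not the reduction of work inside them, the finetuning algorithm, or the accelerator.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries of -inf receive weight 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, k, v, block_size):
    """Attention under an assumed static block-diagonal mask.

    Scores are computed only inside each token's block, so the work is
    roughly n * block_size dot products instead of n * n.
    """
    n, scale = len(q), 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        start = (i // block_size) * block_size
        end = min(start + block_size, n)
        weights = softmax([scale * dot(q[i], k[j]) for j in range(start, end)])
        out.append([
            sum(w * v[start + t][c] for t, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


def masked_dense_attention(q, k, v, block_size):
    """Dense reference: compute every score, then mask out-of-block ones."""
    n, scale = len(q), 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        scores = [
            scale * dot(q[i], k[j]) if i // block_size == j // block_size
            else float("-inf")
            for j in range(n)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


def attention_entropy(weights):
    """Shannon entropy of one attention row; lower means more concentrated."""
    return -sum(p * math.log(p) for p in weights if p > 0.0)


# Small deterministic example: 6 tokens, head dimension 3, blocks of 2.
q = [[0.1 * i, 0.2, -0.1 * i] for i in range(6)]
k = [[0.3, -0.1 * i, 0.05 * i] for i in range(6)]
v = [[float(i), 1.0 - i] for i in range(6)]
sparse_out = block_sparse_attention(q, k, v, block_size=2)
dense_out = masked_dense_attention(q, k, v, block_size=2)
```

Both functions produce the same output here, but the block version never evaluates the masked-out scores, which is the source of the savings a static pattern makes available. The `attention_entropy` helper only illustrates the quantity an entropy-aware objective might penalize; how the paper incorporates entropy into finetuning is not stated in the abstract.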


 ## Authors


[Steve Dai](/person/steve-dai)

Hasan Genc (UC Berkeley)

[Rangharajan Venkatesan](/person/rangharajan-venkatesan)

[Brucek Khailany](/person/brucek-khailany)

 
 ## Publication Date


Sunday, July 9, 2023

 
 ## Published in


[2023 60th ACM/IEEE Design Automation Conference (DAC)](https://ieeexplore.ieee.org/xpl/conhome/10247654/proceeding)

 
 ## Research Area


[Artificial Intelligence and Machine Learning](/research-area/machine-learning-artificial-intelligence)

[Circuits and VLSI Design](/research-area/circuits)

[Computer Architecture](/research-area/computer-architecture)

[Generative AI](/research-area/generative-ai)

 
 ## External Links


[IEEE Proceeding](https://ieeexplore.ieee.org/abstract/document/10247993)

 
 ## Copyright


This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to <pubs-permissions@ieee.org>.