 # GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation


 We present the Group Propagation Vision Transformer (GPViT): a novel non-hierarchical (i.e., non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit for tasks that involve perceiving fine-grained details, such as detection and segmentation. However, exchanging global information between these features is expensive in memory and computation, because the cost of self-attention grows quadratically with the number of tokens. We propose the Group Propagation Block (GP Block) as a highly efficient alternative for exchanging global information. In each GP Block, features are first grouped together by a fixed number of learnable group tokens. We then perform Group Propagation, in which global information is exchanged between the grouped features. Finally, the global information in the updated grouped features is returned to the image features through a transformer decoder. We evaluate GPViT on a variety of visual recognition tasks, including image classification, semantic segmentation, object detection, and instance segmentation. Our method achieves significant performance gains over previous works across all tasks, especially on tasks that require high-resolution outputs. For example, our GPViT-L3 outperforms Swin Transformer-B by 2.0 mIoU on ADE20K semantic segmentation with only half as many parameters. Code and pre-trained models are available at <https://github.com/ChenhongyiYang/GPViT>.
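The three steps of a GP Block (group, propagate, ungroup) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation; see the linked repository for that. The sketch rests on several simplifying assumptions. It uses single-head attention with no learned Q/K/V projections, normalization layers, or MLPs. The group tokens are random placeholders for the learnable ones. The propagation step uses plain self-attention among the groups as a stand-in for the module described in the paper. All function names (`gp_block`, `attention_score_counts`, and so on) are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: rows of a times columns of b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, kv: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with keys = values = kv.

    Learned Q/K/V projections are omitted (simplifying assumption).
    """
    d = len(q[0])
    kt = [list(col) for col in zip(*kv)]  # transpose kv: d x len(kv)
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
    return matmul(softmax_rows(scores), kv)  # len(q) x d


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here for residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gp_block(image_tokens: Matrix, group_tokens: Matrix) -> Matrix:
    """Hypothetical, simplified GP Block: group -> propagate -> ungroup."""
    # 1. Grouping: the M group tokens query the N image tokens (~M*N scores).
    grouped = attention(group_tokens, image_tokens)
    # 2. Propagation: exchange global information among the M groups.
    #    Stand-in: self-attention over groups (~M*M scores), not the paper's module.
    propagated = add(grouped, attention(grouped, grouped))
    # 3. Ungrouping: the N image tokens query the M updated groups (~N*M scores),
    #    decoder-style cross-attention written back with a residual connection.
    return add(image_tokens, attention(image_tokens, propagated))


def attention_score_counts(n: int, m: int) -> tuple:
    """Query-key score counts: full self-attention vs. the sketch above."""
    return n * n, 2 * n * m + m * m


if __name__ == "__main__":
    random.seed(0)
    n, m, c = 16, 4, 8  # hypothetical: 4x4 feature map, 4 groups, 8 channels
    image = [[random.gauss(0, 1) for _ in range(c)] for _ in range(n)]
    # Random placeholders standing in for learnable group tokens.
    groups = [[random.gauss(0, 1) for _ in range(c)] for _ in range(m)]
    out = gp_block(image, groups)
    print(len(out), len(out[0]))  # 16 8
    print(attention_score_counts(64 * 64, 64))  # (16777216, 528384)
```

The output keeps one feature per image token, so resolution is preserved. With N image tokens and M groups, the sketch computes about 2NM + M² attention scores instead of N². For a hypothetical 64x64 feature map with 64 groups, that is 528,384 scores rather than 16,777,216. This is why a small, fixed number of groups keeps global exchange affordable at high resolution.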


 ## Authors


Chenhongyi Yang (University of Edinburgh)

 Jiarui Xu (University of California at San Diego)

[Shalini De Mello](/index.php/person/shalini-de-mello)

Elliot J. Crowley (University of Edinburgh)

Xiaolong Wang (University of California at San Diego)

 
 ## Publication Date


Monday, May 1, 2023

 
 ## Published in


[International Conference on Learning Representations (ICLR) 2023](https://iclr.cc/virtual/2023/poster/11986)

 
 ## Research Area


[Artificial Intelligence and Machine Learning](/index.php/research-area/machine-learning-artificial-intelligence)

[Computer Vision](/index.php/research-area/computer-vision)

 
 ## External Links


[Code](https://github.com/ChenhongyiYang/GPViT)

[ArXiv](https://arxiv.org/abs/2212.06795)

 
 ## Uploaded Files


[Paper](https://d1qx31qr3h6wln.cloudfront.net/publications/2023_GPVIT_A%20HIGH%20RESOLUTION%20NON-HIERARCHICAL%20VISION%20TRANSFORMER%20WITH%20GROUP%20PROPAGATION.pdf "Open file in new window") (13.72 MB)

 
 ## Awards


Notable top 25%

Oral