 # Audio-Visual Segmentation

![AVSBench](/sites/default/files/styles/wide/public/publications/AVSBench.png?itok=x00YKXed)

We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) producing sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), which provides pixel-wise annotations of the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To address the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss that encourages the audio-visual mapping during training. Quantitative and qualitative experiments on AVSBench compare our approach with several existing methods from related tasks, demonstrating that the proposed method is a promising way to bridge audio semantics and pixel-wise visual semantics.
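To make the idea of injecting audio semantics into pixel-wise visual features more concrete, here is a minimal, dependency-free sketch. It is not the authors' module (see the linked code for that); it assumes a simplified scaled dot-product cross-attention in which every pixel feature of every frame attends over the audio embeddings of all frames, and the attended audio vector is added back as a residual. The function name `audio_guided_pixels` and the tensor layout are hypothetical, and learned projections, normalization, and the regularization loss are deliberately omitted.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def audio_guided_pixels(
    visual: List[List[Vector]],  # visual[t][p]: C-dim feature of pixel p in frame t
    audio: List[Vector],         # audio[t]: C-dim audio embedding of frame t
) -> List[List[Vector]]:
    """Hypothetical, simplified audio-visual interaction (no learned weights).

    Each pixel feature attends over the audio embeddings of all T frames
    (temporal, pixel-wise scaled dot-product attention); the attention-weighted
    audio vector is added to the pixel feature as a residual.
    """
    dim = len(audio[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for frame in visual:
        new_frame = []
        for pixel in frame:
            # Similarity of this pixel to each frame's audio embedding.
            weights = softmax([scale * dot(pixel, a) for a in audio])
            # Attention-weighted sum of audio embeddings, channel by channel.
            attended = [sum(w * a[c] for w, a in zip(weights, audio)) for c in range(dim)]
            # Residual connection: audio guidance added to the visual feature.
            new_frame.append([v + x for v, x in zip(pixel, attended)])
        fused.append(new_frame)
    return fused


if __name__ == "__main__":
    # Toy input: T = 2 frames, 2 pixels per frame, C = 2 channels.
    visual = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]]]
    audio = [[2.0, 0.0], [0.0, 2.0]]
    for t, frame in enumerate(audio_guided_pixels(visual, audio)):
        print(t, frame)
```

In this toy example, a pixel whose feature aligns with one frame's audio embedding receives mostly that embedding, while a pixel equally similar to both (or all zeros) receives their average. A real segmentation model would apply learned projections and feed the fused features to a mask decoder.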



 ## Authors



Jinxing Zhou (Australian National Univ.)

Yiran Zhong (Australian National Univ.)

[Stan Birchfield](/person/stan-birchfield)

et al. (NVIDIA)


 ## Publication Date



Monday, October 10, 2022


 ## Published in



[ECCV 2022](https://eccv2022.ecva.net/)


 ## Research Area



[Computer Vision](/research-area/computer-vision)


 ## External Links



[arXiv paper](https://arxiv.org/abs/2207.05042)

[code](https://github.com/OpenNLPLab/AVSBench)

[website](https://opennlplab.github.io/AVSBench/)