Audio-visual Learning

Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

Audio-visual learning has been a major pillar of multi-modal machine learning, where the community has mostly focused on the modality-aligned setting, i.e., where the audio and visual modalities are both assumed to signal the prediction target. With the Look, …