Audio Flamingo 2
Understanding and reasoning over non-speech sounds and music are crucial for both humans and AI agents to interact effectively with their environments. In this paper, we introduce Audio Flamingo 2, an Audio-Language Model (ALM) with advanced long-audio understanding and reasoning capabilities. Audio Flamingo 2 achieves state-of-the-art performance on more than 20 benchmarks while using only a 3B-parameter small language model. In addition, we propose training and test sets for long-audio understanding, namely LongAudio and LongAudioBench, to advance this field. Audio Flamingo 2 is the first ALM that can understand long audio of up to 5 minutes. We confirm the efficacy of our method through extensive evaluations and ablation studies.