Audio-Visual

Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds