Speech Enhancement

A Unified Latency-Flexible Framework with a Causal Mamba Core for Multichannel Speech Enhancement

We propose a unified and latency-flexible framework for multichannel speech enhancement in CHiME-9 Task 2 (ECHI), built upon a decoupled Shell-Core architecture. A latency-aware DenseNet-style Shell performs local spectral-spatial modeling across …

Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement

Recent Mamba-based models have shown promise in speech enhancement by efficiently modeling long-range temporal dependencies. However, models like Speech Enhancement Mamba (SEMamba) remain limited to single-speaker scenarios and struggle in complex …

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech

Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which is time-consuming and expensive for label collection. …