Modern speech quality prediction models are trained on audio data resampled to a specific sampling rate. When faced with higher-rate audio at test time, these models can produce biased scores. We introduce HighRateMOS, the first non-intrusive mean …
Data collected from the real world typically exhibit long-tailed distributions, where frequent classes contain abundant data while rare ones have only a limited number of samples. While existing supervised learning approaches have been proposed to …