Speech Recognition

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation …

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition

We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). Our methodology leverages both acoustic information and external linguistic representations to generate accurate speech …