Tools

NeMo Forced Aligner and its application to word alignment for subtitle generation

We present NeMo Forced Aligner (NFA): an efficient and accurate forced aligner that is part of the NeMo conversational AI open-source toolkit. NFA can produce token-, word-, and segment-level alignments, and can generate subtitle files that highlight words or tokens as they are spoken. We present a demo of this functionality and show that, compared with other aligners, NFA has the best word alignment accuracy and the fastest alignment generation.


A Toolbox for Construction and Analysis of Speech Datasets

Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world’s first open-source tool of this kind. We demonstrate how to apply these tools to create a Russian speech dataset and to analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open-sourced as part of the NeMo framework.