What shall we do with an hour of data? Speech recognition for the un- and under-served languages of Common Voice

This technical report describes the methods and results of a three-week sprint to produce deployable speech recognition models for 31 under-served languages of the Common Voice project. We outline the preprocessing steps, hyperparameter selection, and resulting accuracy on official testing sets. In...

Full description

Saved in:

Bibliographic Details
Main Authors:	Tyers, Francis M, Meyer, Josh
Format:	Journal Article
Language:	English
Published:	10-05-2021
Subjects:	Computer Science - Computation and Language Computer Science - Learning Computer Science - Sound
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	This technical report describes the methods and results of a three-week sprint to produce deployable speech recognition models for 31 under-served languages of the Common Voice project. We outline the preprocessing steps, hyperparameter selection, and resulting accuracy on official testing sets. In addition to this we evaluate the models on multiple tasks: closed-vocabulary speech recognition, pre-transcription, forced alignment, and key-word spotting. The following experiments use Coqui STT, a toolkit for training and deployment of neural Speech-to-Text models.
DOI:	10.48550/arxiv.2105.04674