Semi-supervised training in low-resource ASR and KWS

Bibliographic Details
Published in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4699-4703
Main Authors: Metze, Florian, Gandhe, Ankur, Miao, Yajie, Sheikh, Zaid, Wang, Yun, Xu, Di, Zhang, Hao, Kim, Jungsuk, Lane, Ian, Lee, Won Kyum, Stüker, Sebastian, Müller, Markus
Format: Conference Proceeding
Language: English
Published: IEEE, 01-04-2015
Description
Summary: In particular for "low resource" Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on low-resource languages, using telephony-quality speech in Assamese, Bengali, Lao, Haitian Creole, Zulu, and Tamil, demonstrating the impact that such techniques can have, in particular learning robust bottleneck features on the test data. In the case of Tamil, where significantly more test data than training data is available, we integrated semi-supervised training and speaker adaptation on the test data, and achieved significant additional improvements in STT and KWS.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2015.7178862
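
Illustrative note: the summary above describes using untranscribed test data via semi-supervised training. A common way to realize this is self-training: a seed system decodes the untranscribed audio, and only utterances whose hypotheses pass a confidence threshold are kept as automatically transcribed training material. The Python sketch below shows that selection step only; it is a hypothetical illustration under assumed names (DecodedUtterance, select_pseudo_labels) and is not the authors' code or exact recipe.

    # Sketch of confidence-based data selection for semi-supervised
    # (self-training) ASR. Illustrative only; names and numbers are made up.

    from dataclasses import dataclass
    from typing import List, Tuple


    @dataclass
    class DecodedUtterance:
        utt_id: str        # utterance identifier
        hypothesis: str    # 1-best transcript produced by the seed system
        confidence: float  # per-utterance confidence score in [0, 1]


    def select_pseudo_labels(decodes: List[DecodedUtterance],
                             threshold: float = 0.7) -> List[Tuple[str, str]]:
        """Keep only utterances the seed system decoded with high confidence."""
        return [(d.utt_id, d.hypothesis)
                for d in decodes
                if d.confidence >= threshold]


    if __name__ == "__main__":
        # Toy first-pass decodes of untranscribed "test" audio.
        decodes = [
            DecodedUtterance("utt001", "hello how are you", 0.85),
            DecodedUtterance("utt002", "<unk> <unk> you", 0.31),
            DecodedUtterance("utt003", "see you tomorrow", 0.74),
        ]
        # Selected utterances would be appended to the training transcripts
        # (e.g., for retraining bottleneck features) in the next iteration.
        for utt_id, text in select_pseudo_labels(decodes, threshold=0.7):
            print(f"{utt_id}\t{text}")

The threshold trades the amount of added data against transcript quality; with seed WERs above 70%, as reported in the summary, only a small high-confidence fraction of the untranscribed data would typically be retained.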