Boosting systems for large vocabulary continuous speech recognition

Bibliographic Details
Published in:	Speech Communication, Vol. 54, no. 2, pp. 212-218
Main Authors:	Saon, George; Soltau, Hagen
Format:	Journal Article
Language:	English
Published:	Amsterdam: Elsevier B.V., 01-02-2012
Subjects:	Acoustic modeling; Acoustics; Algorithms; Applied sciences; Boosting; Decision trees; Exact sciences and technology; Information, signal and communications theory; Iterative methods; Recognition; Signal processing; Speech processing; Speech recognition; Statistics; Telecommunications and information theory; Training; Trains; Performance evaluation; Discriminant analysis; Probabilistic approach; Iterative method; Cable television; Acoustic method; Decision tree; Learning; English; Multiple models approach; Arabic; Gaussian process; Linear combination; Audiovisual document; Hidden Markov models; News; Maximum likelihood; Learning algorithm

Description
Summary:	► We apply the AdaBoost algorithm to large vocabulary continuous speech recognition.
► Acoustic models are trained sequentially on re-weighted data.
► Phonetic decision trees are also included in the boosting procedure.
► We study the impact of boosting for ML and discriminatively trained models.
► Accuracy gains on English and Arabic broadcast news transcription are obtained.

We employ a variant of the popular AdaBoost algorithm to train multiple acoustic models such that the aggregate system exhibits improved performance over the individual recognizers. Each model is trained sequentially on re-weighted versions of the training data. At each iteration, the weights are decreased for the frames that are correctly decoded by the current system. These weights are then multiplied by the frame-level statistics for the decision trees and Gaussian mixture components of the next-iteration system. The composite system uses a log-linear combination of HMM state observation likelihoods. We report experimental results on several broadcast news transcription setups, which differ in the language being spoken (English and Arabic) and in the amount of training data. Additionally, we study the impact of boosting on maximum likelihood (ML) and discriminatively trained acoustic models. Our findings suggest that significant gains can be obtained for small amounts of training data even after feature-space and model-space discriminative training.
ISSN:	0167-6393, 1872-7182
DOI:	10.1016/j.specom.2011.07.011
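
The procedure outlined in the summary (lower the weights of correctly decoded frames, scale the frame-level statistics by those weights, then combine the models log-linearly) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the actual update rule, so the shrink factor `beta`, the renormalization step, the one-dimensional "statistics", and all function names here are assumptions chosen only for illustration.

```python
def reweight_frames(weights, correct, beta=0.5):
    """Lower the weight of correctly decoded frames, then renormalize.

    `beta` (0 < beta < 1) is a hypothetical shrink factor, and the
    renormalization to sum 1 is an assumption; the summary only says
    that weights are decreased for correctly decoded frames.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    updated = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]


def weighted_mean(frames, weights):
    """Weighted first-order statistic: each frame's contribution is
    multiplied by its weight before accumulation (a 1-D stand-in for
    the Gaussian mixture and decision tree statistics)."""
    return sum(w * x for w, x in zip(weights, frames)) / sum(weights)


def combine_log_likelihoods(log_likes, model_weights):
    """Log-linear combination of per-model HMM state log-likelihoods:
    sum over models k of lambda_k * log p_k(x | s)."""
    return sum(lam * ll for lam, ll in zip(model_weights, log_likes))


if __name__ == "__main__":
    weights = [0.25, 0.25, 0.25, 0.25]
    correct = [True, True, False, True]   # frame 2 was misrecognized
    new_weights = reweight_frames(weights, correct)
    print(new_weights)                    # the misrecognized frame now dominates
    print(weighted_mean([1.0, 2.0, 3.0, 4.0], new_weights))
    print(combine_log_likelihoods([-2.0, -4.0], [0.5, 0.5]))
```

In this toy setting the misrecognized frame ends up with twice the weight of each correctly decoded frame, so the next model's statistics are pulled toward the data the current system gets wrong.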