Sequence Noise Injected Training for End-to-end Speech Recognition

Bibliographic Details
Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6261 - 6265
Main Authors: Saon, George, Tuske, Zoltan, Audhkhasi, Kartik, Kingsbury, Brian
Format: Conference Proceeding
Language: English
Published: IEEE, 01-05-2019
Description
Summary: We present a simple noise injection algorithm for training end-to-end ASR models, which consists of adding to the spectra of training utterances the scaled spectra of random utterances of comparable length. We conjecture that the sequence information of the "noise" utterances is important and verify this via a contrast experiment in which the frames of the utterances to be added are randomly shuffled. Experiments with both CTC and attention-based models show that the proposed scheme yields up to 9% relative word error rate improvement (depending on the model and test set) on the Switchboard 300-hour English conversational telephony database. Additionally, we set a new benchmark for attention-based encoder-decoder models on this corpus.
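The core operation the summary describes — adding the scaled spectrum of a random training utterance to the spectrum of the current utterance — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scale factor, the random-crop/tile handling of length mismatch, and the function name are assumptions made here for the sketch.

```python
import numpy as np

def sequence_noise_inject(spec, noise_spec, scale=0.3, rng=None):
    """Add a scaled "noise" utterance spectrum to a training utterance spectrum.

    spec, noise_spec: arrays of shape (frames, bins). The noise utterance is
    cropped (or tiled) to match the length of the target utterance; how the
    paper handles length mismatch beyond "comparable length" is not specified,
    so cropping/tiling here is an assumption.
    """
    rng = rng or np.random.default_rng()
    T = spec.shape[0]
    if noise_spec.shape[0] >= T:
        # Random crop of the noise utterance to the target length,
        # preserving its frame order (the sequence information).
        start = rng.integers(0, noise_spec.shape[0] - T + 1)
        noise = noise_spec[start:start + T]
    else:
        # Tile a shorter noise utterance, then truncate.
        reps = -(-T // noise_spec.shape[0])  # ceiling division
        noise = np.tile(noise_spec, (reps, 1))[:T]
    return spec + scale * noise
```

The contrast experiment mentioned in the summary would correspond to applying `rng.permutation` to the frames of `noise` before the addition, destroying its sequence information while keeping its marginal statistics.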
ISSN:2379-190X
DOI:10.1109/ICASSP.2019.8683706