Prediction of LSTM-RNN Full Context States as a Subtask for N-Gram Feedforward Language Models
Published in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6104-6108
Main Authors:
Format: Conference Proceeding
Language: English
Published: IEEE, 01-04-2018
Summary: Long short-term memory (LSTM) recurrent neural network language models compress the full context of variable length into a fixed-size vector. In this work, we investigate the task of predicting the LSTM hidden representation of the full context from a truncated n-gram context as a subtask for training an n-gram feedforward language model. Since this approach is a form of knowledge distillation, we compare two methods. First, we investigate the standard transfer based on the Kullback-Leibler divergence of the output distribution of the feedforward model from that of the LSTM. Second, we minimize the mean squared error between the hidden state of the LSTM and that of the n-gram feedforward model. We carry out experiments on different subsets of the Switchboard speech recognition dataset for feedforward models with a short (5-gram) and a medium (10-gram) context length. We show that we obtain relative improvements in perplexity and word error rate of up to 8% and 4%, respectively, for the medium model, while the improvements are only marginal for the short model. (See the code sketch after this record for the two distillation losses.)
ISSN: 2379-190X
DOI: 10.1109/ICASSP.2018.8461743
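
The summary describes two ways of transferring knowledge from the LSTM teacher into the n-gram feedforward student: matching the teacher's output distribution with a Kullback-Leibler divergence, and matching the teacher's full-context hidden state with a mean squared error. The sketch below illustrates these two losses under stated assumptions; it is not the authors' implementation. PyTorch, the layer sizes, and the names `NGramFeedforwardLM` and `distillation_losses` are introduced here for illustration only, and the hidden dimensionality of student and teacher is assumed to match so that the MSE term is well defined.

```python
# Minimal sketch (assumed PyTorch implementation, not the authors' code) of the
# two distillation losses described in the summary above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NGramFeedforwardLM(nn.Module):
    """Student: n-gram feedforward LM that also exposes its hidden layer."""

    def __init__(self, vocab_size, n_context, emb_dim=128, hid_dim=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Sequential(
            nn.Linear(n_context * emb_dim, hid_dim),
            nn.ReLU(),
        )
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, context):
        # context: (batch, n_context) word indices of the truncated context
        h = self.hidden(self.emb(context).flatten(1))  # (batch, hid_dim)
        return self.out(h), h                          # logits, hidden state


def distillation_losses(ff_logits, ff_hidden, lstm_logits, lstm_hidden):
    """Return the two transfer losses compared in the summary."""
    # 1) KL divergence of the feedforward model's output distribution from
    #    that of the LSTM teacher (teacher probabilities are the target).
    kl = F.kl_div(
        F.log_softmax(ff_logits, dim=-1),
        F.softmax(lstm_logits.detach(), dim=-1),
        reduction="batchmean",
    )
    # 2) Mean squared error between the LSTM's full-context hidden state and
    #    the feedforward model's hidden representation (equal dimensionality
    #    is assumed here).
    mse = F.mse_loss(ff_hidden, lstm_hidden.detach())
    return kl, mse
```

In a training loop, one of these terms would typically be added to the standard next-word cross-entropy of the feedforward model with an interpolation weight (e.g. `loss = ce + alpha * mse`); that weighting scheme and `alpha` are assumptions for this sketch, not values taken from the paper.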