Prediction of LSTM-RNN Full Context States as a Subtask for N-Gram Feedforward Language Models
Published in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6104-6108
Main Authors:
Format: Conference Proceeding
Language: English
Published: IEEE, 01-04-2018
Summary: Long short-term memory (LSTM) recurrent neural network language models compress the full context of variable length into a fixed-size vector. In this work, we investigate the task of predicting the LSTM hidden representation of the full context from a truncated n-gram context as a subtask for training an n-gram feedforward language model. Since this approach is a form of knowledge distillation, we compare two methods. First, we investigate the standard transfer based on the Kullback-Leibler divergence of the output distribution of the feedforward model from that of the LSTM. Second, we minimize the mean squared error between the hidden state of the LSTM and that of the n-gram feedforward model. We carry out experiments on different subsets of the Switchboard speech recognition dataset for feedforward models with a short (5-gram) and a medium (10-gram) context length. We show that we obtain relative improvements in perplexity and word error rate of up to 8% and 4%, respectively, for the medium model, while the improvements are only marginal for the short model. (See the code sketch after this record for the two distillation losses.)
ISSN: 2379-190X
DOI: 10.1109/ICASSP.2018.8461743
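
The summary describes two ways of transferring knowledge from the LSTM teacher into the n-gram feedforward student: matching the teacher's output distribution with a Kullback-Leibler divergence, and matching the teacher's full-context hidden state with a mean squared error. The sketch below illustrates these two losses under stated assumptions; it is not the authors' implementation. PyTorch, the layer sizes, and the names `NGramFeedforwardLM` and `distillation_losses` are introduced here for illustration only, and the hidden dimensionality of student and teacher is assumed to match so that the MSE term is well defined.

```python
# Minimal sketch (assumed PyTorch implementation, not the authors' code) of the
# two distillation losses described in the summary above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NGramFeedforwardLM(nn.Module):
    """Student: n-gram feedforward LM that also exposes its hidden layer."""

    def __init__(self, vocab_size, n_context, emb_dim=128, hid_dim=512):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Sequential(
            nn.Linear(n_context * emb_dim, hid_dim),
            nn.ReLU(),
        )
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, context):
        # context: (batch, n_context) word indices of the truncated context
        h = self.hidden(self.emb(context).flatten(1))  # (batch, hid_dim)
        return self.out(h), h                          # logits, hidden state


def distillation_losses(ff_logits, ff_hidden, lstm_logits, lstm_hidden):
    """Return the two transfer losses compared in the summary."""
    # 1) KL divergence of the feedforward model's output distribution from
    #    that of the LSTM teacher (teacher probabilities are the target).
    kl = F.kl_div(
        F.log_softmax(ff_logits, dim=-1),
        F.softmax(lstm_logits.detach(), dim=-1),
        reduction="batchmean",
    )
    # 2) Mean squared error between the LSTM's full-context hidden state and
    #    the feedforward model's hidden representation (equal dimensionality
    #    is assumed here).
    mse = F.mse_loss(ff_hidden, lstm_hidden.detach())
    return kl, mse
```

In a training loop, one of these terms would typically be added to the standard next-word cross-entropy of the feedforward model with an interpolation weight (e.g. `loss = ce + alpha * mse`); that weighting scheme and `alpha` are assumptions for this sketch, not values taken from the paper.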