Cross-lingual deep neural network based submodular unbiased data selection for low-resource keyword search

Bibliographic Details
Published in:	2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 6015 - 6019
Main Authors:	Chongjia Ni, Cheung-Chi Leung, Lei Wang, Haibo Liu, Feng Rao, Li Lu, Nancy F. Chen, Bin Ma, Haizhou Li
Format:	Conference Proceeding Journal Article
Language:	English
Published:	IEEE 01-03-2016
Subjects:	Acoustics; active learning; Conferences; Electronics; Indexes; Keyword search; keyword spotting; Linguistics; Manuals; Neural networks; Optimization; Searching; Signal processing; Speech; spoken term detection; Submodular optimization; Training data

Description
Summary:	In this paper, we propose a cross-lingual deep neural network (DNN) based submodular unbiased data selection approach for low-resource keyword search (KWS). A small amount (e.g. one hour) of transcribed data is used to conduct cross-lingual transfer. The frame-level senone sequence activated by the cross-lingual DNN is used to represent each untranscribed speech utterance. The proposed submodular function incorporates utterance length normalization and a feature distribution matched to a development set. Experiments are conducted by selecting 9 hours of Tamil speech for the 2014 NIST Open Keyword Search Evaluation (OpenKWS14). The proposed data selection approach provides a 35.8% relative improvement in actual term weighted value (ATWV) over random selection on the OpenKWS14 EvalPart1 data set. Further analysis of the experimental results shows that both utterance length normalization and the feature distribution estimated from a development set, as used in the submodular function, suppress the preference for selecting long utterances. The selected utterances are drawn from a wider set of utterances and cover a more diverse range of triphones, words, and acoustic variations. Moreover, the wider coverage of words benefits the acquired linguistic knowledge, which in turn contributes to improved KWS performance.
ISSN:	2379-190X
DOI:	10.1109/ICASSP.2016.7472832
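
Illustration:	The summary describes selecting utterances under a time budget with a submodular function that uses length normalization and development-set weighting. The sketch below is not the paper's function, which the summary does not specify. It assumes a generic feature-based submodular objective (a weighted sum of square roots of accumulated senone mass) and plain greedy selection. All names (`senone_histogram`, `greedy_select`, `weights`, the toy utterances) are hypothetical, and `weights` merely stands in for a senone distribution estimated on a development set.

```python
import math
from collections import Counter


def senone_histogram(senones, normalize=True):
    """Count senone IDs in one utterance; optionally divide by utterance length."""
    counts = Counter(senones)
    if not normalize:
        return dict(counts)
    n = len(senones)
    return {k: v / n for k, v in counts.items()}


def greedy_select(utterances, durations, weights, budget, normalize=True):
    """Greedy maximization of an ASSUMED objective f(S) = sum_u w_u * sqrt(mass_u(S)).

    utterances: dict id -> list of frame-level senone IDs
    durations:  dict id -> cost of the utterance (e.g. seconds)
    weights:    dict senone -> weight (stand-in for a dev-set distribution)
    budget:     total duration that may be selected
    """
    hists = {uid: senone_histogram(s, normalize) for uid, s in utterances.items()}
    totals, selected, used = {}, [], 0.0
    remaining = set(utterances)
    while remaining:
        best_uid, best_gain = None, 0.0
        for uid in sorted(remaining):  # sorted for deterministic tie-breaking
            if used + durations[uid] > budget:
                continue
            # Marginal gain of adding this utterance to the current selection.
            gain = sum(
                weights.get(u, 0.0)
                * (math.sqrt(totals.get(u, 0.0) + m) - math.sqrt(totals.get(u, 0.0)))
                for u, m in hists[uid].items()
            )
            if gain > best_gain:
                best_uid, best_gain = uid, gain
        if best_uid is None:
            break  # nothing fits the budget or adds value
        for u, m in hists[best_uid].items():
            totals[u] = totals.get(u, 0.0) + m
        selected.append(best_uid)
        used += durations[best_uid]
        remaining.remove(best_uid)
    return selected


# Toy data: one long, repetitive utterance and two short, diverse ones.
utts = {"long": [0] * 10, "a": [1, 2], "b": [3, 4]}
durs = {"long": 10, "a": 2, "b": 2}
w = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

with_norm = greedy_select(utts, durs, w, budget=10, normalize=True)
without_norm = greedy_select(utts, durs, w, budget=10, normalize=False)
```

On this toy input, the unnormalized variant spends the whole budget on the long utterance, while the length-normalized variant picks the two short, more diverse ones. This mirrors, in miniature only, the effect the summary attributes to length normalization.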