Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling

Bibliographic Details
Published in:	2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 6135 - 6139
Main Authors:	Wei Li; Sabato Marco Siniscalchi; Nancy F. Chen; Chin-Hui Lee
Format:	Conference Proceeding; Journal Article
Language:	English
Published:	IEEE 01-03-2016
Subjects:	Acoustics; automatic speech attribute transcription (ASAT); automatic speech recognition (ASR); Classifiers; computer assisted pronunciation training (CAPT); Computer simulation; Computers; Conferences; deep neural network (DNN); Diagnostic systems; Electronics; Feature extraction; Feedback; Hidden Markov models; mispronunciation detection and diagnosis; Neural networks; Speech; Training

Description
Summary:	We propose the use of speech attributes, such as voicing and aspiration, to address two key research issues in computer assisted pronunciation training (CAPT) for L2 learners, namely detecting mispronunciation and providing diagnostic feedback. To improve performance, we focus on mispronunciations that occur at the segmental and sub-segmental levels. In this study, speech attribute scores are first used to measure pronunciation quality at a sub-segmental level, such as manner and place of articulation. These speech attribute scores are then integrated by neural network classifiers to generate segmental pronunciation scores. Compared with the conventional phone-based GOP (Goodness of Pronunciation) system that we implemented on our dataset, the proposed framework reduces the equal error rate by 8.78% relative. Moreover, it attains results comparable to a phone-based classifier approach to mispronunciation detection while providing comprehensive feedback, including segmental and sub-segmental diagnostic information, to help L2 learners.
ISSN:	2379-190X
DOI:	10.1109/ICASSP.2016.7472856
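
Note on the metric:	The summary reports its gain as a relative reduction in equal error rate (EER). The sketch below is an illustration only and is not taken from the paper: it approximates the EER by sweeping a decision threshold over the observed scores, then computes a relative reduction between two EER values. The function names, the score convention (higher means more likely mispronounced), and the sample data are all assumptions.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumed convention (not from the paper): a segment is flagged as
    mispronounced when its score >= threshold.
    labels: 1 = mispronounced, 0 = correctly pronounced.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required to compute an EER")

    best_gap = None
    eer = None
    for threshold in sorted(set(scores)):
        # False rejection: a correct segment flagged as mispronounced.
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if y == 0 and s >= threshold
        )
        # False acceptance: a mispronounced segment that is not flagged.
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if y == 1 and s < threshold
        )
        frr = false_rejects / negatives
        far = false_accepts / positives
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap = gap
            eer = (frr + far) / 2
    return eer


def relative_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction: (baseline - proposed) / baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline EER must be positive")
    return (baseline_eer - proposed_eer) / baseline_eer


# Hypothetical toy data, chosen only to exercise the functions.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

A relative reduction is expressed as a fraction of the baseline EER, so it is a different quantity from the absolute difference in percentage points between the two systems.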