Large vocabulary audio-visual speech recognition using active shape models

Bibliographic Details
Published in:	Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, Vol. 3, pp. 106-109
Main Authors:	Faruquie, T.A.; Majumdar, A.; Rajput, N.; Subramaniam, L.V.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 2000
Subjects:	Active shape model; Cepstral analysis; Data mining; Deformable models; Facial features; Feature extraction; Humans; Speech recognition; Video sequences; Vocabulary

Description
Summary:	Orthogonal information present in the video signal associated with the audio helps improve the accuracy of a speech recognition system. Audio-visual speech recognition involves extracting both audio and visual features from the input signal. Visual parameters are extracted by recognizing speech-dependent features in the video sequence. The paper uses geometrical features to describe the lip shapes. Curve-based active shape models are used to extract the geometry. These geometrically represented visual parameters are used along with the audio cepstral features to perform an audio-visual classification. It is shown that the bimodal system presented improves classification results over classification using only the audio features.
ISBN:	0769507506 9780769507507
ISSN:	1051-4651 2831-7475
DOI:	10.1109/ICPR.2000.903496