Real-Time Continuous Phoneme Recognition System Using Class-Dependent Tied-Mixture HMM With HBT Structure for Speech-Driven Lip-Sync

Bibliographic Details
Published in:	IEEE Transactions on Multimedia, Vol. 10, no. 7, pp. 1299-1306
Main Authors:	PARK, Junho; KO, Hanseok
Format:	Journal Article
Language:	English
Published:	New York, NY: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-11-2008
Subjects:	Acoustic signal processing | Acoustics | Applied sciences | Avatars | Cognition & reasoning | Computer science; control theory; systems | Computer systems and distributed systems. User interface | Context modeling | Exact sciences and technology | Facial animation | Fundamental areas of phenomenology (including applications) | Gaussian | Head-body-tail HMM | Hearing aids | Heterojunction bipolar transistors | Hidden Markov models | Mathematical models | Multimedia | Neural networks | phoneme recognition | Phonemes | Physics | Real time | Real time systems | real-time lip-sync | Recognition | Shape | Software | Speech | Speech recognition | Speech synthesis | Tasks | Vowels | Vocabulary | Speech articulation | Voiced signal | Continuous time | Markov model | Pattern recognition | Context aware | Modeling | Real time system | Lip | Codebook | Gaussian process | Vocal signal | User interface | Hidden Markov model | Phonetics | Phoneme

Description
Summary:	This work describes a real-time lip-sync method that synchronizes an avatar's lip shape with the corresponding speech signal. Phoneme recognition is generally regarded as an important task in a real-time lip-sync system. The Head-Body-Tail (HBT) model is proposed to recognize more efficiently phonemes whose realizations vary because of co-articulation effects. The HBT model effectively handles the transition parts of context-dependent models in small-vocabulary tasks, and such models provide better recognition performance than general context-dependent or context-independent models for digit or vowel recognition. In addition, each phoneme is assigned to one of four classes, and a class-dependent codebook is generated to further improve performance. To represent the context-dependency information in the transient parts clearly, some Gaussians are excluded from the class-dependent codebook. The resulting lip-sync system performs at a level similar to previous designs based on HBT and continuous hidden Markov models (CHMMs), while reducing the number of model parameters by one-third and enabling real-time operation.
ISSN:	1520-9210, 1941-0077
DOI:	10.1109/TMM.2008.2004908