Resonance-based spectral deformation in HMM-based speech synthesis

Bibliographic Details
Published in: 2012 8th International Symposium on Chinese Spoken Language Processing, pp. 88-92
Main Authors: Jinfu Ni, Shiga, Y., Kawai, H., Kashioka, H.
Format: Conference Proceeding
Language: English
Published: IEEE 01-12-2012
Description
Summary: Speech quality in statistical parametric speech synthesis depends on the training samples covering a sufficient range of acoustic features. This paper presents a spectral deformation method that uses spectral-spatial information to expand the density space of acoustic features when only limited training samples are available. The method diffuses observed mel-cepstra in a resonance field and obtains multiple spectral variants subject to a resonance mechanism. A statistical contribution of the mel-cepstral variants then takes the place of the original mel-cepstra when building HMM-based voices. Preliminary speech synthesis experiments are carried out in Chinese and Japanese. The experimental results indicate that the proposed method can reduce potential discontinuities and enhance speech formants for noise reduction, while achieving MOS quality at least as good as that obtained with the original mel-cepstra.
ISBN: 1467325066, 9781467325066
DOI: 10.1109/ISCSLP.2012.6423478