Towards a multilingual prosody model for text-to-speech
Published in: | 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing Vol. 1; pp. I-421 - I-424 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-05-2002 |
Summary: | The generation of prosodic parameters such as F0 contour, duration and intensity remains an important issue for natural-sounding text-to-speech (TTS), although recently developed TTS systems have achieved considerable progress. Several appropriate but language-specific rule-based, statistical or data-driven prosody models have been successfully realized in many systems. Such language- and parameter-dependent models lead to a more complex and inefficient TTS system design. In earlier work the authors proposed a hybrid data-driven and rule-based model that can adapt to different voices or speaking styles by learning and predicting prosodic parameters. The current paper discusses the multilingual generalization of the model and the design of appropriate prosodic databases. Two languages, German and Mandarin Chinese, are examined as examples. Prediction results and perceptual evaluation with respect to F0 contours and duration values are presented. Since the perceptual results for both languages are comparable and quite satisfying, the model is suitable for multilingual prosody control. Resynthesis stimuli obtained from modified prosodic parameters partly achieve near-natural mean opinion scores (MOS) above 4.0. The introduced hybrid data-driven and rule-based model is comparatively simple and enables multilingual prosody control in TTS. |
---|---|
ISBN: | 9780780374027 0780374029 |
ISSN: | 1520-6149 2379-190X |
DOI: | 10.1109/ICASSP.2002.5743744 |