Superpositional HMM-based intonation synthesis using a functional F0 model
Published in: | The 9th International Symposium on Chinese Spoken Language Processing pp. 270 - 274 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE 01-09-2014 |
Summary: | This paper addresses intonation synthesis that combines statistical and functional approaches, with manipulation of fundamental frequency (F0) contours in HMM-based speech synthesis. An F0 contour is represented as the sum of micro, accent, and register components on a logarithmic scale, a decomposition rooted in the Fujisaki model. Separate context-dependent HMMs (CDHMMs) are trained for each type of component, with the components extracted from a speech corpus using a functional F0 model. At synthesis time, the CDHMM-generated micro, accent, and register components are superimposed to form the F0 contours for the input text. Objective and subjective evaluations are carried out on a Japanese speech corpus. Compared with the conventional approach, the method improves the naturalness of synthetic speech by achieving better global F0 behavior, and it allows flexible intonation manipulation through modification of the functional model parameters. |
---|---|
DOI: | 10.1109/ISCSLP.2014.6936614 |
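
The superposition described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-frame list representation, and the toy values are all assumptions made for illustration, and the accent scaling is only a hypothetical stand-in for "modifying the functional model parameters". The sketch shows just the one step the summary states, namely that the three components add on a logarithmic scale.

```python
import math


def superimpose_log_f0(micro, accent, register):
    """Sum three per-frame log-F0 components and return F0 in Hz.

    Assumption: each component is a list of natural-log values,
    one per frame, and all three lists have the same length.
    """
    if not (len(micro) == len(accent) == len(register)):
        raise ValueError("components must have the same number of frames")
    return [math.exp(m + a + r) for m, a, r in zip(micro, accent, register)]


def scale_accent(accent, factor):
    """Hypothetical manipulation: scale the accent component only."""
    return [factor * a for a in accent]


# Toy values (invented for illustration, not taken from the paper).
register = [math.log(120.0)] * 5          # slowly varying baseline
accent = [0.0, 0.1, 0.2, 0.1, 0.0]        # local accent hump
micro = [0.0, 0.01, -0.01, 0.0, 0.0]      # small segmental perturbations

f0 = superimpose_log_f0(micro, accent, register)
f0_emphasized = superimpose_log_f0(micro, scale_accent(accent, 1.5), register)
```

Because the components are additive in the log domain, changing one of them (here the accent component) leaves the other two untouched, which is the kind of separability that makes component-wise manipulation possible.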