Superpositional HMM-based intonation synthesis using a functional F0 model

Bibliographic Details
Published in: The 9th International Symposium on Chinese Spoken Language Processing, pp. 270-274
Main Authors: Ni, Jinfu; Shiga, Yoshinori; Hori, Chiori
Format: Conference Proceeding
Language: English
Published: IEEE 01-09-2014
Description
Summary: This paper addresses intonation synthesis that combines statistical and functional approaches and allows manipulation of fundamental frequency (F0) contours in HMM-based speech synthesis. An F0 contour is represented as the sum of micro, accent, and register components on a logarithmic scale, a decomposition rooted in the Fujisaki model. Separate context-dependent (CD) HMMs are trained for each type of component, with the components extracted from a speech corpus using a functional F0 model. At synthesis time, the CDHMM-generated micro, accent, and register components are superimposed to form F0 contours for the input text. Objective and subjective evaluations are carried out on a Japanese speech corpus. Compared with the conventional approach, the method improves the naturalness of synthetic speech by achieving better global F0 behavior, and it also allows flexible intonation manipulation by modifying the functional model parameters.
DOI: 10.1109/ISCSLP.2014.6936614
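
The superposition described in the summary (an F0 contour as the sum of micro, accent, and register components on a logarithmic scale) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the per-frame list representation, and all numeric values are hypothetical, and the components are written by hand here rather than generated by CDHMMs or extracted with a functional F0 model. Voiced/unvoiced decisions are not modeled.

```python
import math

def superimpose_log_f0(micro, accent, register):
    """Sum per-frame components (each in ln Hz) into one log-F0 contour."""
    if not (len(micro) == len(accent) == len(register)):
        raise ValueError("components must have the same number of frames")
    return [m + a + r for m, a, r in zip(micro, accent, register)]

def to_hz(log_f0):
    """Convert a log-F0 contour (ln Hz) back to Hz."""
    return [math.exp(v) for v in log_f0]

# Hypothetical five-frame components, chosen only for illustration.
register = [math.log(120.0)] * 5          # slowly varying baseline
accent = [0.0, 0.1, 0.2, 0.1, 0.0]        # accent-level rise and fall
micro = [0.0, 0.01, -0.01, 0.0, 0.0]      # small segmental perturbations

f0_hz = to_hz(superimpose_log_f0(micro, accent, register))

# Illustrative manipulation: scaling the accent component widens the pitch
# excursion while leaving the register and micro components unchanged.
scaled_accent = [1.5 * a for a in accent]
emphasized_hz = to_hz(superimpose_log_f0(micro, scaled_accent, register))
```

Because the components add in the log domain, they multiply in Hz, so scaling one component changes the contour's shape without shifting the baseline set by the register component.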