Superpositional HMM-based intonation synthesis using a functional F0 model

Bibliographic Details
Published in:	The 9th International Symposium on Chinese Spoken Language Processing, pp. 270-274
Main Authors:	Ni, Jinfu; Shiga, Yoshinori; Hori, Chiori
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-09-2014
Subjects:	Correlation; Frequency synthesizers; functional F0 model; Hidden Markov models; HMM-based speech synthesis; Intonation synthesis; making focal prominence; prosody; Registers; Speech; Speech synthesis; Training

Description
Summary:	This paper addresses intonation synthesis that combines statistical and functional approaches and supports manipulation of fundamental frequency (F0) contours in HMM-based speech synthesis. An F0 contour is represented as a sum of micro, accent, and register components on a logarithmic scale, a representation rooted in the Fujisaki model. Separate context-dependent (CD) HMMs are trained for each type of component, with the components extracted from a speech corpus using a functional F0 model. At synthesis time, the CDHMM-generated micro, accent, and register components are superimposed to form F0 contours for the input text. Objective and subjective evaluations are carried out on a Japanese speech corpus. Compared with the conventional approach, the method improves the naturalness of synthetic speech by achieving better global F0 behavior, and it also allows flexible intonation manipulation through modification of the functional model parameters.
DOI:	10.1109/ISCSLP.2014.6936614
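
Illustration:	The superposition step described in the summary can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `superimpose_log_f0`, the use of natural logarithms, and all numeric values are assumptions made for illustration. In the paper each component is generated by its own CDHMM; here the components are simply hand-written per-frame lists.

```python
import math


def superimpose_log_f0(micro, accent, register):
    """Sum three per-frame log-F0 components and return F0 in Hz.

    Assumes all components are expressed in natural-log Hz and are
    aligned frame by frame (hypothetical convention for this sketch).
    """
    if not (len(micro) == len(accent) == len(register)):
        raise ValueError("components must have the same number of frames")
    # Addition in the log domain corresponds to multiplication in Hz.
    return [math.exp(m + a + r) for m, a, r in zip(micro, accent, register)]


# Hypothetical 5-frame example; values are illustrative, not from the paper.
register = [math.log(120.0)] * 5       # slowly varying baseline level
accent = [0.0, 0.1, 0.2, 0.1, 0.0]     # local accent rise and fall
micro = [0.0, 0.01, -0.01, 0.0, 0.0]   # small segmental perturbations

f0_hz = superimpose_log_f0(micro, accent, register)
```

Because the components are kept separate until this final sum, changing one of them (for example, enlarging the accent values) alters the resulting contour without touching the others, which is the kind of manipulation the summary refers to.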