Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing

Bibliographic Details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, no. 5, pp. 983-997
Main Authors:	Nickel, R. M., Astudillo, R. F., Kolossa, D., Martin, R.
Format:	Journal Article
Language:	English
Published:	Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 01-05-2013
Subjects:	Applied sciences; Audio signal; Background noise; Cepstral analysis; Coding, codes; Corpus based approach; Decoding; Exact sciences and technology; Gaussian process; Information, signal and communications theory; Inventory-style speech enhancement; Learning; Man machine dialogue; Miscellaneous; Mixture theory; Modeling; modified imputation; Nickel; Noise; Objective analysis; Performance evaluation; Phoneme; Quality control; Recognition; Sampling, quantization; Signal and communications theory; Signal processing; Signal to noise ratio; Smoothing; Sound filters; Sound recognition; Speech; Speech enhancement; Speech processing; Speech recognition; Telecommunications and information theory; Uncertainty; uncertainty-of-observation techniques; User interface; Vector quantization; Vocal signal

Description
Summary:	We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
ISSN:	1558-7916, 1558-7924
DOI:	10.1109/TASL.2013.2243434
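
Illustration:	The summary contrasts a vector quantizer with a Gaussian mixture model (GMM) front-end whose decoding is supported by uncertainty modeling. The sketch below is not the authors' implementation; it only illustrates the general idea under stated assumptions. It compares a hard nearest-codeword decision with soft GMM posteriors, and it models feature uncertainty by adding a per-dimension observation variance to the model variances (variance compensation, a simpler relative of the modified imputation listed among the subjects). All function names, parameters, and numeric values are hypothetical.

```python
import numpy as np


def vq_assign(x, codebook):
    """Hard decision: index of the nearest codeword (squared Euclidean distance)."""
    distances = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(distances))


def gmm_log_posteriors(x, weights, means, variances, obs_var=None):
    """Soft decision: log posterior of each diagonal-covariance component.

    obs_var is an assumed per-dimension variance describing how uncertain
    the feature estimate x is. It is added to the model variances, so
    unreliable features pull the posteriors toward the prior weights.
    """
    var = variances if obs_var is None else variances + obs_var
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - means) ** 2 / var, axis=1)
    log_joint = np.log(weights) + log_lik
    # Normalize with a numerically stable log-sum-exp.
    peak = np.max(log_joint)
    return log_joint - (peak + np.log(np.sum(np.exp(log_joint - peak))))


# Toy two-class model (made-up numbers, for illustration only).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
x = np.array([1.0, 1.0])

hard_index = vq_assign(x, means)
post_certain = np.exp(gmm_log_posteriors(x, weights, means, variances))
post_uncertain = np.exp(
    gmm_log_posteriors(x, weights, means, variances, obs_var=np.array([3.0, 3.0]))
)
print(hard_index, post_certain, post_uncertain)
```

In this toy case the vector quantizer commits to class 0 outright, while the GMM reports how confident that choice is, and the confidence drops once the feature is flagged as uncertain. A later decoding stage can use that softer evidence instead of an early hard decision.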