A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction

Bibliographic Details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 9, pp. 1913-1928
Main Authors: Souden, Mehrez, Araki, Shoko, Kinoshita, Keisuke, Nakatani, Tomohiro, Sawada, Hiroshi
Format: Journal Article
Language: English
Published: Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), 01-09-2013
Description
Summary: We propose a new framework for joint multichannel speech source separation and acoustic noise reduction. In this framework, we start by formulating the minimum mean-square error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and outline the importance of estimating the activities of the speakers. The latter is accurately achieved by introducing a latent variable that takes N+1 possible discrete states for a mixture of N speech signals plus additive noise. Each state characterizes the dominance of one of the N+1 signals. We determine the posterior probability of this latent variable, and show how it plays a twofold role in the MMSE-based speech enhancement. First, it allows the extraction of the second-order statistics of the noise and each of the speech signals from the noisy data. These statistics are needed to formulate the multichannel Wiener-based filters (including the minimum variance distortionless response). Second, it weights the outputs of these linear filters to shape the spectral contents of the signals' estimates following the associated target speakers' activities. We use the spatial and spectral cues contained in the multichannel recordings of the sound mixtures to compute the posterior probability of this latent variable. The spatial cue is acquired by using the normalized observation vector, whose distribution is well approximated by a Gaussian-mixture-like model, while the spectral cue can be captured by using a pre-trained Gaussian mixture model for the log-spectra of speech. The parameters of the investigated models and the speakers' activities (posterior probabilities of the different states of the latent variable) are estimated via expectation maximization. Experimental results, including comparisons with the well-known independent component analysis and masking techniques, are provided to demonstrate the efficiency of the proposed framework.
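Read as an equation, the twofold role of the posterior described in the abstract is the familiar mixture decomposition of the MMSE estimator. In notation assumed here rather than taken from the paper (y is the stacked multichannel observation in one time-frequency slot, s_n the n-th speech signal, d the latent dominance variable with N+1 states), a sketch is:

    \hat{s}_n = \mathbb{E}[s_n \mid \mathbf{y}]
              = \sum_{k=1}^{N+1} p(d = k \mid \mathbf{y}) \, \mathbb{E}[s_n \mid \mathbf{y}, d = k]

Under a conditionally Gaussian model, each inner expectation reduces to a multichannel Wiener-type filter built from state-dependent second-order statistics; the same posteriors p(d = k | y) also weight those statistics when they are extracted from the noisy data.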
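For intuition only, below is a minimal NumPy sketch of both roles in a single frequency bin: accumulating posterior-weighted second-order statistics, then weighting each speaker's multichannel Wiener filter output by that speaker's activity. The function posterior_weighted_mwf, the filter form Phi_y^{-1} Phi_n, and all variable names are illustrative assumptions, not the paper's exact equations (which also cover the MVDR variant and the EM parameter updates).

    import numpy as np

    def posterior_weighted_mwf(Y, gamma, ref_mic=0):
        # Y     : (M, T) complex STFT frames of one frequency bin (M mics, T frames)
        # gamma : (K, T) posteriors p(d_t = k | y_t); states 0..N-1 are the
        #         N speakers, state N is the background noise
        K, T = gamma.shape
        M = Y.shape[0]
        N = K - 1

        # Role 1: posterior-weighted second-order statistics of each signal
        Phi = np.empty((K, M, M), dtype=complex)
        for k in range(K):
            w = gamma[k]
            Phi[k] = (w * Y) @ Y.conj().T / max(w.sum(), 1e-12)

        # Covariance of the full observation (all speakers plus noise)
        Phi_y = (Y @ Y.conj().T) / T

        # Role 2: per-speaker multichannel Wiener filter, with the output
        # weighted by the speaker's posterior activity
        S_hat = np.empty((N, T), dtype=complex)
        for n in range(N):
            # Reference-mic column of Phi_y^{-1} Phi_n
            w_n = np.linalg.solve(Phi_y, Phi[n][:, ref_mic])
            S_hat[n] = gamma[n] * (w_n.conj() @ Y)
        return S_hat

    # Toy usage: 4 mics, 2 speakers plus noise, 100 frames of synthetic data
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((4, 100)) + 1j * rng.standard_normal((4, 100))
    gamma = rng.random((3, 100))
    gamma /= gamma.sum(axis=0)
    est = posterior_weighted_mwf(Y, gamma)   # shape (2, 100)

In a full system this would run per frequency bin, with gamma supplied by the spatial/spectral posterior model rather than drawn at random.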
ISSN: 1558-7916, 1558-7924
DOI: 10.1109/TASL.2013.2263137