Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification
Published in: Pattern Recognition Letters, Vol. 30, No. 4, pp. 341–347
Main Authors: , , , ,
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 01-03-2009
Summary: Gaussian mixture model with universal background model (GMM–UBM) is a standard reference classifier in speaker verification. We have recently proposed a simplified model using vector quantization (VQ–UBM). In this study, we extensively compare these two classifiers on the NIST 2005, 2006 and 2008 SRE corpora, with a standard discriminative classifier (GLDS–SVM) as a point of reference. We focus on parameter settings for N-top scoring, model order, and performance for different amounts of training data. The most interesting result, contrary to a general belief, is that GMM–UBM yields better results for short segments, whereas VQ–UBM is good for long utterances. The results also suggest that maximum likelihood training of the UBM is suboptimal, and hence alternative ways to train the UBM should be considered.
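The summary refers to N-top (top-N) fast scoring in the GMM–UBM framework. As background for readers, the following is a minimal Python sketch, not the authors' implementation, of GMM–UBM log-likelihood-ratio scoring with a top-N component shortlist; the diagonal-covariance assumption, the function names, and the default n_top=5 are illustrative choices of ours.

```python
# Minimal sketch of GMM-UBM verification scoring with top-N fast scoring.
# All model parameters below are illustrative placeholders.
import numpy as np

def log_gauss_diag(X, means, variances):
    """Per-component log N(x; mu_k, diag(var_k)).
    X: (T, D) feature frames; means, variances: (K, D). Returns (T, K)."""
    D = X.shape[1]
    const = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    diff = X[:, None, :] - means[None, :, :]               # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None], axis=2)    # (T, K)
    return const[None, :] - 0.5 * mahal

def llr_topN(X, ubm, spk, n_top=5):
    """Average frame-level log-likelihood ratio log p(X|spk) - log p(X|ubm),
    evaluating both mixtures only on the n_top best-scoring UBM components."""
    w_u, m_u, v_u = ubm
    w_s, m_s, v_s = spk
    lg_u = np.log(w_u)[None, :] + log_gauss_diag(X, m_u, v_u)     # (T, K)
    top = np.argsort(lg_u, axis=1)[:, -n_top:]                    # shortlist per frame
    ll_ubm = np.logaddexp.reduce(np.take_along_axis(lg_u, top, axis=1), axis=1)
    lg_s = np.log(w_s)[None, :] + log_gauss_diag(X, m_s, v_s)
    ll_spk = np.logaddexp.reduce(np.take_along_axis(lg_s, top, axis=1), axis=1)
    return float(np.mean(ll_spk - ll_ubm))

if __name__ == "__main__":
    # Synthetic demo: random GMMs and random frames standing in for MFCCs.
    rng = np.random.default_rng(0)
    K, D, T = 8, 12, 200
    def random_gmm():
        w = rng.random(K); w /= w.sum()
        return w, rng.normal(size=(K, D)), rng.random((K, D)) + 0.5
    ubm, spk = random_gmm(), random_gmm()
    X = rng.normal(size=(T, D))
    print("LLR score:", llr_topN(X, ubm, spk, n_top=5))
```

The shortlist is computed once against the UBM and reused for the MAP-adapted speaker model, which is what makes top-N scoring fast; the choice of N is one of the parameter settings the study examines.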
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2008.11.007