Nearest neighbor based i-vector normalization for robust speaker recognition under unseen channel conditions

Many state-of-the-art speaker recognition engines use i-vectors to represent variable-length acoustic signals in a fixed low-dimensional total variability subspace. While such systems perform well under seen channel conditions, their performance greatly degrades under unseen channel scenarios. Accor...

Full description

Saved in:
Bibliographic Details
Published in:2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 4684 - 4688
Main Authors: Weizhong Zhu, Sadjadi, Seyed Omid, Pelecanos, Jason W.
Format: Conference Proceeding
Language:English
Published: IEEE 01-04-2015
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Many state-of-the-art speaker recognition engines use i-vectors to represent variable-length acoustic signals in a fixed low-dimensional total variability subspace. While such systems perform well under seen channel conditions, their performance greatly degrades under unseen channel scenarios. Accordingly, rapid adaptation of i-vector systems to unseen conditions has recently attracted significant research effort from the community. To mitigate this mismatch, in this paper we propose nearest neighbor based i-vector mean normalization (NN-IMN) and i-vector smoothing (IS) for unsupervised adaptation to unseen channel conditions within a state-of-the-art i-vector/PLDA speaker verification framework. A major advantage of the approach is its ability to handle multiple unseen channels without explicit retraining or clustering. Our observations on the DARPA Robust Automatic Transcription of Speech (RATS) speaker recognition task suggest that part of the distortion caused by an unseen channel may be modeled as an offset in the i-vector space. Hence, the proposed nearest neighbor based normalization technique is formulated to compensate for such a shift. Experimental results with the NN based normalized i-vectors indicate that, on average, we can recover 46% of the total performance degradation due to unseen channel conditions.
ISSN:1520-6149
2379-190X
DOI:10.1109/ICASSP.2015.7178859