Transfer Linear Subspace Learning for Cross-Corpus Speech Emotion Recognition

Bibliographic Details
Published in: IEEE Transactions on Affective Computing, Vol. 10, no. 2, pp. 265-275
Main Author: Song, Peng
Format: Journal Article
Language: English
Published: Piscataway IEEE 01-04-2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: Speech emotion recognition has received increasing interest in recent years. It is often conducted on the assumption that speech utterances in the training and testing datasets are obtained under the same conditions. In reality, however, this assumption does not hold, because the speech data are often collected from different devices or environments. Hence, there is a discrepancy between the training and testing data, which adversely affects recognition performance. In this paper, we examine the problem of cross-corpus speech emotion recognition. To address it, we present a novel transfer linear subspace learning (TLSL) framework that learns a common feature subspace for the source and target datasets. In TLSL, a nearest neighbor graph algorithm is used to measure the similarity between different corpora, and a feature grouping strategy is introduced to divide the emotional features into two categories: a high transferable part (HTP) and a low transferable part (LTP). To explore TLSL in different scenarios, we propose two TLSL approaches, transfer unsupervised linear subspace learning (TULSL) and transfer supervised linear subspace learning (TSLSL), and provide solutions to the corresponding optimization problems. Extensive experiments on several benchmark datasets validate the effectiveness of TLSL for cross-corpus speech emotion recognition.
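The summary mentions a nearest neighbor graph that measures similarity between corpora, but it does not give the formulation. As a rough illustration only, the sketch below builds a generic symmetric k-nearest-neighbor graph and its Laplacian over pooled source and target feature vectors. It is not the TLSL algorithm: the function names, the choice of k, the 0/1 edge weights, and the random feature matrices are all assumptions made for this example.

```python
import numpy as np


def knn_graph(X, k=3):
    """Symmetric 0/1 k-nearest-neighbor adjacency over the rows of X.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all rows.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    # Indices of the k closest samples for each row.
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = 1.0
    # Keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


# Assumed stand-ins for acoustic feature vectors from two corpora.
rng = np.random.default_rng(0)
n_source, n_target, n_features = 20, 15, 5
Xs = rng.normal(0.0, 1.0, size=(n_source, n_features))  # source corpus
Xt = rng.normal(0.5, 1.0, size=(n_target, n_features))  # target corpus

X = np.vstack([Xs, Xt])
W = knn_graph(X, k=3)
L = graph_laplacian(W)

# Edges linking a source sample to a target sample.
cross_edges = int(W[:n_source, n_source:].sum())
print("cross-corpus edges:", cross_edges)
```

In graph-regularized subspace methods generally, a Laplacian like `L` enters the objective as a penalty that keeps neighboring samples close after projection. How TLSL uses its graph is described in the full article, not in this record.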
ISSN: 1949-3045
DOI: 10.1109/TAFFC.2017.2705696