Lambda-vector modeling temporal and channel interactions for text-independent speaker verification

Bibliographic Details
Published in:	Scientific Reports, Vol. 12, no. 1, p. 18171
Main Authors:	Wei, Guangcun; Min, Hang; Xu, Yunfei; Zhang, Yanna
Format:	Journal Article
Language:	English
Published:	London: Nature Publishing Group UK (Nature Portfolio), 28-10-2022
Subjects:	639/166; 639/705; Algorithms; Deep learning; Embedding; Humanities and Social Sciences; multidisciplinary; Neural networks; Science; Science (multidisciplinary)

Description
Summary:	Most of the current best-performing models in speaker verification are ResNet-based deep models and attention-based models. These models share a general weakness: a large number of parameters and high hardware requirements. In addition, many deep structures generate embedding features only from the features extracted by the last frame-level layer, so shallow features and channel-related features are ignored. To solve these problems, this paper proposes a shallow speaker verification model based on the Lambda-vector, whose main structure is composed of three Lambda-SE modules. Each module extracts long-distance dependencies between frame-level features and channel-related interaction information to enhance the feature representation. Meanwhile, to adequately mine the information in deep and shallow features, the model introduces multi-layer feature aggregation to fuse the features of different frame-level layers. This increases the detailed information in the deep features and improves the model's ability to represent complex information. Experimental results on the public datasets VoxCeleb1 and VoxCeleb2 show that the model has more stable training speed, fewer model parameters, and better identification performance than baseline models.
ISSN:	2045-2322
DOI:	10.1038/s41598-022-22977-5
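
The summary mentions two ideas that are easier to picture in code: channel-wise reweighting (the "SE" part of the Lambda-SE module) and multi-layer feature aggregation. The following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that the SE part follows the usual squeeze-and-excitation pattern and that aggregation is a simple concatenation along the channel axis; the record confirms neither. The lambda (long-distance dependency) part is omitted entirely, the three blocks share one set of weights for brevity, and all names and sizes (`se_gate`, `aggregate_layers`, `channels`, `frames`, `reduction`) are hypothetical.

```python
import numpy as np


def se_gate(x, w1, w2):
    """Squeeze-and-excitation style gating on x of shape (channels, frames)."""
    squeezed = x.mean(axis=1)                      # squeeze: average over time -> (channels,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # bottleneck projection followed by ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate, one value per channel
    return x * scale[:, None]                      # reweight every frame of each channel


def aggregate_layers(layer_outputs):
    """Assumed aggregation: concatenate frame-level outputs along the channel axis."""
    return np.concatenate(layer_outputs, axis=0)


rng = np.random.default_rng(0)
channels, frames, reduction = 8, 50, 4             # hypothetical sizes
w1 = rng.standard_normal((channels // reduction, channels))
w2 = rng.standard_normal((channels, channels // reduction))

x = rng.standard_normal((channels, frames))
outputs = []
for _ in range(3):                                 # three stacked blocks, as in the summary
    x = se_gate(x, w1, w2)                         # weights shared here only for brevity
    outputs.append(x)

fused = aggregate_layers(outputs)
print(fused.shape)  # (24, 50)
```

Under these assumptions, the fused tensor keeps the output of every block rather than only the last one, which is the point the summary makes about retaining shallow features alongside deep ones.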