Automatic Severity Classification of Dysarthric Speech by Using Self-Supervised Model with Multi-Task Learning

Bibliographic Details
Published in:	ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5
Main Authors:	Yeo, Eun Jung; Choi, Kwanghee; Kim, Sunhee; Chung, Minhwa
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 4 June 2023
Subjects:	Acoustics; automatic assessment; dysarthric speech; Machine learning; Magnetic heads; multi-task learning; Multitasking; self-supervised learning; Signal processing; Support vector machines

Description
Summary:	Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, atypical speech is difficult to collect, which often leads to data scarcity. To tackle this problem, we propose a novel automatic severity assessment method for dysarthric speech that combines a self-supervised model with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained on two tasks: severity classification and auxiliary automatic speech recognition (ASR). For the baseline experiments, we employ hand-crafted acoustic features with machine learning classifiers such as SVM, MLP, and XGBoost. On the Korean dysarthric speech QoLT database, our model outperforms the traditional baseline methods, with a relative increase of 1.25% in F1-score. In addition, the proposed model surpasses the model trained without the ASR head, achieving a 10.61% relative improvement. Furthermore, we show how multi-task learning affects severity classification performance by analyzing the latent representations and the regularization effect.
ISSN:	2379-190X
DOI:	10.1109/ICASSP49357.2023.10094605
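The summary describes a single encoder trained jointly on severity classification and auxiliary ASR. The sketch below illustrates what such a joint objective can look like. It relies on several assumptions that are not taken from the paper: the classification head mean-pools the encoder frames before a linear layer; the ASR head is a per-frame linear layer trained with CTC, with index 0 as the blank; and the two losses are combined as L = L_cls + asr_weight * L_ctc, where the weight is hypothetical. Plain Python lists stand in for the XLS-R hidden states so the example runs without a deep learning framework. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(logits: Sequence[float]) -> List[float]:
    log_z = logsumexp(*logits)
    return [x - log_z for x in logits]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    """y = W x + b, with one row of W per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def mean_pool(frames: Matrix) -> List[float]:
    """Average the encoder frames over time to get one utterance vector."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def ctc_loss(frame_logits: Matrix, labels: Sequence[int], blank: int = 0) -> float:
    """Negative log-likelihood of `labels` under CTC (forward algorithm, log space)."""
    log_probs = [log_softmax(f) for f in frame_logits]
    # Extended label sequence with blanks around every token: b l1 b l2 ... b
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    n = len(ext)
    alpha = [-math.inf] * n
    alpha[0] = log_probs[0][blank]
    if n > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * n
        for s in range(n):
            paths = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                paths.append(alpha[s - 1])          # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                paths.append(alpha[s - 2])          # skip a blank between distinct tokens
            new_alpha[s] = logsumexp(*paths) + log_probs[t][ext[s]]
        alpha = new_alpha
    # Valid paths end on the last token or on the trailing blank.
    total = alpha[-1] if n == 1 else logsumexp(alpha[-1], alpha[-2])
    return -total


def multitask_loss(hidden: Matrix, severity: int, transcript: Sequence[int],
                   cls_w: Matrix, cls_b: Sequence[float],
                   ctc_w: Matrix, ctc_b: Sequence[float],
                   asr_weight: float = 0.5) -> Tuple[float, float, float]:
    """Hypothetical joint objective: L = L_cls + asr_weight * L_ctc."""
    # Severity head: utterance-level cross-entropy on mean-pooled frames.
    l_cls = -log_softmax(linear(mean_pool(hidden), cls_w, cls_b))[severity]
    # Auxiliary ASR head: frame-level logits scored with CTC.
    l_ctc = ctc_loss([linear(h, ctc_w, ctc_b) for h in hidden], transcript)
    return l_cls + asr_weight * l_ctc, l_cls, l_ctc
```

In an actual fine-tuning setup, these losses would be computed in a framework such as PyTorch (for example with `torch.nn.CTCLoss`) and backpropagated through the shared encoder. As a result, the ASR gradient also shapes the representation used for severity classification. This shared gradient is one plausible source of the regularization effect that the summary mentions.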