Improving Recognition for Disordered Speech in Indonesian Language: a Data Augmentation approach
Published in: | 2023 3rd International Conference on Smart Cities, Automation & Intelligent Computing Systems (ICON-SONICS), pp. 207-211 |
---|---|
Main Authors: | , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 06-12-2023 |
Summary: | For individuals with speech disorders, speech recognition serves as a crucial communication tool. However, most speech recognition systems are trained on normal speech data, and the problem is worsened by the scarcity of disordered speech data, which is mostly available in English. Data augmentation is one possible approach: signal processing such as speed perturbation can be applied to transform normal speech into disordered speech. In this work, we studied how the composition of augmented data in a dataset affects system performance on disordered speech. Using the QuartzNet CNN as the acoustic model, we evaluated the speech recognition system on an augmented dataset built from an Indonesian speech database from Mozilla Corpus. With speed perturbation used to generate the disordered speech, initial results show that augmenting the normal speech dataset with 25-50% additional disordered speech data could help improve the model's Word Error Rate (WER) on disordered speech. This result is limited, however, because the same speed perturbation method was used for both the training and testing datasets. |
DOI: | 10.1109/ICON-SONICS59898.2023.10435140 |
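The abstract does not state the perturbation factors or how perturbed utterances were selected. The following is therefore a minimal Python sketch under stated assumptions, not the authors' pipeline. Speed perturbation is implemented as resampling by linear interpolation, so, as with common speed perturbation, both tempo and pitch change. The factor 0.5 is a hypothetical choice for slowed speech. `speed_perturb` and `build_augmented_set` are hypothetical helper names. The sketch shows how a 25% augmentation ratio turns a normal-speech set into a mixed set.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform so it plays `factor` times faster at the same sample rate.

    factor < 1 slows speech down (longer, lower-pitched); factor > 1 speeds it up.
    Linear interpolation is a simplification: it does not low-pass filter first.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(signal)
    n_out = int(round(n_in / factor))
    # Fractional positions in the input that each output sample is read from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(n_in), signal).astype(signal.dtype)


def build_augmented_set(normal, ratio, factor, seed=0):
    """Hypothetical helper: keep every normal utterance and add round(ratio * N)
    speed-perturbed copies of randomly chosen ones, labelled "disordered".

    Assumes 0 <= ratio <= 1, so each utterance is perturbed at most once.
    """
    if not 0 <= ratio <= 1:
        raise ValueError("ratio must be in [0, 1]")
    rng = np.random.default_rng(seed)
    n_extra = int(round(ratio * len(normal)))
    picks = rng.choice(len(normal), size=n_extra, replace=False)
    dataset = [(wav, "normal") for wav in normal]
    dataset += [(speed_perturb(normal[i], factor), "disordered") for i in picks]
    return dataset


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate
    rng = np.random.default_rng(1)
    # Stand-in "utterances": 1 s of noise each (placeholder, not real speech).
    normal = [rng.standard_normal(sr).astype(np.float32) for _ in range(8)]
    data = build_augmented_set(normal, ratio=0.25, factor=0.5)  # 0.5 is an assumed factor
    print(len(data))         # 10: 8 normal + 2 perturbed
    print(len(data[-1][0]))  # 32000: half speed doubles the duration
```

Linear interpolation keeps the sketch dependency-free. When speeding speech up, it can alias, so a real pipeline would more likely use band-limited resampling, such as the SoX `speed` effect.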
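The results are reported as Word Error Rate (WER), where lower is better, so "improving" WER means reducing it. The abstract does not name the authors' scoring tool. As a minimal sketch, WER can be computed as a word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The Indonesian sentences below are hypothetical examples, not data from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev is the DP row for the reference words processed so far (starts at zero words).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or a match when r == h
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "saya mau minum air putih"  # hypothetical reference transcript
    print(word_error_rate(ref, "saya mau minum air putih"))  # 0.0
    print(word_error_rate(ref, "saya mau makan air"))        # 0.4
```

In the second call, "minum" is substituted and "putih" is deleted, giving 2 errors over 5 reference words.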