Alaryngeal Speech Enhancement for Noisy Environments Using a Pareto Denoising Gated LSTM

Bibliographic Details
Published in:	Journal of Voice
Main Authors:	Maskeliūnas, Rytis; Damaševičius, Robertas; Kulikajevas, Audrius; Pribuišis, Kipras; Uloza, Virgilijus
Format:	Journal Article
Language:	English
Published:	United States: Elsevier Inc, 05-08-2024
Subjects:	Deep learning; Laryngeal carcinoma; Pareto denoising gated LSTM; Speech enhancement of the alaryngeal region; Speech processing

Description
Summary:	Loss of the larynx significantly alters natural voice production, requiring alternative communication modalities and rehabilitation methods to restore speech intelligibility and improve the quality of life of affected individuals. This paper explores advances in alaryngeal speech enhancement to improve signal quality and reduce background noise, focusing on individuals who have undergone laryngectomy. In this study, speech samples were obtained from 23 Lithuanian males who had undergone laryngectomy with secondary implantation of a tracheoesophageal prosthesis (TEP). A Pareto-optimized gated long short-term memory (LSTM) network was trained on tracheoesophageal speech data to recognize complex temporal connections and contextual information in speech signals. The system was able to distinguish between actual speech and various forms of noise and artifacts, resulting in a 25% gain in the mean signal-to-noise ratio compared to other approaches. According to acoustic analysis, the system significantly decreased the number of unvoiced frames (proportion of unvoiced frames) from 40% to 10% while maintaining stable proportions of voiced frames (proportion of voiced speech frames) and average voicing evidence (average voice evidence in voiced frames), indicating the accuracy of the approach in selectively attenuating noise and undesired speech artifacts while preserving important speech information.
ISSN:	0892-1997 1873-4588
DOI:	10.1016/j.jvoice.2024.07.016
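
The summary reports two kinds of quantities: a relative change in mean signal-to-noise ratio (SNR) and the proportion of unvoiced frames. The sketch below shows one plausible way such figures could be computed. It is not the authors' evaluation code. The function names, the definition of noise as the difference between a degraded signal and a clean reference, and the reading of the percentage as a relative change are all assumptions made for illustration.

```python
import math


def snr_db(clean, degraded):
    """SNR in dB, assuming the noise is (degraded - clean), sample by sample."""
    n = len(clean)
    signal_power = sum(s * s for s in clean) / n
    noise_power = sum((d - s) ** 2 for s, d in zip(clean, degraded)) / n
    if noise_power == 0:
        return math.inf  # identical signals: no measurable noise
    return 10.0 * math.log10(signal_power / noise_power)


def relative_change_percent(baseline, enhanced):
    """Relative change of a metric, in percent of the baseline value (assumed reading)."""
    return 100.0 * (enhanced - baseline) / abs(baseline)


def unvoiced_fraction(voiced_flags):
    """Share of frames flagged as unvoiced; flags are hypothetical per-frame booleans."""
    flags = list(voiced_flags)
    return sum(1 for voiced in flags if not voiced) / len(flags)


if __name__ == "__main__":
    # Toy data only; these are not values from the study.
    clean = [1.0, -1.0, 1.0, -1.0]
    degraded = [1.1, -0.9, 1.1, -0.9]
    print(round(snr_db(clean, degraded), 2))
    print(relative_change_percent(8.0, 10.0))
    print(unvoiced_fraction([True] * 9 + [False]))
```

In practice the per-frame voicing flags would come from a pitch or voicing detector, and the SNR would be averaged over many recordings before any relative comparison between systems.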