An Interpretable and Generalizable Speech Detector Based on a CNN-LSTM Framework
Published in: | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 13231 - 13235 |
---|---|
Main Authors: | , , , , , , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 14-04-2024 |
Subjects: | |
Summary: | Speech brain-computer interfaces (speech BCIs) aim to reconstruct speech from recorded brain signals. Real-time speech BCI relies on speech detection, which is strongly affected by the choice of speech-related neural frequency features. However, most studies have not investigated this aspect when designing speech detectors. In this study, an electrocorticography (ECoG) dataset and a stereo-electroencephalography (sEEG) dataset were used to investigate how brain signal type affects the contribution of frequency bands to speech detection. We calculated the mutual information (MI) between neural frequency bands and the audio envelope and found that the distribution of MI across frequency bands differed between the two types of brain signals. Specifically, the 40-60 Hz band of the ECoG signal and the 0-20 Hz band of the sEEG signal had the highest MI values. To address this, we propose a two-module detector that combines convolutional neural networks and long short-term memory (CNN-LSTM) for feature extraction and speech prediction. Our detector outperformed three commonly used detectors: linear discriminant analysis (LDA), support vector machine (SVM), and LSTM. Notably, a high correlation was found between the CNN output and the frequency bands, and high MI values were observed in both types of brain signals. These findings confirm the interpretability and generalizability of our proposed speech detector. |
---|---|
ISSN: | 2379-190X |
DOI: | 10.1109/ICASSP48485.2024.10445835 |
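
The summary's band-wise mutual information analysis can be illustrated with a minimal sketch. This is not the authors' implementation: the histogram-based MI estimator, the frame-wise FFT band power, the helper names (`band_power`, `mutual_information`), the sampling rate, frame length, and the synthetic signal are all assumptions chosen for illustration. The synthetic "neural" signal has a 50 Hz component gated by an on/off envelope, which only echoes the 40-60 Hz ECoG band mentioned in the summary and does not reproduce any result from the paper.

```python
import numpy as np

def band_power(signal, fs, band, frame_len):
    """Frame-wise power of `signal` inside `band` = (low_hz, high_hz)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return spectrum[:, mask].sum(axis=1)

def mutual_information(x, y, bins=16):
    """Histogram estimate of MI (in bits) between two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(0)
fs = 1000                                  # sampling rate in Hz
frame_len = 100                            # 100 ms frames
t = np.arange(fs * 60) / fs                # 60 s of samples
envelope = (np.sin(2 * np.pi * 0.5 * t) > 0).astype(float)  # 1 s on, 1 s off
neural = envelope * np.sin(2 * np.pi * 50 * t) + 0.5 * rng.standard_normal(t.size)

# Frame-wise audio envelope, aligned with the band-power frames.
n_frames = t.size // frame_len
env_frames = envelope[: n_frames * frame_len].reshape(n_frames, frame_len).mean(axis=1)

# MI between each candidate band and the envelope.
bands = [(0, 20), (20, 40), (40, 60), (60, 80)]
mi_by_band = {
    band: mutual_information(band_power(neural, fs, band, frame_len), env_frames)
    for band in bands
}
for band, mi in mi_by_band.items():
    print(f"{band[0]}-{band[1]} Hz: MI = {mi:.3f} bits")
```

Ranking bands by their MI value in this way is one simple route to choosing frequency features for a detector; histogram estimators are biased for small sample counts, so the number of bins and frames should be chosen with care on real recordings.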