Non-Intrusive Binaural Prediction of Speech Intelligibility Based on Phoneme Classification
In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. To this end, we combine a non-intrusive binaural frontend with a deep neural network (DNN) borrowed from a standard automatic speech recognition (ASR) system. The DNN estimates phoneme probabilities...
Saved in:
Published in: | ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 396 - 400 |
---|---|
Main Authors: | , , , , |
Format: | Conference Proceeding |
Language: | English |
Published: |
IEEE
06-06-2021
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. To this end, we combine a non-intrusive binaural frontend with a deep neural network (DNN) borrowed from a standard automatic speech recognition (ASR) system. The DNN estimates phoneme probabilities that degrade in the presence of noise and reverberation, which is quantified with an entropy-based measure. The model output is used to predict speech recognition thresholds, i.e., signal-to-noise ratio with 50% word recognition accuracy. It is compared to measured data obtained from eight normal-hearing listeners in acoustic scenarios with varying positions of localized maskers, different rooms and reverberation times. The model is non-intrusive; yet it produces a root mean squared error in the range of 0.6-2.1 dB, which is similar to results obtained with a reference model (0.3-1.8 dB) that uses oracle knowledge both in the frontend and in the backend stage. |
---|---|
ISSN: | 2379-190X |
DOI: | 10.1109/ICASSP39728.2021.9413874 |