Non-Intrusive Binaural Prediction of Speech Intelligibility Based on Phoneme Classification

Bibliographic Details
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 396-400
Main Authors: Rosbach, Jana, Rottges, Saskia, Hauth, Christopher F., Brand, Thomas, Meyer, Bernd T.
Format: Conference Proceeding
Language: English
Published: IEEE 06-06-2021
Description
Summary: In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. To this end, we combine a non-intrusive binaural frontend with a deep neural network (DNN) borrowed from a standard automatic speech recognition (ASR) system. The DNN estimates phoneme probabilities that degrade in the presence of noise and reverberation, which is quantified with an entropy-based measure. The model output is used to predict speech recognition thresholds, i.e., signal-to-noise ratio with 50% word recognition accuracy. It is compared to measured data obtained from eight normal-hearing listeners in acoustic scenarios with varying positions of localized maskers, different rooms and reverberation times. The model is non-intrusive; yet it produces a root mean squared error in the range of 0.6-2.1 dB, which is similar to results obtained with a reference model (0.3-1.8 dB) that uses oracle knowledge both in the frontend and in the backend stage.
ISSN: 2379-190X
DOI: 10.1109/ICASSP39728.2021.9413874
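
Illustration: The summary mentions two quantities, an entropy-based measure of degraded phoneme probabilities and a root mean squared error between predicted and measured thresholds. The sketch below shows one generic way such quantities could be computed. It is not the authors' implementation: the record does not define the exact measure, so plain Shannon entropy averaged over frames is assumed here, and the names frame_entropy, mean_entropy and rmse are hypothetical.

```python
import math


def frame_entropy(probs):
    """Shannon entropy (bits) of one frame of phoneme probabilities.

    Assumption: `probs` is a sequence of non-negative values summing to 1.
    Zero-probability entries contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mean_entropy(posteriors):
    """Average per-frame entropy over an utterance.

    `posteriors` is a list of frames, each a list of phoneme probabilities.
    Flatter distributions (as expected in noise or reverberation) give
    higher values; confident, peaked distributions give lower values.
    """
    entropies = [frame_entropy(frame) for frame in posteriors]
    return sum(entropies) / len(entropies)


def rmse(predicted, measured):
    """Root mean squared error between two equal-length sequences (e.g. in dB)."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))
```

With this definition, a frame that is uniform over four phoneme classes has an entropy of 2 bits, while a frame with all probability on one class has an entropy of 0 bits, so a higher average entropy would indicate less certain phoneme classification.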