Assessing the benefits of virtual speaker lateralization for binaural speech intelligibility over the Internet
Published in: | Applied acoustics Vol. 202; p. 109146 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: | Elsevier Ltd, 01-01-2023 |
Summary: | • Speech intelligibility tested with virtual speaker lateralization over the Internet.
• Intelligibility score obtained from Levenshtein distance and word error metrics.
• Speech intelligibility improves with virtual speaker lateralization over the Internet.
Binaural speech intelligibility tests using headphones were conducted remotely over the Internet, using virtual speaker lateralization at azimuth angles within ±45 degrees of the front, combined with babble noise at signal-to-noise ratios (SNR) from -12 to -4 dB. Monophonic speech recordings, selected from the Spanish portion of the publicly available Mozilla Common Voice speech corpus, were binaurally processed with a database of generic Head-Related Transfer Functions (HRTF) to produce virtual speaker lateralization. A common babble noise signal was mixed into both the left and right binaural signals at different SNRs. The tests were administered through a Google Forms questionnaire containing links to YouTube videos embedding the corresponding test audio files. Two symbolic text difference metrics commonly used in automatic speech recognition (ASR), Levenshtein distance and word error rate (WER), were used to automatically estimate the word-level speech intelligibility scores conventionally used in (human) speech intelligibility research, together with microscopic speech intelligibility scores at the phonetic symbol level. Speech reception thresholds (SRT) and intelligibility slopes were determined. They show that the known benefit of spatial release from masking (SRM), which improves speech intelligibility when the speaker is oriented at lateral azimuth angles relative to the listener, is preserved with virtual speaker lateralization and with Internet-transmitted audio over headphones. For azimuth angles of 20 and 30 degrees, left or right, speech intelligibility improves with an average unmasking benefit of 3.7±0.7 dB SRM and an average intelligibility slope of 5.6±0.8%/dB, while still maintaining at these angles the desirable impression of a virtual speaker with a reasonably frontal orientation.
This provides a useful technique for improved Internet speech delivery. |
---|---|
ISSN: | 0003-682X 1872-910X |
DOI: | 10.1016/j.apacoust.2022.109146 |
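
The summary names Levenshtein distance and word error rate (WER) as the metrics behind the automatic intelligibility scores. The following Python sketch illustrates how such metrics can be computed from a reference sentence and a listener's typed response. It is not the authors' implementation: the function names, the lowercasing and whitespace tokenization, and the mapping of score = 1 − WER (clipped at zero) are all assumptions made for illustration, since the record does not give the exact scoring formula.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two
    sequences. Works on strings (character level) or lists (word level)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a to reach an empty b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, response):
    """Word-level edit distance divided by the number of reference words.
    Tokenization by lowercasing and whitespace splitting is an assumption."""
    ref_words = reference.lower().split()
    resp_words = response.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, resp_words) / len(ref_words)


def intelligibility_score(reference, response):
    """Hypothetical score in [0, 1]: 1 - WER, clipped at 0 because WER can
    exceed 1 when the response has many inserted words."""
    return max(0.0, 1.0 - word_error_rate(reference, response))


if __name__ == "__main__":
    ref = "el gato come pescado"
    resp = "el gato como pescado"
    print(levenshtein(ref, resp))            # character-level distance
    print(word_error_rate(ref, resp))        # word-level error rate
    print(intelligibility_score(ref, resp))  # assumed score mapping
```

Passing strings to `levenshtein` gives a character-level distance, while passing word lists gives the word-level distance that WER normalizes. A score at the phonetic symbol level, as mentioned in the summary, would additionally require a phonetic transcription step that this sketch does not cover.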