Enabling Smart Mobility Features Using Spectrogram Images and Convolutional Neural Networks

Bibliographic Details
Published in: 2024 IEEE International Conference on Smart Mobility (SM), pp. 105-109
Main Authors: Zhao, Xu Fang, Tsimhoni, Omer
Format: Conference Proceeding
Language: English
Published: IEEE 16-09-2024
Description
Summary: Pitch (also called F0 or fundamental frequency) is an important voice feature for smart mobility applications such as driver emotion detection, personalized vehicle profiles, and secure speaker identification. This paper presents a novel approach that combines Convolutional Neural Networks (CNNs) with image processing techniques to estimate F0 directly from spectrogram images. The approach achieves high detection accuracy: 92% of the predicted pitch contours have strong or moderate correlations with the true pitch contours. Furthermore, an experimental comparison with other state-of-the-art CNN methods shows that the approach improves detection accuracy by 3 to 5 percentage points across various Signal-to-Noise Ratio (SNR) conditions.
DOI: 10.1109/SM63044.2024.10733384
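The abstract does not describe the CNN architecture or the exact evaluation protocol, so the Python sketch below only illustrates the surrounding evaluation idea under stated assumptions: mix noise into a voiced signal at a chosen SNR, turn it into a magnitude spectrogram (the "image"), estimate an F0 contour, and grade the Pearson correlation between the predicted and true contours. It is not the authors' method. A naive spectral-peak picker stands in for the CNN, and the synthetic test signal, the 10 dB SNR, the STFT settings, the 60-400 Hz search band, the 0.7/0.4 cut-offs for "strong"/"moderate" correlation, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical evaluation sketch, NOT the authors' pipeline: a simple
# spectral-peak picker stands in for the paper's CNN pitch estimator.

FS = 16_000   # sample rate in Hz (assumed)
FRAME = 1024  # STFT frame length in samples (assumed)
HOP = 256     # STFT hop length in samples (assumed)


def synth_voiced(f0_track, fs=FS):
    """Harmonic test signal (3 partials, 1/k amplitudes) following a per-sample F0 track."""
    phase = 2 * np.pi * np.cumsum(f0_track) / fs
    return sum(np.sin(k * phase) / k for k in (1, 2, 3))


def add_noise_at_snr(x, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) is about snr_db."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)


def magnitude_spectrogram(x, frame=FRAME, hop=HOP):
    """Rows are frames, columns are rfft bins: the 'image' an estimator would consume."""
    n_frames = 1 + (len(x) - frame) // hop
    window = np.hanning(frame)
    frames = np.stack([x[i * hop : i * hop + frame] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def peak_pick_f0(spec, fs=FS, frame=FRAME, fmin=60.0, fmax=400.0):
    """Stand-in estimator: frequency of the strongest bin inside an assumed pitch band."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spec[:, band], axis=1)]


def correlation_strength(r, strong=0.7, moderate=0.4):
    """Assumed cut-offs; the abstract does not state the paper's thresholds."""
    if r >= strong:
        return "strong"
    if r >= moderate:
        return "moderate"
    return "weak"


rng = np.random.default_rng(0)
n = FS                                    # one second of audio
f0_true = np.linspace(120.0, 200.0, n)    # rising pitch, one value per sample
noisy = add_noise_at_snr(synth_voiced(f0_true), snr_db=10.0, rng=rng)

spec = magnitude_spectrogram(noisy)
f0_pred = peak_pick_f0(spec)
centers = np.arange(spec.shape[0]) * HOP + FRAME // 2
f0_ref = f0_true[centers]                 # ground-truth F0 at each frame centre

r = np.corrcoef(f0_pred, f0_ref)[0, 1]
print(f"Pearson r = {r:.3f} -> {correlation_strength(r)}")
```

Repeating the last block for several `snr_db` values and counting the share of contours graded "strong" or "moderate" mirrors the kind of summary statistic the abstract reports, although the paper's actual metric definition may differ. The peak picker's resolution is limited to one FFT bin (about 15.6 Hz with these settings), which is one reason learned estimators are attractive.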