Enabling Smart Mobility Features Using Spectrogram Images and Convolutional Neural Networks
Published in: | 2024 IEEE International Conference on Smart Mobility (SM), pp. 105-109 |
---|---|
Main Authors: | , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 16-09-2024 |
Subjects: | |
Summary: | Pitch (also called F0 or fundamental frequency) is an important voice feature for smart mobility applications such as driver emotion detection, personalized vehicle profiles, and secure speaker identification. This paper presents a novel approach that uses Convolutional Neural Networks (CNNs) and image processing techniques to estimate F0 directly from spectrogram images. The approach achieves high detection accuracy: 92% of predicted pitch contours correlate strongly or moderately with the true pitch contours. An experimental comparison with other state-of-the-art CNN methods shows that the approach improves detection accuracy by 3 to 5 percentage points across various Signal-to-Noise Ratio (SNR) conditions. |
DOI: | 10.1109/SM63044.2024.10733384 |
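
The summary reports how many predicted pitch contours correlate strongly or moderately with the true contours. As a rough illustration of that kind of evaluation, the sketch below computes a Pearson correlation between a predicted and a reference F0 contour and assigns a label. It is not the paper's procedure: the record does not state which correlation measure or cut-offs the authors use, so the function names, the 0.7 and 0.4 thresholds, and the convention that 0 Hz marks an unvoiced frame are all assumptions made for this example.

```python
import math

# Hypothetical cut-offs; the record does not say which thresholds the paper uses.
STRONG = 0.7
MODERATE = 0.4


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def contour_correlation(pred_f0, true_f0):
    """Correlate two F0 contours (Hz) over frames that are voiced in both.

    Assumption: a value of 0 marks an unvoiced frame.
    """
    if len(pred_f0) != len(true_f0):
        raise ValueError("contours must have the same number of frames")
    pairs = [(p, t) for p, t in zip(pred_f0, true_f0) if p > 0 and t > 0]
    if len(pairs) < 2:
        return math.nan
    pred, true = zip(*pairs)
    return pearson(pred, true)


def correlation_label(r):
    """Map a correlation coefficient to a coarse label using the assumed cut-offs."""
    if math.isnan(r):
        return "undefined"
    if r >= STRONG:
        return "strong"
    if r >= MODERATE:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    true_contour = [100.0, 110.0, 120.0, 0.0, 130.0]
    pred_contour = [102.0, 111.0, 119.0, 0.0, 131.0]
    r = contour_correlation(pred_contour, true_contour)
    print(correlation_label(r))
```

Counting the share of test utterances labeled "strong" or "moderate" would give a figure comparable in kind to the 92% quoted in the summary, though the exact value depends on the thresholds and voicing convention chosen.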