Improving CNN-based solutions for emotion recognition using evolutionary algorithms

Bibliographic Details
Published in: Results in Applied Mathematics, Vol. 18, p. 100360
Main Authors: Mohammadrezaei, Parsa; Aminan, Mohammad; Soltanian, Mohammad; Borna, Keivan
Format: Journal Article
Language: English
Published: Elsevier B.V., 01-05-2023
Description
Summary: AI-based approaches, especially deep learning, have made remarkable achievements in Speech Emotion Recognition (SER). Convolutional Neural Networks (CNNs) have been the backbone of many of these solutions. Although the use of CNNs has resulted in high-performing models, building them requires domain knowledge and direct human intervention. The same issue arises when improving a model. To solve this problem, we use techniques that were first introduced in Neural Architecture Search (NAS) and apply a genetic process to search for models with improved accuracy. More specifically, we insert blocks with dynamic structures between the layers of an existing model and then use genetic operations (i.e., selection, mutation, and crossover) to find the best-performing structures. To validate our method, we use this algorithm to improve architectures by searching on the Berlin Database of Emotional Speech (EMODB). The experimental results show an improvement of at least 1.7% in accuracy on the EMODB test set.
ISSN: 2590-0374
DOI: 10.1016/j.rinam.2023.100360
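
The genetic search outlined in the summary (selection, mutation, and crossover over candidate block structures) can be illustrated with a minimal sketch. This is not the authors' implementation: the search space `SEARCH_SPACE`, the function names, and the scoring function `toy_fitness` are all hypothetical. In a real search, the fitness would presumably be the validation accuracy of the base CNN after training with the candidate block inserted; here a cheap placeholder stands in so the loop runs on its own.

```python
import random

# Hypothetical search space for one inserted block (an assumption for
# illustration, not taken from the paper). Each gene picks one option.
SEARCH_SPACE = {
    "kernel_size": [1, 3, 5, 7],
    "filters": [16, 32, 64, 128],
    "activation": ["relu", "elu", "tanh"],
    "use_batch_norm": [False, True],
}
GENES = list(SEARCH_SPACE)


def random_block(rng):
    """Sample one candidate block structure uniformly at random."""
    return {g: rng.choice(SEARCH_SPACE[g]) for g in GENES}


def toy_fitness(block):
    """Placeholder score standing in for validation accuracy.

    A real search would train and evaluate the model with this block.
    """
    score = 1.0 if block["kernel_size"] == 3 else 0.0
    score += block["filters"] / 128
    score += 0.5 if block["activation"] == "relu" else 0.0
    score += 0.5 if block["use_batch_norm"] else 0.0
    return score


def tournament_select(population, fitness, rng, k=3):
    """Selection: pick the fittest of k randomly drawn candidates."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return {g: (a[g] if rng.random() < 0.5 else b[g]) for g in GENES}


def mutate(block, rng, rate=0.2):
    """Mutation: resample each gene with probability `rate`."""
    child = dict(block)
    for g in GENES:
        if rng.random() < rate:
            child[g] = rng.choice(SEARCH_SPACE[g])
    return child


def evolve(fitness, generations=20, pop_size=12, seed=0):
    """Run the genetic loop and return the best block found."""
    rng = random.Random(seed)
    population = [random_block(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best block over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            p1 = tournament_select(population, fitness, rng)
            p2 = tournament_select(population, fitness, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(best, toy_fitness(best))
```

Because the best candidate is copied into each new generation, the best score never decreases from one generation to the next. Replacing `toy_fitness` with a function that trains and evaluates a model would make each evaluation far more expensive, so caching scores per block structure would likely be worthwhile.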