GAFX: A General Audio Feature eXtractor

Most machine learning models for audio tasks are dealing with a handcrafted feature, the spectrogram. However, it is still unknown whether the spectrogram could be replaced with deep learning based features. In this paper, we answer this question by comparing the different learnable neural networks...

Full description

Saved in:
Bibliographic Details
Main Authors: Bu, Zhaoyang, Zhang, Hanhaodi, Zhu, Xiaohu
Format: Journal Article
Language:English
Published: 19-07-2022
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Most machine learning models for audio tasks are dealing with a handcrafted feature, the spectrogram. However, it is still unknown whether the spectrogram could be replaced with deep learning based features. In this paper, we answer this question by comparing the different learnable neural networks extracting features with a successful spectrogram model and proposed a General Audio Feature eXtractor (GAFX) based on a dual U-Net (GAFX-U), ResNet (GAFX-R), and Attention (GAFX-A) modules. We design experiments to evaluate this model on the music genre classification task on the GTZAN dataset and perform a detailed ablation study of different configurations of our framework and our model GAFX-U, following the Audio Spectrogram Transformer (AST) classifier achieves competitive performance.
DOI:10.48550/arxiv.2207.09145