GAFX: A General Audio Feature eXtractor
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 19-07-2022 |
Summary: | Most machine learning models for audio tasks rely on a handcrafted
feature, the spectrogram. However, it is still unknown whether the spectrogram
could be replaced with deep-learning-based features. In this paper, we answer
this question by comparing different learnable neural feature extractors
with a successful spectrogram model, and we propose a General Audio
Feature eXtractor (GAFX) based on dual U-Net (GAFX-U), ResNet (GAFX-R), and
Attention (GAFX-A) modules. We design experiments to evaluate this model on the
music genre classification task using the GTZAN dataset and perform a detailed
ablation study of different configurations of our framework. Our model
GAFX-U, paired with the Audio Spectrogram Transformer (AST) classifier, achieves
competitive performance. |
---|---|
DOI: | 10.48550/arxiv.2207.09145 |