Supervised Metric Learning for Music Structure Features
| Field | Value |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | 17-10-2021 |
| Subjects | |
| Online Access | Get full text |
| DOI | 10.48550/arxiv.2110.09000 |
Summary: Music structure analysis (MSA) methods traditionally search for musically meaningful patterns in audio: homogeneity, repetition, novelty, and segment-length regularity. Hand-crafted audio features such as MFCCs or chromagrams are often used to elicit these patterns. However, with more annotations of section labels (e.g., verse, chorus, and bridge) becoming available, one can use supervised feature learning to make these patterns even clearer and improve MSA performance. To this end, we take a supervised metric learning approach: we train a deep neural network to output embeddings that are near each other for two spectrogram inputs if both have the same section type (according to an annotation), and otherwise far apart. We propose a batch sampling scheme to ensure the labels in a training pair are interpreted meaningfully. The trained model extracts features that can be used in existing MSA algorithms. In evaluations with three datasets (HarmonixSet, SALAMI, and RWC), we demonstrate that using the proposed features can improve a traditional MSA algorithm significantly in both intra- and cross-dataset scenarios.
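The following is a minimal PyTorch sketch of the kind of supervised metric learning the summary describes, assuming a triplet formulation: excerpts sharing an annotated section label are pulled together in embedding space, and excerpts with different labels are pushed apart. The network architecture, embedding size, margin, and input shapes are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of supervised metric learning on spectrogram excerpts.
# All shapes, layer sizes, and the margin are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Maps a (batch, 1, n_mels, n_frames) spectrogram excerpt to an embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)  # unit-norm embeddings

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Pull same-section pairs together, push different-section pairs apart."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

# Illustrative training step. In practice, triplets would be drawn by a
# label-aware batch sampler so that anchor and positive share a section
# label (e.g., both "chorus") and the negative comes from a differently
# labeled section; random tensors stand in for real spectrograms here.
model = EmbeddingNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
anchor = torch.randn(8, 1, 80, 64)
positive = torch.randn(8, 1, 80, 64)
negative = torch.randn(8, 1, 80, 64)
loss = triplet_loss(model(anchor), model(positive), model(negative))
opt.zero_grad()
loss.backward()
opt.step()
```

In the pipeline the summary describes, embeddings from such a trained model would replace hand-crafted features like MFCCs or chromagrams as input to an existing MSA algorithm; the paper's batch sampling scheme, which this sketch only gestures at in comments, is what ensures the labels within a training pair are compared meaningfully.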