Training sound event detection with soft labels from crowdsourced annotations
Main Authors:
Format: Journal Article
Language: English
Published: 28-02-2023
Summary: In this paper, we study the use of soft labels to train a system for sound event detection (SED). Soft labels can result from annotations that account for human uncertainty about categories, or emerge as a natural representation of multiple opinions in annotation. Converting annotations to hard labels yields unambiguous categories for training, at the cost of losing the details of the label distribution. This work investigates how soft labels can be used, and what benefits they bring in training a SED system. The results show that the system is capable of learning information about the activity of the sounds that is reflected in the soft labels, and is able to detect sounds that are missed in the typical binary-target training setup. We also release a new dataset produced through crowdsourcing, containing temporally strong labels for sound events in real-life recordings, with both soft and hard labels.
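The summary contrasts soft targets (fractional annotator agreement) with hard, majority-vote targets. A minimal sketch of what that difference means for a per-frame binary cross-entropy loss; the paper's actual model and loss are not specified in this record, so the function and numbers below are illustrative assumptions:

```python
import math

def bce(prediction, target, eps=1e-7):
    """Binary cross-entropy for one frame/class; target may be soft (any value in [0, 1])."""
    p = min(max(prediction, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# Hypothetical crowdsourcing example: 3 of 5 annotators marked the sound active in a frame.
votes_active, annotators = 3, 5
soft_target = votes_active / annotators           # 0.6 - keeps annotator disagreement
hard_target = 1.0 if soft_target >= 0.5 else 0.0  # 1.0 - majority-vote binarization

# The soft-target loss is minimized when the prediction matches the agreement level (0.6),
# while the hard target pushes the prediction toward full confidence (1.0).
loss_at_agreement = bce(0.6, soft_target)
loss_overconfident = bce(0.9, soft_target)
```

With the soft target, over-confident predictions on ambiguous frames are penalized, which is one way the label distribution details mentioned in the summary can be preserved during training.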
DOI: 10.48550/arxiv.2302.14572