Implementation of Constant-Q Transform (CQT) and Mel Spectrogram to converting Bird's Sound

Bibliographic Details
Published in: 2021 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT), pp. 52-56
Main Authors: Dian Handy Permana, Silvester, Bayu Yogha Bintoro, Ketut
Format: Conference Proceeding
Language: English
Published: IEEE 17-07-2021
Description
Summary: Bird sounds can be classified in various ways. One method is the Convolutional Neural Network (CNN), an algorithm used for image classification. For a CNN to classify bird sounds, the analogue sound must be converted into digital images objectively and accurately. This study discusses the conversion of analogue bird sounds into spectrogram images using the Constant-Q Transform (CQT) and the mel spectrogram. Bird voices are recorded with a voice recorder, and the recording represents the audio signal digitally. The Constant-Q Transform maps the audio signal from the time domain to the frequency domain. The frequency axis is converted to a log scale and the colour dimension (amplitude) to decibels to form a spectrogram. The spectrogram is then mapped onto the mel scale to form a mel spectrogram. This research covers the conversion of analogue bird sounds into mel spectrograms for classification by a CNN. The resulting images can be classified using a CNN to help classify bird sounds.
DOI: 10.1109/COMNETSAT53002.2021.9530779
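
The processing steps named in the summary (constant-Q analysis on a log-frequency axis, amplitude in decibels, the mel scale) can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not give. It assumes a direct, slow constant-Q transform of a single frame with a Hann window, the common 2595·log10(1 + f/700) mel formula, and illustrative parameter values (8 kHz sample rate, 110 Hz lowest bin, 12 bins per octave). All function names are hypothetical. In practice a library such as librosa (`librosa.cqt`, `librosa.amplitude_to_db`, `librosa.feature.melspectrogram`) would normally be used instead.

```python
import cmath
import math


def cqt_frequencies(fmin, n_bins, bins_per_octave):
    """Geometrically spaced centre frequencies: f_k = fmin * 2**(k / B)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def naive_cqt_frame(signal, sr, fmin=110.0, n_bins=36, bins_per_octave=12):
    """Magnitudes of a direct (slow) constant-Q transform of one frame.

    Each bin uses a window whose length shrinks as frequency rises, so the
    ratio of centre frequency to bandwidth (Q) stays constant.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for f_k in cqt_frequencies(fmin, n_bins, bins_per_octave):
        n_k = math.ceil(q * sr / f_k)  # window length for this bin
        if n_k > len(signal):
            raise ValueError("signal too short for the lowest bin")
        acc = 0j
        for n in range(n_k):
            hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)
            acc += signal[n] * hann * cmath.exp(-2j * math.pi * f_k * n / sr)
        magnitudes.append(abs(acc) / n_k)
    return magnitudes


def amplitude_to_db(amplitude, ref=1.0, floor=1e-10):
    """Amplitude to decibels; the floor avoids log10(0)."""
    return 20.0 * math.log10(max(amplitude, floor) / ref)


def hz_to_mel(freq_hz):
    """Mel-scale value of a frequency in Hz (assumed HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# Synthetic stand-in for a recording: a pure 440 Hz tone (assumption).
sr = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / sr) for n in range(2048)]

magnitudes = naive_cqt_frame(tone, sr)
db_column = [amplitude_to_db(m) for m in magnitudes]  # one spectrogram column
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = cqt_frequencies(110.0, 36, 12)[peak_bin]
```

Repeating `naive_cqt_frame` over successive frames of a recording and stacking the resulting `db_column` lists side by side would give the kind of spectrogram image the summary describes as CNN input. `hz_to_mel` shows only the frequency mapping, not a full mel filter bank.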