Prodorshok I: A Bengali isolated speech dataset for voice-based assistive technologies: A comparative analysis of the effects of data augmentation on HMM-GMM and DNN classifiers




Bibliographic Details
Published in:	2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), pp. 396-399
Main Authors:	Reza, Mohi; Rashid, Warida; Mostakim, Moin
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 01 December 2017
Subjects:	Assistive Technology; Automatic Speech Recognition; Bengali; Deep Neural Network; Discrete Fourier transforms; Feature extraction; Filter banks; Gaussian Mixture Model; Hidden Markov Model; Hidden Markov models; Human Computer Interaction; Mel frequency cepstral coefficient; Speech; Speech recognition

Description
Summary:	Prodorshok I is a Bengali isolated-word dataset designed to support speaker-independent, voice-command-driven automatic speech recognition (ASR) assistive technologies that improve human-computer interaction (HCI). This paper presents an objective analysis, carried out on a subset of words from Prodorshok I, of the dataset's reliability in ASR systems that use Hidden Markov Models (HMM) with Gaussian emissions and Deep Neural Networks (DNN). The results show that simple data augmentation with a small pitch shift can produce surprisingly tangible improvements in speech recognition accuracy.
ISSN:	2572-7621
DOI:	10.1109/R10-HTC.2017.8288983
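The summary reports that augmenting the training data with a small pitch shift improved recognition accuracy, but it does not say how the shift was implemented. The sketch below is a minimal illustration under that uncertainty, assuming the simplest possible approach: resampling the waveform with linear interpolation (NumPy `np.interp`), which raises or lowers pitch by the factor 2^(semitones/12) but also changes the clip's duration by the same factor, unlike phase-vocoder methods that preserve length. The helpers `naive_pitch_shift` and `augment` and the ±1 semitone default are hypothetical and are not taken from the paper.

```python
import numpy as np


def naive_pitch_shift(signal: np.ndarray, semitones: float) -> np.ndarray:
    """Shift pitch by resampling (hypothetical helper, not the paper's method).

    Reading the input at a rate of `factor` samples per output sample scales
    every frequency by `factor`; the duration is divided by the same factor,
    like speeding up or slowing down a tape.
    """
    factor = 2.0 ** (semitones / 12.0)  # frequency ratio for the requested shift
    n_out = int(np.floor((len(signal) - 1) / factor)) + 1
    # Fractional read positions 0, factor, 2*factor, ... within the input
    positions = np.arange(n_out) * factor
    # Linear interpolation between neighbouring input samples
    return np.interp(positions, np.arange(len(signal)), signal)


def augment(dataset, semitone_shifts=(-1.0, 1.0)):
    """Return the original (signal, label) pairs plus one pitch-shifted copy per shift.

    The default shifts of -1 and +1 semitone are an illustrative assumption.
    """
    out = list(dataset)
    for signal, label in dataset:
        for s in semitone_shifts:
            out.append((naive_pitch_shift(signal, s), label))
    return out
```

For example, shifting a one-second 200 Hz tone sampled at 16 kHz up by 12 semitones yields a 400 Hz tone half as long, and `augment` triples a dataset when given two shifts. The shifted copies keep their original labels, so a word classifier (HMM-GMM or DNN) sees more speaker-like pitch variation per word without any new recordings.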