Creation of Marathi speech corpus for automatic speech recognition

Bibliographic Details
Published in: 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 1-5
Main Authors: Gaikwad, Santosh; Gawali, Bharti; Mehrotra, Suresh
Format: Conference Proceeding
Language: English
Published: IEEE 01-11-2013
Description
Summary: This paper describes the collection of an audio corpus for the Marathi language. Marathi is one of the regional languages of India and has 12 vowels and 36 consonants. The objective of the research is to create a speech corpus that can be used by automatic speech recognition systems in domains such as telephonic inquiry systems and teaching tutors. The corpus comprises 28,420 isolated words and 17,470 sentences from around 500 speakers. The speech utterances were recorded at 16 kHz using three recording media: a headset, a desktop-mounted microphone, and a mobile phone. The corpus is transcribed and annotated, and is available for use in recognition systems.
DOI: 10.1109/ICSDA.2013.6709893