Classification of Indian media titles using deep learning techniques

Bibliographic Details
Published in: International journal of cognitive computing in engineering Vol. 3; pp. 114-123
Main Authors: Kumar, Sujit; Rajesh, Devesh D; Pranesh, Sarthak; Kollipara, V N Hemanth; Agrawal, Gopal Kumar; Anbarasi, M; J, Valarmathi
Format: Journal Article
Language: English
Published: Elsevier B.V 01-06-2022
KeAi Communications Co., Ltd
Description
Summary:
• The goal of the paper is to identify the language of Indian media titles (song names, movie titles) for automatic speech recognition training data.
• Transliterated song and movie titles were scraped from sources such as Kaggle, Wikipedia, IMDb, Spotify, and others.
• Classifier models were built using several machine learning and deep learning techniques.
• N-gram sequences and BERT tokens were used as inputs to the classification models.
• Models used: SVM, LSTM, ANN, and transfer learning using MuRIL.
• Results were compared and each model was analyzed in detail.
• Result: the best model (ANN using N-gram sequences) achieved an accuracy of 92%.

Automatic speech recognition is now widely used, and language identification is an essential part of it. Our goal is to identify the language of a media title, such as a song name or movie title, to help in speech recognition. The focus is on classifying titles into their original native language using only the transliterated title, without any additional data, by means of natural language processing, machine learning, and deep learning techniques. This work explores and implements methods such as N-grams, SVMs, LSTMs, and MuRIL to classify the titles by language, and compares and contrasts the results of the different implementations.
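The summary mentions N-gram sequences as one of the model inputs but does not say how they were built. As a minimal sketch only, assuming character-level n-grams over a lowercased, space-padded title (the record does not state whether the paper used character or word n-grams, or which n), the feature extraction might look like the following. The function names `char_ngrams` and `ngram_counts` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(title: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams of a title.

    Assumption for illustration: the title is lowercased and padded with one
    space on each side so that word-initial and word-final patterns are kept.
    """
    padded = f" {title.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_counts(title: str, n: int = 3) -> Counter:
    """Count how often each character n-gram occurs in the title."""
    return Counter(char_ngrams(title, n))


if __name__ == "__main__":
    # A transliterated title chosen only as an example input.
    print(char_ngrams("Dil Se", n=3))
    print(ngram_counts("Dil Se", n=2).most_common(3))
```

Counts like these could be turned into fixed-length vectors and passed to a classifier such as an SVM or a small feed-forward network; the actual vectorization and model settings used in the paper are not given in this record.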
ISSN:2666-3074
DOI:10.1016/j.ijcce.2022.04.001