Classification of Indian media titles using deep learning techniques
Published in: | International journal of cognitive computing in engineering Vol. 3; pp. 114 - 123 |
---|---|
Format: | Journal Article |
Language: | English |
Published: | Elsevier B.V, 01-06-2022, KeAi Communications Co., Ltd |
Summary: |
• The goal of the paper is to identify the language of Indian media titles (song names, movie titles) for automatic speech recognition training data.
• Transliterated song and movie titles were scraped from various sources, including Kaggle, Wikipedia, IMDb, and Spotify.
• Classifier models were built using various machine learning and deep learning techniques.
• N-gram sequences and BERT tokens were used as inputs to the classification models.
• Models used: SVM, LSTM, ANN, and transfer learning with MuRIL.
• The results were compared and each model was analyzed in detail.
• Result: the best model (ANN using N-gram sequences) achieved an accuracy of 92%.
Automatic speech recognition is now widely used, and language identification is an essential part of it. Our goal is to identify the language of a media title, such as a song name or movie title, to support speech recognition. The focus is on classifying each title into its original native language using only the transliterated title, without any additional data, by means of natural language processing, machine learning, and deep learning techniques. This work explores and implements several such methods, including N-grams, SVMs, LSTMs, and MuRIL, to classify the titles by language. The results of the implementations are compared and contrasted, with each treated as a separate approach to classifying the data. |
---|---|
ISSN: | 2666-3074 |
DOI: | 10.1016/j.ijcce.2022.04.001 |
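
The N-gram inputs mentioned in the summary can be illustrated with a minimal Python sketch. It is not the paper's pipeline: it extracts character N-grams from transliterated titles and, in place of the SVM or ANN models named in the summary, uses a simple cosine-similarity profile matcher. The function names, the N-gram range (2 to 3), the language labels, and the toy training titles are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(title, n_min=2, n_max=3):
    """Count character n-grams of a lowercased title padded with spaces."""
    text = f" {title.lower().strip()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def train_profiles(labelled_titles):
    """Sum n-gram counts per language label (one profile per language)."""
    profiles = {}
    for title, lang in labelled_titles:
        profiles.setdefault(lang, Counter()).update(char_ngrams(title))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(title, profiles):
    """Return the label whose profile is most similar to the title."""
    feats = char_ngrams(title)
    return max(profiles, key=lambda lang: cosine(feats, profiles[lang]))


# Toy, hand-picked examples (assumption): far too small for real use.
toy_data = [
    ("Dil Se", "hindi"),
    ("Kaadhal Kondein", "tamil"),
]
profiles = train_profiles(toy_data)
print(predict("Dil Se", profiles))
```

In a realistic setting the N-gram counts would be fed to a trained classifier rather than matched against summed profiles, and the training set would need to be far larger than the two titles used here.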