A Roman Urdu Corpus for Sentiment Analysis



Bibliographic Details
Published in: The Computer Journal, Vol. 67, no. 9, pp. 2864-2876
Main Authors: Khan, Marwa; Naseer, Asma; Wali, Aamir; Tamoor, Maria
Format: Journal Article
Language: English
Published: Oxford University Press, 09-10-2024
Description
Summary: Sentiment analysis is a dynamic field focused on understanding and predicting the sentiment expressed in text or images. With the prevalence of smartphones, e-commerce and social networks, individuals readily express opinions online, which aids businesses, political analysts and organizations in decision-making. Despite extensive research on sentiment analysis for many languages, challenges persist for low-resource languages such as Roman Urdu. Roman Urdu, that is, Urdu written in the Roman script, has gained popularity, yet limited linguistic resources hinder sentiment analysis research. This study addresses this gap by developing a bidirectional long short-term memory (BiLSTM) network with FastText embeddings and additional layers. A large Roman Urdu corpus for sentiment analysis, consisting of over 51,000 reviews, is created, and the proposed model is trained and compared with 14 other models, achieving an accuracy of 0.854 and an F1-score of 0.84.
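The summary names FastText embeddings without saying what they contribute. One plausible reason they suit Roman Urdu, which has no standardized spelling, is that FastText represents a word through its character n-grams, so spelling variants of the same word share most of their features. The sketch below illustrates that idea only; it is not the authors' code. The function names `char_ngrams` and `ngram_overlap` are hypothetical, the n-gram range of 3 to 6 follows the original fastText subword model, and the spellings "acha"/"achha" are illustrative examples, not drawn from the corpus.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return fastText-style character n-grams for a word (illustrative sketch).

    The word is wrapped in boundary markers '<' and '>' before slicing,
    and the whole wrapped word is also kept as its own feature.
    """
    wrapped = f"<{word}>"
    grams = set()
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    grams.add(wrapped)
    return grams


def ngram_overlap(a, b):
    """Jaccard similarity of two words' n-gram sets (hypothetical helper)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    # Two common Roman Urdu spellings of "good" (illustrative, not from the corpus)
    print(sorted(char_ngrams("acha", 3, 3)))
    print(round(ngram_overlap("acha", "achha"), 2))
```

Because both spellings share n-grams such as "<ac" and "ach", a subword model maps them to related vectors even if one spelling is rare in training data, whereas a word-level embedding would treat them as unrelated tokens.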
ISSN: 0010-4620, 1460-2067
DOI: 10.1093/comjnl/bxae052