Attention-based Bilateral LSTM-CNN for the Sentiment Analysis of Code-mixed Filipino-English Social Media Texts
Published in: | 2023 International Conference on Digital Applications, Transformation & Economy (ICDATE) pp. 1 - 5 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 14-07-2023 |
Subjects: | |
Summary: | The prevalence of code-mixed texts on social networks presents a challenge for sentiment analysis. Most existing methods are trained on monolingual samples and therefore cannot be used to analyze sentiment in code-mixed text. In addition, most existing models rely on classical machine learning and lexicon-based approaches, which produce low-accuracy results. This study examines the sentiment classification of code-mixed Filipino-English texts drawn from social networking sites and synthetic data. A deep-learning approach was used to build a pioneering bilingual sentiment analysis model that combines a Convolutional Neural Network (CNN) and a Bi-LSTM with an attention mechanism. The researchers first gathered and developed two datasets: feedback from Higher Education Institution (HEI) stakeholders and synthetic data. Synthetic data was used to address language resource scarcity, a technique commonly used in Natural Language Processing to train models that can understand and artificially generate code-mixed texts. The code-mixed data was then preprocessed and passed through a word embedding procedure. Lastly, the researchers performed sentiment classification on the code-mixed corpus using an Attention-based Bilateral Long Short-Term Memory CNN (A-BLSTM-CNN). The results show that the proposed method outperforms the alternatives in classifying the sentiment of code-mixed Filipino-English texts, with an accuracy of 92% (2 classes) and 80.24% (3 classes). |
---|---|
DOI: | 10.1109/ICDATE58146.2023.10248926 |
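
The summary mentions an attention mechanism applied on top of the recurrent features but does not say which formulation the authors used. As a rough illustration only, the sketch below shows one common variant, dot-product attention pooling, in plain Python. The function names, the `context` vector, and the toy numbers are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Collapse a sequence of hidden vectors into one vector.

    hidden_states: list of T vectors, each of length d (for example,
                   per-token outputs of a Bi-LSTM; assumed here).
    context:       a length-d vector that would normally be learned
                   (hypothetical, fixed by hand in this sketch).
    Returns the pooled vector and the attention weights.
    """
    # Score each time step by its dot product with the context vector.
    scores = [sum(h_j * c_j for h_j, c_j in zip(h, context))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden vectors, one dimension at a time.
    pooled = [sum(w * h[j] for w, h in zip(weights, hidden_states))
              for j in range(dim)]
    return pooled, weights


# Toy input: three time steps with two features each (made-up values).
states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.3]]
context = [1.0, 1.0]
pooled, weights = attention_pool(states, context)
```

In a full model, the pooled vector would typically be passed to a classification layer, and the weights indicate which tokens contributed most to the prediction.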