Sentiment Analysis of Public Opinion Towards Tourism in Bangkalan Regency Using Naïve Bayes Method

Sentiment analysis is natural language processing (NLP) that uses text analysis to recognize and extract opinions in text. Analysis is used to convert unstructured information into more structured information, also to determine whether an object has a positive, negative, or neutral tendency, and is...

Full description

Saved in:
Bibliographic Details
Published in:E3S web of conferences Vol. 499; p. 1016
Main Authors: Fatah, Doni Abdul, Rochman, Eka Mala Sari, Setiawan, Wahyudi, Aulia, Ayussy Rahma, Kamil, Fajrul Ihsan, Su’ud, Ahmad
Format: Journal Article
Language:English
Published: EDP Sciences 01-01-2024
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Sentiment analysis is natural language processing (NLP) that uses text analysis to recognize and extract opinions in text. Analysis is used to convert unstructured information into more structured information, also to determine whether an object has a positive, negative, or neutral tendency, and is an effort to facilitate decision making for tourism managers as a recommendation in developing tourist attractions. In this study, opinions were conducted on tourism reviews in Bangkalan using the Naïve Bayes method. This method is a machine learning algorithm to classify text into concepts that are easy to understand and provide accurate results with high efficiency. This method is proven to provide excellent results with a high level of accuracy, especially for large data, but has some drawbacks, sensitive to feature selection. Thus, a feature selection process is needed to improve classification efficiency by reducing the amount of data analyzed, with the Information Gain feature selection method. The word weighting method uses TF-IDF, while the data used comes from google maps reviews taken through web scraping, where tourist visitors provide reviews and ratings of places that have been visited. However, the large number of reviews can make it difficult for tourist attractions managers to manage them, so the process of labeling the sentiment class of the review data obtained 3649 reviews, with 2583 positive, 275 negative, and 457 neutral. Based on the test results that have been carried out using the Information Gain threshold of 0.0001, 0.0003, and 0.0007 can improve the accuracy of the Naïve Bayes model, for the best test at threshold 0.0007, with an accuracy value of 78.68%, precision 80.44%, recall 82.59%, and f1-score 82.53%, from the test results it shows that the use of information gain feature selection and SMOTE technique has a fairly good performance in classifying public opinion sentiment data on tourism in Bangkalan Regency, meaning that tourism management is good seen from the results of visitor satisfaction sentiment.
ISSN:2267-1242
2267-1242
DOI:10.1051/e3sconf/202449901016