Hoax Detection on Social Media with Convolutional Neural Network (CNN) and Support Vector Machine (SVM)

Bibliographic Details
Published in: 2023 11th International Conference on Information and Communication Technology (ICoICT), pp. 361-366
Main Authors: Benedict, Manuel; Setiawan, Erwin Budi
Format: Conference Proceeding
Language: English
Published: IEEE 23-08-2023
Description
Summary: Hoax news has long been a worrying problem for society, because exposure to it can shift a person's point of view for the worse, to the detriment of many individuals and groups. Machine learning and deep learning can be used to detect hoax news. Methods used in previous studies include SVM (Support Vector Machine) and CNN (Convolutional Neural Network). This research applies the CNN and SVM methods and, as its distinguishing contribution, also develops a hybrid CNN-SVM model. The dataset is sourced from Twitter and focuses on the Ferdy Sambo case and the Kanjuruhan tragedy, both of which occurred in 2022. It contains 25,325 records and is split into two parts at a ratio of 90:10. After training, the three models achieved excellent performance. This is reflected in their accuracy scores, which improved after applying TF-IDF (Term Frequency-Inverse Document Frequency) feature extraction with unigram + bigram weighting and feature expansion with GloVe (Global Vectors for Word Representation). The best-performing model is the SVM with top-1 similarity and the Tweet corpus (95.95% accuracy), followed by the hybrid CNN-SVM model with top-10 similarity and the Tweet + News corpus (95.79% accuracy) and the CNN model with top-15 similarity and the Tweet + News corpus (95.11% accuracy).
ISSN: 2162-1241
DOI: 10.1109/ICoICT58202.2023.10262433
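
The summary mentions TF-IDF feature extraction with unigram + bigram weighting. As a rough illustration of what that step computes, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `ngrams` and `tfidf`, the whitespace tokenization, the smoothed IDF formula, and the toy tweets are all assumptions made for this example, since the record does not state the exact weighting or preprocessing used.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (joined by spaces) for n in [n_min, n_max]."""
    out = []
    for n in range(n_min, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def tfidf(docs):
    """Compute TF-IDF weights over unigram + bigram features.

    Assumption: raw term frequency and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, which is one common variant.
    The paper's exact weighting is not given in this record.
    """
    # Tokenize by lowercasing and splitting on whitespace (an assumption).
    grams = [ngrams(doc.lower().split()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for g in grams for term in set(g))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    # One sparse {term: weight} dictionary per document.
    return [{t: tf * idf[t] for t, tf in Counter(g).items()} for g in grams]


# Hypothetical toy tweets, not taken from the paper's dataset.
tweets = [
    "breaking news about the case",
    "official statement about the case",
    "breaking rumor spreads fast",
]
vectors = tfidf(tweets)
```

Each entry of `vectors` maps unigram and bigram features to weights; a bigram that appears in only one tweet (such as "breaking news") receives a higher weight than one shared across tweets (such as "about the"). In the study, features of this kind are fed to the SVM, CNN, and hybrid classifiers.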