Web Traffic Anomaly Detection Using Isolation Forest

Bibliographic Details
Published in: Informatics (Basel) Vol. 11; no. 4; p. 83
Main Authors: Chua, Wilson; Pajas, Arsenn Lorette Diamond; Castro, Crizelle Shane; Panganiban, Sean Patrick; Pasuquin, April Joy; Purganan, Merwin Jan; Malupeng, Rica; Pingad, Divine Jessa; Orolfo, John Paul; Lua, Haron Hakeen; Velasco, Lemuel Clark
Format: Journal Article
Language: English
Published: 05-11-2024
Description
Summary: As companies undergo digital transformation, the value of their data assets rises, making them more attractive targets for hackers. The large volume of weblogs calls for advanced classification methods that help cybersecurity specialists identify web traffic anomalies. This study implements Isolation Forest, an unsupervised machine learning method, to distinguish anomalous from non-anomalous web traffic. A publicly available weblog dataset from an e-commerce website was prepared through a systematic pipeline of data ingestion, data type conversion, data cleaning, and normalization. Derived columns were added to the training set and to a manually labeled testing set, which was then used to compare the anomaly detection performance of the Isolation Forest model with that of cybersecurity experts. The model, implemented with the Python Scikit-learn library, achieved an accuracy of 93%, precision of 95%, recall of 90%, and F1-score of 92%. Through appropriate data preparation, model development, model implementation, and model evaluation, this study shows that Isolation Forest can be a viable solution for near-accurate web traffic anomaly detection.
ISSN: 2227-9709
DOI: 10.3390/informatics11040083
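
The workflow named in the summary (normalization, an Isolation Forest fitted with Scikit-learn, and evaluation against a labeled test set) might look roughly like the sketch below. This is not the authors' code. The record does not list the dataset's columns or the model's hyperparameters, so the two features (requests per minute, bytes per request), the synthetic data generator, the contamination value of 0.05, and all function names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.preprocessing import MinMaxScaler


def make_synthetic_traffic(n_normal, n_anomalous, seed):
    """Hypothetical stand-in for prepared weblog features.

    Columns (assumed): requests per minute, bytes per request.
    Labels: 0 = non-anomalous, 1 = anomalous.
    """
    rng = np.random.default_rng(seed)
    normal = rng.normal(loc=[20.0, 500.0], scale=[5.0, 100.0], size=(n_normal, 2))
    anomalous = rng.normal(loc=[200.0, 5000.0], scale=[20.0, 500.0], size=(n_anomalous, 2))
    X = np.vstack([normal, anomalous])
    y = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_anomalous, dtype=int)])
    return X, y


def detect_anomalies(X_train, X_test, contamination=0.05, seed=0):
    """Fit on unlabeled training data, then flag test rows (1 = anomalous)."""
    scaler = MinMaxScaler().fit(X_train)  # normalization step
    model = IsolationForest(
        n_estimators=100,
        contamination=contamination,  # assumed value, not taken from the study
        random_state=seed,
    )
    model.fit(scaler.transform(X_train))  # unsupervised: no labels are passed
    raw = model.predict(scaler.transform(X_test))  # +1 = inlier, -1 = outlier
    return (raw == -1).astype(int)


def evaluate(y_true, y_pred):
    """Compute the four metrics named in the summary."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


# Training labels are discarded; test labels play the role of the
# manually labeled testing set.
X_train, _ = make_synthetic_traffic(950, 50, seed=0)
X_test, y_test = make_synthetic_traffic(100, 100, seed=1)
y_pred = detect_anomalies(X_train, X_test)
metrics = evaluate(y_test, y_pred)
print(metrics)
```

Scores from this synthetic data say nothing about the figures reported in the study. The sketch only shows how Scikit-learn's -1/+1 predictions can be mapped to anomalous/non-anomalous labels and scored against a labeled set.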