Discovering features for detecting malicious websites: An empirical study

Bibliographic Details
Published in: Computers & Security, Vol. 109, p. 102374
Main Authors: McGahagan, John; Bhansali, Darshan; Pinto-Coelho, Ciro; Cukier, Michel
Format: Journal Article
Language: English
Published: Amsterdam Elsevier Ltd 01-10-2021
Elsevier Sequoia S.A
Description
Summary: Website features and characteristics have shown the ability to detect various web threats, including phishing, drive-by downloads, and command and control (C2). Prior research has thoroughly explored the practice of choosing features ahead of time (a priori) and building detection models. However, there is an opportunity to investigate new techniques and features for detection. We perform a comprehensive evaluation of discovering features for malicious website detection versus selecting features a priori. We gather 46,580 features derived from the response to a web request and, through a series of feature selection techniques, discover features for detection and compare their performance to those used in prior research. We build several detection models using unsupervised and supervised learning algorithms over various sampling and feature transformation scenarios. Our approach is evaluated on a diverse dataset composed of common threats on the internet. Overall, we find that discovered features achieve detection performance comparable to that of a priori features while being more efficient, using 66% fewer features, and reach a Matthews Correlation Coefficient (MCC) of up to 0.9008.
ISSN: 0167-4048, 1872-6208
DOI: 10.1016/j.cose.2021.102374
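
The summary reports detection quality as a Matthews Correlation Coefficient (MCC). For readers unfamiliar with the metric, the following is a minimal sketch of the standard MCC formula computed from binary confusion-matrix counts. It is not the authors' code; the function name `mcc` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. The result ranges from -1 (total disagreement) through
    0 (no better than chance) to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, MCC is taken as 0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts (not taken from the paper): 90 malicious sites caught,
# 10 missed, 95 benign sites passed, 5 benign sites flagged.
example = mcc(tp=90, tn=95, fp=5, fn=10)
print(f"MCC = {example:.4f}")
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the malicious and benign classes are imbalanced.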