Discovering features for detecting malicious websites: An empirical study
Published in: | Computers & security Vol. 109; p. 102374 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: | Amsterdam, Elsevier Ltd, 01-10-2021, Elsevier Sequoia S.A |
Subjects: | |
Summary: | Website features and characteristics have shown the ability to detect various web threats – phishing, drive-by downloads, and command and control (C2). Prior research has thoroughly explored the practice of choosing features ahead of time (a priori) and building detection models. However, there is an opportunity to investigate new techniques and features for detection. We perform a comprehensive evaluation of discovering features for malicious website detection versus selecting features a priori. We gather 46,580 features derived from a response to a web request and, through a series of feature selection techniques, discover features for detection and compare their performance to those used in prior research. We build several detection models using unsupervised and supervised learning algorithms over various sampling and feature transformation scenarios. Our approach is evaluated on a diverse dataset composed of common threats on the internet. Overall, we find that discovered features can match the detection performance of a priori features more efficiently, using 66% fewer features, and can achieve a Matthews Correlation Coefficient (MCC) of up to 0.9008. |
ISSN: | 0167-4048 1872-6208 |
DOI: | 10.1016/j.cose.2021.102374 |
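
The summary reports results as a Matthews Correlation Coefficient (MCC), which ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). As a minimal sketch of how that metric is computed from a binary confusion matrix, the function below uses the standard MCC formula. The function name, the choice to return 0.0 when the denominator is zero, and the example counts are assumptions for illustration only; they are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Compute MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention
    for the case where a whole row or column of the matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration; not results from the study.
print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 4))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often preferred when malicious and benign samples are imbalanced.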