A multi-step outlier-based anomaly detection approach to network-wide traffic

Bibliographic Details
Published in: Information Sciences, Vol. 348, pp. 243-271
Main Authors: Bhuyan, Monowar H., Bhattacharyya, D.K., Kalita, J.K.
Format: Journal Article
Language: English
Published: Elsevier Inc 20-06-2016
Description
Summary:
• We propose a multi-step outlier-based anomaly detection approach to network-wide traffic.
• We propose a feature selection algorithm to select a relevant, non-redundant subset of features.
• We propose a tree-based clustering algorithm to generate non-redundant overlapped clusters.
• We introduce an efficient score-based outlier estimation technique to detect anomalies in network-wide traffic.
• We establish a fast distributed feature extraction framework to extract significant features from raw network-wide traffic.
• We conduct extensive experiments using the proposed algorithms with synthetic and real-life network-wide traffic datasets.

Outlier detection is of considerable interest in fields such as physical sciences, medical diagnosis, surveillance detection, fraud detection and network anomaly detection. The data mining and network management research communities are interested in improving existing score-based network traffic anomaly detection techniques because there is ample scope to improve their performance. In this paper, we present a multi-step outlier-based approach for detecting anomalies in network-wide traffic. We identify a subset of relevant traffic features and use it during clustering and anomaly detection. To support outlier-based network anomaly identification, we use the following modules: a feature selection technique based on mutual information and generalized entropy that selects a relevant, non-redundant subset of features; a tree-based clustering technique that generates a set of reference points; and an outlier score function that ranks incoming network traffic to identify anomalies. We also design a fast distributed feature extraction and data preparation framework to extract features from raw network-wide traffic. We evaluate our approach in terms of detection rate, false positive rate, precision, recall and F-measure using several high-dimensional synthetic and real-world datasets, and find that it outperforms competing algorithms.
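
To make the module structure in the summary concrete, the following is a minimal Python sketch of two of the ideas it names: choosing features that are relevant but not redundant using mutual information, and scoring a traffic record against a set of reference points. It is not the authors' algorithm. The function names, the greedy "relevance minus mean redundancy" criterion, and the nearest-reference-point distance score are all assumptions made for illustration; the generalized entropy component and the tree-based clustering step are not reproduced, and the reference points are simply passed in.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed (x, y) pairs of p(x,y) * log2(p(x,y) / (p(x) p(y))).
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(columns, labels, k):
    """Greedily pick up to k column indices (hypothetical criterion):
    relevance to the labels minus mean redundancy with already chosen columns."""
    selected = []
    remaining = list(range(len(columns)))

    def gain(j):
        relevance = mutual_information(columns[j], labels)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


def outlier_score(point, reference_points):
    """Assumed score: Euclidean distance to the nearest reference point.
    A larger value means the record is further from known normal behaviour."""
    return min(math.dist(point, r) for r in reference_points)
```

In this sketch, records would be ranked by `outlier_score` and those above a user-chosen threshold flagged as anomalous; the reference points could be, for example, centroids of clusters built from normal traffic, which is again an assumption rather than a detail stated in the summary.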
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2016.02.023