Unsupervised Detection and Clustering of Malicious TLS Flows
Main Authors: | , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 23-12-2022 |
Subjects: | |
Summary: | Malware abuses TLS to encrypt its malicious traffic, preventing examination
by content signatures and deep packet inspection. Network detection of
malicious TLS flows is an important but challenging problem. Prior work has
proposed supervised machine learning detectors using TLS features. However, by
trying to represent all malicious traffic, supervised binary detectors produce
models that are too loose, thus introducing errors. Furthermore, they do not
distinguish flows generated by different malware. On the other hand, supervised
multi-class detectors produce tighter models and can classify flows by malware
family, but require family labels, which are not available for many samples.
To address these limitations, this work proposes a novel unsupervised
approach to detect and cluster malicious TLS flows. Our approach takes as input
network traces from sandboxes. It clusters similar TLS flows using 90 features
that capture properties of the TLS client, TLS server, certificate, and
encrypted payload; and uses the clusters to build an unsupervised detector that
can assign a malicious flow to the cluster it belongs to, or determine it is
benign. We evaluate our approach using 972K traces from a commercial sandbox
and 35M TLS flows from a research network. Our clustering shows very high
precision and recall with an F1 score of 0.993. We compare our unsupervised
detector with two state-of-the-art approaches, showing that it outperforms
both. The false detection rate of our detector is 0.032% measured over four
months of traffic. |
DOI: | 10.48550/arxiv.2109.03878 |
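
The Summary describes a two-stage idea: cluster similar TLS flows from sandbox traces, then use the clusters as a detector that either assigns a new flow to a cluster or declares it benign. The following is only a minimal sketch of that general pattern, not the authors' method. It assumes each flow is already a short numeric feature vector (the paper uses 90 features; three made-up ones are used here), swaps in a simple greedy "leader" clustering with Euclidean distance, and uses a hypothetical distance threshold. The names `cluster_flows`, `detect`, and `threshold` are illustrative.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_flows(flows, threshold):
    """Greedy leader clustering (an assumed stand-in, not the paper's algorithm).

    Each flow joins the nearest existing cluster whose centroid lies within
    `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"members": [...], "centroid": [...]}
    for flow in flows:
        best, best_dist = None, threshold
        for cluster in clusters:
            d = distance(flow, cluster["centroid"])
            if d <= best_dist:
                best, best_dist = cluster, d
        if best is None:
            clusters.append({"members": [flow], "centroid": list(flow)})
        else:
            best["members"].append(flow)
            n = len(best["members"])
            # Recompute the centroid as the per-feature mean of all members.
            best["centroid"] = [sum(col) / n for col in zip(*best["members"])]
    return clusters


def detect(flow, clusters, threshold):
    """Return the index of the nearest cluster within `threshold`.

    Returns None when no cluster is close enough, which this sketch
    treats as "benign".
    """
    best_idx, best_dist = None, threshold
    for i, cluster in enumerate(clusters):
        d = distance(flow, cluster["centroid"])
        if d <= best_dist:
            best_idx, best_dist = i, d
    return best_idx


# Toy feature vectors standing in for sandbox TLS flows (values are invented).
sandbox_flows = [
    (1.0, 0.0, 5.0),
    (1.1, 0.0, 5.2),
    (9.0, 1.0, 0.5),
    (9.2, 1.0, 0.4),
]
clusters = cluster_flows(sandbox_flows, threshold=1.0)

close_flow = (1.05, 0.0, 5.0)  # near the first group of sandbox flows
far_flow = (5.0, 0.5, 2.5)     # far from every cluster
print(detect(close_flow, clusters, threshold=1.0))
print(detect(far_flow, clusters, threshold=1.0))
```

In this sketch the threshold controls how tight each cluster model is: a smaller value yields more, narrower clusters and fewer flows flagged as malicious, which mirrors the Summary's point that tighter models reduce errors compared with a single loose binary model.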