Explainability-Informed Targeted Malware Misclassification

Bibliographic Details
Published in:	2024 33rd International Conference on Computer Communications and Networks (ICCCN), pp. 1-8
Main Authors:	Card, Quincy; Aryal, Kshitiz; Gupta, Maanak
Format:	Conference Proceeding
Language:	English
Published:	IEEE 29-07-2024
Subjects:	Additives; Adversarial; Artificial neural networks; Benchmark testing; Critical infrastructure; Dynamic Analysis; Explainability; Feeds; Machine learning; Malware; Online Analysis; Robust; Trustworthy; White-box attacks

Description
Summary:	In recent years, there has been a surge in malware attacks across critical infrastructures, requiring further research and development of appropriate response and remediation strategies in malware detection and classification. Several works have used machine learning models to classify malware into categories, and deep neural networks have shown promising results. However, these models have proven vulnerable to intentionally crafted adversarial attacks, which cause a malicious file to be misclassified. Our paper explores such adversarial vulnerabilities of neural-network-based malware classification systems in dynamic and online analysis environments. To evaluate our approach, we trained Feed Forward Neural Networks (FFNN) to classify malware categories based on features obtained from dynamic and online analysis environments. We use the state-of-the-art method SHapley Additive exPlanations (SHAP) for feature attribution in malware classification, to inform the adversarial attacker about the features with significant importance in classification decisions. Using the explainability-informed features, we perform targeted white-box evasion attacks against the trained classifier using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Our results demonstrate a high evasion rate for some attack instances, showing a clear vulnerability of the malware classifier to such attacks. We offer recommendations for a balanced approach and a benchmark for much-needed future research into evasion attacks against malware classifiers and into the development of more robust and trustworthy solutions.
ISSN:	2637-9430
DOI:	10.1109/ICCCN61486.2024.10637629
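
Illustration:	The summary describes targeted FGSM and PGD attacks that perturb only the features flagged as important by an explainability method. The sketch below shows the general shape of such an attack and is not the authors' implementation. It assumes a toy linear softmax classifier in place of the paper's FFNN, hand-picked weights, and a hypothetical binary `mask` standing in for a SHAP-derived feature selection; all function and variable names are illustrative.

```python
import math

def logits(W, b, x):
    # Linear scores per class; a stand-in for the FFNN in the paper.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def targeted_loss_grad(W, b, x, target):
    # Gradient of cross-entropy toward `target` with respect to the input x.
    # For softmax + cross-entropy, dL/dz_k = p_k - 1[k == target].
    p = softmax(logits(W, b, x))
    dz = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def targeted_fgsm(W, b, x, target, eps, mask):
    # One signed-gradient step that lowers the loss toward the target class.
    # Only features with mask == 1 are changed (assumed SHAP-selected).
    g = targeted_loss_grad(W, b, x, target)
    return [xi - eps * sign(gi) * mi for xi, gi, mi in zip(x, g, mask)]

def targeted_pgd(W, b, x, target, eps, alpha, steps, mask):
    # Repeated small steps, each followed by projection back onto the
    # L-infinity ball of radius eps around the original sample.
    x_adv = list(x)
    for _ in range(steps):
        x_adv = targeted_fgsm(W, b, x_adv, target, alpha, mask)
        x_adv = [min(max(a, xi - eps), xi + eps) for a, xi in zip(x_adv, x)]
    return x_adv

# Toy setup: 3 classes, 4 features (all values are made up for illustration).
W = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [0.5, 0.5, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]
x = [1.0, 0.2, 0.5, 0.1]
mask = [1, 1, 0, 0]   # hypothetical "important" features
target = 1            # class the attacker wants the sample assigned to

x_fgsm = targeted_fgsm(W, b, x, target, eps=0.6, mask=mask)
x_pgd = targeted_pgd(W, b, x, target, eps=0.6, alpha=0.2, steps=5, mask=mask)
print(predict(W, b, x), predict(W, b, x_fgsm), predict(W, b, x_pgd))
```

In a real setting the gradient would come from automatic differentiation through the trained network, and the mask would be built from per-feature attribution scores rather than chosen by hand.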