A Server-side Phishing detection API using Long-Short Term Memory (LSTM) and Multi-layer Perceptron (MLP)

Bibliographic Details
Published in:	2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG) pp. 1 - 8
Main Authors:	Asani, Emmanuel O., Babalola, Oluwatobi Stephen, Akinola, Abolade Emmanuel, Barnabas, Ayomide Segun, Adams, Olumide Victor, Odumesi, John Olayemi
Format:	Conference Proceeding
Language:	English
Published:	IEEE 02-04-2024
Subjects:	Accuracy; Adaptation models; Analytical models; Data models; Deep Learning; Electronic mail; LSTM; Machine Learning; MLP; Phishing; Phishing detection; Vectors

Description
Summary:	The objective of this study was to develop an application programming interface (API) for the detection of phishing attacks. The API uses two machine learning models, Multi-Layer Perceptron (MLP) and Long-Short Term Memory (LSTM). The LSTM model is tailored for sequential data analysis, while the MLP model uses two hidden layers and an adaptive learning rate. By leveraging deep learning and natural language processing, the API exhibits enhanced accuracy in identifying phishing emails, strengthening the defense against cyber threats. The study began with the collection and preparation of a comprehensive labelled dataset. Preprocessing included the removal of special characters and numerical elements, conversion of letters to lowercase, stemming of non-English stopwords, and rejoining of words with whitespace, among other steps. To let the models process textual data, one-hot encoding was used to convert tokenized words into binary vectors representing word presence. These binary vectors were then padded to a uniform sequence length, a prerequisite for feeding the data to the models. At inference time, the API loads the pre-trained LSTM and MLP models and uses the better-performing one to make its decision. The models were evaluated using precision, recall, and F1-score, which provide insight into false positives, false negatives, and overall classification accuracy. The MLP model achieved an accuracy of 98.8%, followed by the LSTM model at 96.8%. These results support the viability of the developed server-side phishing detection API and its potential as a tool for improving email security.
DOI:	10.1109/SEB4SDG60871.2024.10629792
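
The pipeline described in the summary (text cleaning, one-hot encoding, padding, and metric computation) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the tiny stopword list, and the vocabulary-based encoding are assumptions made for illustration, and stemming is left out because the record does not name a stemmer.

```python
import re

# Hypothetical, deliberately tiny stopword list; the paper's actual list is not given.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "is", "your"}


def preprocess(text):
    """Lowercase, drop non-letter characters, remove stopwords, rejoin with spaces."""
    letters_only = re.sub(r"[^a-zA-Z]+", " ", text).lower()
    tokens = [t for t in letters_only.split() if t not in STOPWORDS]
    return " ".join(tokens)


def build_vocab(cleaned_texts):
    """Map each distinct word to a column index, in order of first appearance."""
    vocab = {}
    for text in cleaned_texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot_encode(cleaned_text, vocab):
    """Turn each known token into a binary vector containing a single 1."""
    vectors = []
    for token in cleaned_text.split():
        if token in vocab:
            vec = [0] * len(vocab)
            vec[vocab[token]] = 1
            vectors.append(vec)
    return vectors


def pad(vectors, length, width):
    """Truncate or right-pad with all-zero vectors so the sequence has `length` rows."""
    padded = vectors[:length]
    padded += [[0] * width for _ in range(length - len(padded))]
    return padded


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (phishing = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example messages, used only to exercise the functions above.
emails = ["Verify your account NOW!!! 123", "Meeting moved to 3pm"]
cleaned = [preprocess(e) for e in emails]
vocab = build_vocab(cleaned)
max_len = max(len(c.split()) for c in cleaned)
batch = [pad(one_hot_encode(c, vocab), max_len, len(vocab)) for c in cleaned]
```

After this runs, every entry in `batch` has the same number of rows and columns, which is the uniform shape the summary describes as a prerequisite for feeding the models. A production system would more likely rely on a deep learning library's own tokenization and padding utilities.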