Comparing Resampling Techniques in Stroke Prediction with Machine and Deep Learning

Bibliographic Details
Published in: 2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS), pp. 1415-1420
Main Authors: Thanka, M Roshni, Ram, Kommu Sri, Gandu, Shalem Preetham, Edwin, E Bijolin, Ebenezer, V, Joy, Priscilla
Format: Conference Proceeding
Language: English
Published: IEEE 14-06-2023
Description
Summary: Cerebrovascular accident (CVA), commonly known as stroke, is a major cause of morbidity and mortality worldwide. Recent techniques in stroke prediction include machine learning and deep learning algorithms, the integration of multimodal data, and advanced feature selection methods. Challenges in stroke prediction systems include class imbalance, limited and heterogeneous data, the poor interpretability of black-box models, and limited generalizability across diverse populations. The objective of this study is to address these challenges and improve the accuracy and reliability of stroke prediction models. Early detection and management of stroke risk factors can reduce the incidence and severity of stroke, and machine learning approaches have recently been applied to estimate stroke risk from patient data. This study uses three machine learning algorithms and an artificial neural network (ANN) model to predict stroke incidence from a dataset containing 17 variables. The ANN model was optimized using the RandomSearch hyperparameter tuning technique, then trained and tested on the original unbalanced dataset and on six resampled datasets, each generated with a different technique for addressing class imbalance. The results indicate that the ANN model performed well on both the original and the resampled datasets, achieving an accuracy of 99.3% on the dataset resampled using the SMOTE+RandomUnderSampling technique. The study suggests that the resampling techniques were effective in improving the performance of the ANN model, especially in dealing with class imbalance. These outcomes suggest that the ANN model has potential as a predictive tool for stroke incidence. Further study is needed, however, to confirm the model's effectiveness on larger, more diverse datasets and to evaluate its generalizability to other populations.
DOI: 10.1109/ICSCSS57650.2023.10169237
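
Illustrative note: the summary names a combined SMOTE+RandomUnderSampling step but gives no implementation details. The sketch below shows, in plain Python, the general idea behind such a combination: synthetic minority samples are interpolated between nearby minority points, and the majority class is then randomly thinned. It is a simplified stand-in, not the authors' code and not the API of any resampling library. The function names, the `over_ratio` and `under_ratio` parameters, the binary 0/1 labels, and the toy data are all assumptions made for illustration.

```python
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    Assumes at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples (no replacement)."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


def resample(X, y, over_ratio=0.5, under_ratio=1.0, seed=0):
    """Oversample class 1 up to over_ratio * (majority size), then undersample
    class 0 until minority / majority is roughly under_ratio.
    Assumes binary labels: 1 = minority (stroke), 0 = majority."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]

    target_min = max(len(minority), int(over_ratio * len(majority)))
    minority = minority + smote_like_oversample(
        minority, target_min - len(minority), rng=rng
    )

    target_maj = min(len(majority), int(len(minority) / under_ratio))
    majority = random_undersample(majority, target_maj, rng=rng)

    return minority + majority, [1] * len(minority) + [0] * len(majority)


# Toy, made-up data: 95 majority points and 5 minority points in 2-D.
data_rng = random.Random(42)
majority_pts = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(95)]
minority_pts = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(5)]
X = majority_pts + minority_pts
y = [0] * 95 + [1] * 5

X_res, y_res = resample(X, y)
print("minority:", sum(y_res), "majority:", len(y_res) - sum(y_res))
```

In practice, resampling of this kind is normally applied to the training split only, so that the test data keep the original class distribution. Whether the study did so is not stated in the summary.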