An explainable machine learning approach for student dropout prediction


Bibliographic Details
Published in: Expert Systems with Applications, Vol. 233, p. 120933
Main Authors: Krüger, João Gabriel Corrêa; Britto, Alceu de Souza; Barddal, Jean Paul
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15-12-2023
Description
Summary: School dropout is a significant socio-economic problem worldwide. To address it, predictive models have been developed to estimate the likelihood of students leaving their studies prematurely. Academic systems, which gather data from many students, are potential sources of datasets for dropout prediction algorithms, which can in turn contribute to improvements in education quality. Despite successful past attempts to predict dropout, several works rely on small datasets with features that are hard to reproduce. Furthermore, predicting whether a student will drop out is not enough to diagnose and prevent the problem; it is also necessary to provide potential justifications for the dropout. This paper proposes an approach for creating and enriching a dataset for dropout prediction, which was applied to data from 19 schools in Brazil. With this dataset, and using classifiers and model explanation techniques, our experiments achieved Area Under the Precision–Recall Curve (AUC-PR) scores of up to 89.5%, Precision of up to 95%, Recall of up to 93%, and Kolmogorov–Smirnov (KS) rates of up to 97% when predicting dropout at different points in the year. This study also shows differences when predicting dropout at different educational stages, such as preschool and secondary education, with the former being more complex than the latter. In addition to the high recognition rates, our proposal identifies potential reasons for student dropout, which educational institutions can use to take preemptive action.
Highlights:
• Dropout prediction is performed with educational, financial, and public data.
• Dropout analysis is conducted in different educational stages and times of the year.
• Explainable machine learning allows reasoning about potential dropouts.
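Note on the metrics: the summary reports Precision, Recall, AUC-PR, and KS. As a rough illustration of what these measure, the sketch below computes them on a tiny made-up example. It is not the authors' code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. AUC-PR is approximated here by step-wise average precision, which assumes no tied scores.

```python
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = dropout)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    # Rank students from highest to lowest predicted dropout score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each true dropout
    return ap / total_pos


def ks_statistic(y_true: List[int], scores: List[float]) -> float:
    """Largest gap between the score CDFs of the two classes."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        return 0.0
    best = 0.0
    for thr in sorted(set(scores)):
        cdf_pos = sum(s <= thr for s in pos) / len(pos)
        cdf_neg = sum(s <= thr for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data (hypothetical): 1 = dropped out, 0 = stayed enrolled.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall = precision_recall(y_true, y_pred)
auc_pr = average_precision(y_true, scores)
ks = ks_statistic(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"auc_pr={auc_pr:.3f} ks={ks:.3f}")
```

On real data, library routines such as scikit-learn's `average_precision_score` and SciPy's `ks_2samp` would normally replace these hand-written versions.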
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.120933