An explainable machine learning approach for student dropout prediction
Published in: | Expert Systems with Applications, Vol. 233, p. 120933 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: | Elsevier Ltd, 15-12-2023 |
Subjects: | |
Summary: | School dropout is a relevant socio-economic problem across the globe. To address it, predictive models have been developed to estimate the likelihood that students will drop out of their studies prematurely. Academic systems, which gather data from many students, are potential sources of datasets for dropout prediction algorithms, which can in turn lead to general improvements in education quality. Despite successful past attempts to predict dropout, several works rely on small datasets with features that are hard to reproduce. Furthermore, predicting whether a student will drop out is not enough to diagnose and prevent the problem; it is also necessary to provide potential justifications for the dropout. This paper proposes an approach for creating and enriching a dataset for dropout prediction and applies it to data from 19 schools in Brazil. Using this dataset together with classifiers and model-explanation techniques, our experiments achieved Area Under the Precision–Recall Curve (AUC-PR) scores of up to 89.5%, Precision of up to 95%, Recall of up to 93%, and Kolmogorov–Smirnov (KS) values of up to 97% when predicting dropout at different times of the school year. This study also shows differences when predicting dropout at different educational stages, such as preschool and secondary education, with the former being more complex than the latter. In addition to the high recognition rates, our proposal identifies potential reasons for student dropout, which educational institutions can use to take preemptive actions.
• Dropout prediction is performed with educational, financial, and public data.
• Dropout analysis is conducted at different educational stages and times of the year.
• Explainable machine learning makes it possible to reason about the causes behind potential dropouts. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2023.120933 |
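The summary reports Precision, Recall, AUC-PR, and KS scores, but the record does not say how the authors computed them. The following is a minimal plain-Python sketch of the standard textbook definitions for a binary dropout classifier (label 1 = dropout), not the paper's implementation. Assumptions: the function names, the 0.5 decision threshold, and the toy data are hypothetical; AUC-PR is approximated by average precision (assuming no tied scores); KS is taken as the two-sample Kolmogorov–Smirnov statistic between the score distributions of dropouts and non-dropouts.

```python
from bisect import bisect_right
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for hard 0/1 predictions (1 = dropout)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Approximates AUC-PR: mean precision at each rank where a dropout is retrieved.
    Assumes no tied scores (ties are broken by sort order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos if n_pos else 0.0


def ks_statistic(y_true: List[int], scores: List[float]) -> float:
    """Max gap between the empirical score CDFs of dropouts and non-dropouts.
    The CDFs are step functions, so checking each observed score suffices."""
    pos = sorted(s for s, t in zip(scores, y_true) if t == 1)
    neg = sorted(s for s, t in zip(scores, y_true) if t == 0)
    best = 0.0
    for thr in sorted(set(scores)):
        f_pos = bisect_right(pos, thr) / len(pos)  # share of dropouts with score <= thr
        f_neg = bisect_right(neg, thr) / len(neg)  # share of non-dropouts with score <= thr
        best = max(best, abs(f_pos - f_neg))
    return best


if __name__ == "__main__":
    # Hypothetical toy data: six students, model scores = estimated dropout probability.
    y_true = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(precision_recall(y_true, y_pred))
    print(average_precision(y_true, scores))
    print(ks_statistic(y_true, scores))
```

On the toy data, the 0.5 threshold flags four students, three of whom are true dropouts, so precision is 0.75 and recall is 1.0. Libraries such as scikit-learn (`average_precision_score`) and SciPy (`scipy.stats.ks_2samp`) provide tested implementations of the same ideas and handle ties more carefully than this sketch.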