Predicting and Interpreting Student Performance Using Ensemble Models and Shapley Additive Explanations

Bibliographic Details
Published in: IEEE Access, Vol. 9, pp. 152688-152703
Main Authors: Sahlaoui, Hayat; Alaoui, El Arbi Abdellaoui; Nayyar, Anand; Agoujil, Said; Jaber, Mustafa Musa
Format: Journal Article
Language: English
Published: Piscataway IEEE 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: In several areas, including education, machine learning methods such as artificial neural networks have brought significant improvements in prediction tasks. One problem with their use is the opacity of these models. Decision-makers in education prefer prediction models that offer valuable insights while remaining simple to comprehend. Hence, this study proposes an approach that improves on previous student performance prediction by raising predictive accuracy and by explaining why a student is predicted to attain a certain score. A prediction model was proposed and tested using machine learning models. Our models outperform models from previous work developed on the same dataset. Using a combined framework of data-level and algorithm-level approaches, the proposed model achieves an accuracy of over 98%, implying a 20.3% improvement over previous models. As a balancing technique for upsampling the data, we use the default strategy of the synthetic minority oversampling technique (SMOTE), which oversamples all classes to the number of examples in the majority class. We also use ensemble methods. To tune the parameters, we use a simple grid search algorithm provided by scikit-learn to estimate the optimal parameters of our model. This hyperparameter optimization, together with a ten-fold cross-validation process, demonstrates the dependability of the novel model. In addition, a novel visual and intuitive technique is used to help determine which factors most influence the score; it helps to interpret and understand the entire model and visualizes feature attributions at the observation level. Therefore, SHAP values are a powerful tool that should be incorporated within the student performance prediction framework. With the predictions and explanations obtained through the experiment, educators can recognize at-risk students early and provide suitable advice in a timely manner.
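The balancing step described in the summary (oversampling every class up to the size of the majority class by SMOTE-style interpolation) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function name `smote_oversample`, the neighbour count `k=5`, the seed, and the toy student data are all assumptions made for illustration. In practice a library implementation (for example the `SMOTE` class in imbalanced-learn) would normally be used instead.

```python
import math
import random
from collections import Counter, defaultdict


def smote_oversample(X, y, k=5, seed=0):
    """Oversample every non-majority class up to the majority-class count.

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest same-class neighbours (the SMOTE idea).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    target = max(Counter(y).values())  # size of the majority class

    # Group the feature vectors by class label.
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    X_out, y_out = list(X), list(y)
    for label, samples in by_class.items():
        n_missing = target - len(samples)
        if n_missing == 0 or len(samples) < 2:
            continue  # majority class, or too few points to interpolate
        for _ in range(n_missing):
            base = rng.choice(samples)
            # k nearest same-class neighbours of `base`, excluding itself.
            neighbours = sorted(
                (s for s in samples if s is not base),
                key=lambda s: math.dist(base, s),
            )[:k]
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic = [b + gap * (n - b) for b, n in zip(base, neighbour)]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Made-up toy data: 6 "pass", 3 "fail", 2 "excellent" students, two features each.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [1.3, 2.0], [1.0, 1.8],
     [3.0, 0.5], [3.2, 0.4], [2.9, 0.6],
     [5.0, 5.0], [5.5, 4.5]]
y = ["pass"] * 6 + ["fail"] * 3 + ["excellent"] * 2

X_res, y_res = smote_oversample(X, y, k=5, seed=42)
print(Counter(y_res))
```

After resampling, each of the three toy classes contains six examples, matching the majority class. The original rows are kept unchanged and the synthetic rows are appended after them. To avoid leaking synthetic points into evaluation, oversampling of this kind is usually applied only to the training folds during cross-validation.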
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3124270