Extracting Rule RF in Educational Data Classification: From a Random Forest to Interpretable Refined Rules

Bibliographic Details
Published in: 2015 International Conference on Advanced Computing and Applications (ACOMP), pp. 20-27
Main Authors: Lu Thi, Kim Phung; Vo Thi, Ngoc Chau; Phung, Nguyen Hua
Format: Conference Proceeding
Language: English
Published: IEEE 01-11-2015
Description
Summary: Early detection of in-trouble students in an academic credit system is an emerging problem in educational data mining research. The problem is treated here as a multi-class educational data classification task. Although many existing supervised learning algorithms can provide acceptable classification models, the interpretability of these models needs to be investigated before they can be applied in practice. Random forests have been examined and appear to be an appropriate solution for effectively classifying students for early in-trouble student detection in a credit system. However, random forests are black-box ensemble models that cannot explain the reasoning behind their predictions. Therefore, in this paper, we define a rule extraction algorithm named ExtractingRuleRF that derives an interpretable, refined classification rule set from a random forest for a multi-class data classification task. The proposed algorithm follows a greedy approach with two phases: rule refinement and rule extraction. In the first phase, we prepare a ranked, weighted rule set that is more interpretable than the input random forest and has equivalent classification power, because it retains the forest's classification scheme. In the second phase, the rule extraction process returns the best rules for the highest accuracy and/or full coverage, based on the priority of each ranked rule. The theoretical analysis of the algorithm and experimental results on real educational data sets show that ExtractingRuleRF can produce a more effective and interpretable rule-based classification model than its corresponding random forest. This result makes our knowledge-based educational decision support, with its interpretable classification rules, more practical.
DOI: 10.1109/ACOMP.2015.13
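
Illustrative sketch: the summary describes a greedy, two-phase procedure (rank weighted rules, then keep the best rules until accuracy and/or coverage is reached). The Python below is only a minimal sketch of that general idea and is not the paper's ExtractingRuleRF algorithm. The summary does not give the weighting formula, so ranking by per-rule accuracy and then coverage is an assumption. The rule format, the feature names (gpa, failed), the class labels, and all function names are hypothetical, and the rules are written by hand where a real pipeline would read them from the tree paths of a trained forest.

```python
# Minimal sketch of greedy rule ranking and selection (assumed form, not the paper's method).
# A rule is (conditions, predicted_class); conditions map a feature name to a
# half-open interval [low, high).
INF = float("inf")

def matches(rule, x):
    """Return True if record x satisfies every condition of the rule."""
    conds, _ = rule
    return all(lo <= x[f] < hi for f, (lo, hi) in conds.items())

def rank_rules(rules, X, y):
    """Phase 1 (assumed): weight each rule by accuracy on the records it covers,
    then rank by (accuracy, coverage), highest first."""
    ranked = []
    for rule in rules:
        covered = [i for i, x in enumerate(X) if matches(rule, x)]
        if not covered:
            continue  # a rule that covers nothing carries no evidence
        correct = sum(1 for i in covered if y[i] == rule[1])
        ranked.append((correct / len(covered), len(covered), rule))
    ranked.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return ranked

def extract_rules(ranked, X):
    """Phase 2 (assumed): walk the ranking and keep each rule that covers at
    least one still-uncovered record; stop once every record is covered."""
    uncovered = set(range(len(X)))
    selected = []
    for weight, _, rule in ranked:
        newly = {i for i in uncovered if matches(rule, X[i])}
        if newly:
            selected.append((weight, rule))
            uncovered -= newly
        if not uncovered:
            break
    return selected

def predict(selected, x, default):
    """Classify x with the first selected rule that fires, else the default class."""
    for _, rule in selected:
        if matches(rule, x):
            return rule[1]
    return default

# Hypothetical toy data: three classes of student status.
X = [
    {"gpa": 3.5, "failed": 0}, {"gpa": 3.1, "failed": 1},
    {"gpa": 2.2, "failed": 2}, {"gpa": 2.4, "failed": 3},
    {"gpa": 1.2, "failed": 5}, {"gpa": 1.5, "failed": 6},
]
y = ["ok", "ok", "warning", "warning", "at_risk", "at_risk"]

# Hypothetical candidate rules, standing in for paths collected from forest trees.
rules = [
    ({"gpa": (3.0, INF)}, "ok"),
    ({"gpa": (2.0, 3.0)}, "warning"),
    ({"failed": (4, INF)}, "at_risk"),
    ({"failed": (2, INF)}, "at_risk"),                 # overly broad, lower accuracy
    ({"gpa": (3.0, INF), "failed": (0, 1)}, "ok"),     # redundant with the first rule
]

selected = extract_rules(rank_rules(rules, X, y), X)
predictions = [predict(selected, x, default="ok") for x in X]
```

On this toy data the broad and redundant rules are never selected, because the higher-ranked rules already cover every record. A real refinement phase would also need to preserve the forest's voting scheme, which this sketch does not attempt.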