Extracting Rule RF in Educational Data Classification: From a Random Forest to Interpretable Refined Rules

Bibliographic Details
Published in: 2015 International Conference on Advanced Computing and Applications (ACOMP), pp. 20-27
Main Authors: Lu Thi, Kim Phung; Vo Thi, Ngoc Chau; Phung, Nguyen Hua
Format: Conference Proceeding
Language: English
Published: IEEE 01-11-2015
Abstract Early detection of in-trouble students in an academic credit system is an emerging topic in educational data mining research. The problem has been formulated as a multi-class educational data classification task. Although many existing supervised learning algorithms can provide acceptable classification models, the interpretability of these models must be investigated before they can be applied in practice. Random forests have been examined and found to be an appropriate solution for effectively classifying students for early in-trouble student detection in a credit system. However, random forests are black-box ensemble models that cannot explain the reasoning behind their predictions. Therefore, in this paper, we define a rule extraction algorithm named ExtractingRuleRF to derive an interpretable refined classification rule set from a random forest for a multi-class data classification task. The proposed algorithm follows a greedy approach with two phases: rule refinement and rule extraction. In the first phase, we prepare a ranked weighted rule set that is more interpretable than the input random forest and has equivalent classification power, by retaining the forest's classification scheme. In the second phase, the rule extraction process returns the best rules for the highest accuracy and/or full coverage based on the priority of each ranked rule. The theoretical analysis of the algorithm and experimental results on real educational data sets show that ExtractingRuleRF can produce a more effective and interpretable rule-based classification model than its corresponding random forest. This result makes our knowledge-based educational decision support more practical by supplying interpretable classification rules.
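The two-phase greedy idea in the abstract (rank the rules derived from the forest, then keep the best ones until the data are covered) can be illustrated with a small sketch. This is not the authors' ExtractingRuleRF: the record does not give the algorithm's details, so the toy forest, the feature names (`gpa`, `credits`), the class labels, the data, and the scoring by accuracy and then coverage are all assumptions chosen only for illustration.

```python
# Hypothetical toy forest: a tree is either a leaf label (str) or a tuple
# (feature, threshold, left, right), where "left" means value <= threshold.
FOREST = [
    ("gpa", 2.0, "in_trouble", ("credits", 10, "warning", "ok")),
    ("credits", 10, ("gpa", 2.5, "in_trouble", "warning"), "ok"),
]

# Hypothetical labelled samples: (features, class label).
DATA = [
    ({"gpa": 1.5, "credits": 8}, "in_trouble"),
    ({"gpa": 1.8, "credits": 14}, "in_trouble"),
    ({"gpa": 2.8, "credits": 9}, "warning"),
    ({"gpa": 3.2, "credits": 8}, "warning"),
    ({"gpa": 3.0, "credits": 15}, "ok"),
    ({"gpa": 3.6, "credits": 18}, "ok"),
]

def paths_to_rules(tree, conditions=()):
    """Turn every root-to-leaf path of one tree into (conditions, label)."""
    if isinstance(tree, str):
        return [(conditions, tree)]
    feature, threshold, left, right = tree
    return (paths_to_rules(left, conditions + ((feature, "<=", threshold),))
            + paths_to_rules(right, conditions + ((feature, ">", threshold),)))

def matches(conditions, x):
    """True if sample x satisfies every condition of a rule."""
    return all(x[f] <= t if op == "<=" else x[f] > t for f, op, t in conditions)

def rank_rules(forest, data):
    """Phase 1 (assumed scoring): order rules by accuracy, then by coverage."""
    ranked = []
    for tree in forest:
        for conditions, label in paths_to_rules(tree):
            covered = [y for x, y in data if matches(conditions, x)]
            if covered:
                accuracy = covered.count(label) / len(covered)
                ranked.append((accuracy, len(covered), conditions, label))
    ranked.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return ranked

def extract_rules(ranked, data):
    """Phase 2: greedily keep ranked rules that cover still-uncovered samples."""
    uncovered = set(range(len(data)))
    selected = []
    for _, _, conditions, label in ranked:
        newly = {i for i in uncovered if matches(conditions, data[i][0])}
        if newly:
            selected.append((conditions, label))
            uncovered -= newly
        if not uncovered:
            break  # full coverage reached
    return selected

def predict(rules, x, default=None):
    """Ordered rule list: the first matching rule decides the class."""
    for conditions, label in rules:
        if matches(conditions, x):
            return label
    return default

rules = extract_rules(rank_rules(FOREST, DATA), DATA)
for conditions, label in rules:
    print(" AND ".join(f"{f} {op} {t}" for f, op, t in conditions), "=>", label)
```

The selected rules form an ordered list that a reader can inspect directly, which is the kind of interpretability the abstract contrasts with the black-box forest. A real implementation would work on a trained forest and use whatever weighting and refinement steps the paper defines.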
Author Lu Thi, Kim Phung
Phung, Nguyen Hua
Vo Thi, Ngoc Chau
Author_xml – sequence: 1
  givenname: Kim Phung
  surname: Lu Thi
  fullname: Lu Thi, Kim Phung
  email: lutkphung@gmail.com
  organization: Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
– sequence: 2
  givenname: Ngoc Chau
  surname: Vo Thi
  fullname: Vo Thi, Ngoc Chau
  email: chauvtn@cse.hcmut.edu.vn
  organization: Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
– sequence: 3
  givenname: Nguyen Hua
  surname: Phung
  fullname: Phung, Nguyen Hua
  email: phung@cse.hcmut.edu.vn
  organization: Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
CODEN IEEPAD
CitedBy_id crossref_primary_10_1007_s10115_024_02069_8
crossref_primary_10_3390_electronics11132082
crossref_primary_10_1007_s12530_022_09434_4
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ACOMP.2015.13
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library Online
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library Online
  url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1467382345
9781467382342
EndPage 27
ExternalDocumentID 7422370
Genre orig-research
GroupedDBID 6IE
6IL
ALMA_UNASSIGNED_HOLDINGS
CBEJK
RIB
RIC
RIE
RIL
ID FETCH-LOGICAL-c167t-33c266874872ee2232863bbc5f9a392e0157dd8e7dc25a081cd916c74163b2233
IEDL.DBID RIE
IngestDate Thu Jan 18 11:13:32 EST 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c167t-33c266874872ee2232863bbc5f9a392e0157dd8e7dc25a081cd916c74163b2233
PageCount 8
ParticipantIDs ieee_primary_7422370
PublicationCentury 2000
PublicationDate 20151101
PublicationDateYYYYMMDD 2015-11-01
PublicationDate_xml – month: 11
  year: 2015
  text: 20151101
  day: 01
PublicationDecade 2010
PublicationTitle 2015 International Conference on Advanced Computing and Applications (ACOMP)
PublicationTitleAbbrev ACOMP
PublicationYear 2015
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib026765008
Score 1.6761947
SourceID ieee
SourceType Publisher
StartPage 20
SubjectTerms Cities and towns
Classification algorithms
Data mining
Data models
Decision trees
ensemble
interpretable classification model
multi-class classification
Prediction algorithms
random forest
rule extraction
Vegetation
Title Extracting Rule RF in Educational Data Classification: From a Random Forest to Interpretable Refined Rules
URI https://ieeexplore.ieee.org/document/7422370
hasFullText 1
inHoldings 1
isFullTextHit
isPrint