Supervised diagnostic classification of cognitive attributes using data augmentation

Bibliographic Details
Published in:	PLoS ONE Vol. 19; no. 1; p. e0296464
Main Authors:	Yoon, Ji-Young; Gweon, Gahgene; Yoo, Yun Joo
Format:	Journal Article
Language:	English
Published:	United States: Public Library of Science (PLoS), 05-01-2024
Subjects:	Analysis; Artificial intelligence; Classification; Computational linguistics; Data augmentation; Datasets; Educational evaluation; Forecasts and trends; Labels; Language processing; Learning algorithms; Machine learning; Medical diagnosis; Natural language interfaces; Neural networks; Students; Support vector machines; Technology application; South Korea

Description
Summary:	Over recent decades, machine learning, an integral subfield of artificial intelligence, has revolutionized diverse sectors, enabling data-driven decisions with minimal human intervention. In particular, the field of educational assessment emerges as a promising area for machine learning applications, where students can be classified and diagnosed using their performance data. The objectives of Diagnostic Classification Models (DCMs), which provide a suite of methods for diagnosing students' cognitive states in relation to the mastery of necessary cognitive attributes for solving problems in a test, can be effectively addressed through machine learning techniques. However, the challenge lies in the latent nature of cognitive status, which makes it difficult to obtain labels for the training dataset. Consequently, the application of machine learning methods to DCMs often assumes smaller training sets with labels derived either from theoretical considerations or human experts. In this study, the authors propose a supervised diagnostic classification model with data augmentation (SDCM-DA). This method is designed to utilize the augmented data using a data generation model constructed by leveraging the probability of correct responses for each attribute mastery pattern derived from the expert-labeled dataset. To explore the benefits of data augmentation, a simulation study is carried out, contrasting it with classification methods that rely solely on the expert-labeled dataset for training. The findings reveal that utilizing data augmentation with the estimated probabilities of correct responses substantially enhances classification accuracy. This holds true even when the augmentation originates from a small labeled sample with occasional labeling errors, and when the tests contain lower-quality items that may inaccurately measure students' true cognitive status. 
Moreover, the study demonstrates that leveraging augmented data for learning can enable the successful classification of students, thereby eliminating the necessity for specifying an underlying response model.
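
The augmentation step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Laplace smoothing, and the toy data are assumptions made for illustration. The sketch estimates the probability of a correct response to each item for each attribute mastery pattern from a small labeled sample, then samples synthetic response vectors from those probabilities.

```python
import random
from collections import defaultdict


def estimate_item_probs(responses, patterns, smoothing=1.0):
    """Estimate P(correct on item j | mastery pattern) from labeled data.

    responses: list of 0/1 lists, one per student.
    patterns:  list of tuples, the expert-assigned mastery pattern per student.
    smoothing: Laplace smoothing constant (an assumption of this sketch) that
               keeps estimates away from exactly 0 or 1 in small samples.
    """
    n_items = len(responses[0])
    correct = defaultdict(lambda: [0] * n_items)
    counts = defaultdict(int)
    for resp, pat in zip(responses, patterns):
        counts[pat] += 1
        for j, x in enumerate(resp):
            correct[pat][j] += x
    return {
        pat: [(correct[pat][j] + smoothing) / (counts[pat] + 2 * smoothing)
              for j in range(n_items)]
        for pat in counts
    }


def augment(item_probs, n_per_pattern, rng):
    """Sample synthetic (response, pattern) pairs from the estimated probabilities."""
    data = []
    for pat, probs in item_probs.items():
        for _ in range(n_per_pattern):
            # Each item is an independent Bernoulli draw given the pattern.
            resp = [1 if rng.random() < p else 0 for p in probs]
            data.append((resp, pat))
    return data


# Hypothetical toy data: 4 students, 3 items, 2 attributes.
labeled_responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
labeled_patterns = [(1, 0), (1, 0), (1, 1), (0, 0)]

probs = estimate_item_probs(labeled_responses, labeled_patterns)
synthetic = augment(probs, n_per_pattern=100, rng=random.Random(0))
```

The synthetic pairs would then serve as training data for any supervised classifier. Mastery patterns that never occur in the labeled sample receive no estimates here and are therefore not generated, which is a limitation of this sketch.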
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0296464