SPE²: Self-Paced Ensemble of Ensembles for Software Defect Prediction

Bibliographic Details
Published in:	IEEE transactions on reliability Vol. 71; no. 2; pp. 865 - 879
Main Authors:	Wan, Xiaohui; Zheng, Zheng; Liu, Yang
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 01-06-2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Complexity theory; Datasets; Decision trees; Ensemble of ensembles; Hardness; imbalance learning; instance hardness; Learning; Prediction algorithms; Software; software defect prediction; Task analysis; Training; Training data; undersampling

Description
Summary:	Software defect prediction aims to predict defect-prone code regions automatically before defects are discovered. Accurate prediction helps software practitioners prioritize their testing efforts. In recent decades, dozens of approaches have been proposed and have achieved good results in this field. However, in practical scenarios, many projects have few labeled instances, and most of those labeled instances are nondefective. The lack of training data and the class imbalance problem together pose serious challenges for software defect prediction tasks. So far, few prevailing approaches handle both difficulties well at the same time. One important reason is that they do not pay adequate attention to several key instances that are difficult to classify in a small, imbalanced dataset. This article introduces the concept of "instance hardness" to integrate the various difficulties of imbalanced classification tasks. Based on it, a novel imbalance learning framework named self-paced ensemble of ensembles (SPE²) is proposed to perform software defect prediction. SPE² aims to generate a strong ensemble of ensembles by harmonizing instance hardness in a self-paced manner via undersampling. Finally, SPE² is extensively compared with eight imbalance learning approaches on ten open-source defect datasets. Experiments indicate that SPE² achieves significantly better F-measure values than its existing counterparts, based on Brunner's statistical significance test and Cliff's effect sizes.
ISSN:	0018-9529 1558-1721
DOI:	10.1109/TR.2022.3155183
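
Note:	The summary describes harmonizing instance hardness through self-paced undersampling but gives no algorithmic detail. The following is only a minimal sketch of one common way such a step can be written, not the authors' SPE² procedure. Everything in it is an assumption made for illustration: hardness is taken as a value in [0, 1] (for example, the absolute error of a predicted defect probability), the nondefective (majority) instances are grouped into hardness bins, and each bin receives a sampling weight of 1 / (mean bin hardness + alpha). The function name `self_paced_undersample` and the parameters `alpha` and `n_bins` are hypothetical.

```python
import random


def self_paced_undersample(majority_hardness, n_keep, alpha, n_bins=5, rng=None):
    """Pick roughly n_keep majority-class indices, balancing hardness across bins.

    Assumptions (not taken from the article): hardness values lie in [0, 1],
    alpha > 0, and bin weights follow 1 / (mean bin hardness + alpha).
    A small alpha favours easy bins; a large alpha spreads picks evenly
    over the bins, which shifts attention toward the rarer hard instances.
    """
    if not majority_hardness or alpha <= 0:
        return []
    rng = rng or random.Random(0)

    # Group instance indices into equal-width hardness bins.
    bins = [[] for _ in range(n_bins)]
    for idx, h in enumerate(majority_hardness):
        bins[min(int(h * n_bins), n_bins - 1)].append(idx)

    # Weight each non-empty bin by the inverse of its mean hardness plus alpha.
    weights = []
    for members in bins:
        if members:
            mean_h = sum(majority_hardness[i] for i in members) / len(members)
            weights.append(1.0 / (mean_h + alpha))
        else:
            weights.append(0.0)
    total = sum(weights)

    # Draw each bin's share without replacement, capped by the bin size.
    chosen = []
    for members, w in zip(bins, weights):
        quota = min(len(members), round(n_keep * w / total))
        chosen.extend(rng.sample(members, quota))
    return sorted(chosen)


# Hypothetical usage: 50 easy, 10 medium and 5 hard nondefective instances.
hardness = [0.05] * 50 + [0.5] * 10 + [0.95] * 5
early = self_paced_undersample(hardness, n_keep=12, alpha=0.1)
late = self_paced_undersample(hardness, n_keep=12, alpha=100.0)
```

In a self-paced scheme of this kind, `alpha` would typically be increased from one ensemble member to the next, so later members see a larger share of hard instances. Because each bin's quota is rounded and capped, the number of selected instances can differ slightly from `n_keep`.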