An Improved SDA Based Defect Prediction Framework for Both Within-Project and Cross-Project Class-Imbalance Problems

Background. Solving the class-imbalance problem of within-project software defect prediction (SDP) is an important research topic. Although some class-imbalance learning methods have been presented, there exists room for improvement. For cross-project SDP, we found that the class-imbalanced source u...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on software engineering Vol. 43; no. 4; pp. 321 - 339
Main Authors: Jing, Xiao-Yuan, Wu, Fei, Dong, Xiwei, Xu, Baowen
Format: Journal Article
Language:English
Published: New York IEEE 01-04-2017
IEEE Computer Society
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Background. Solving the class-imbalance problem of within-project software defect prediction (SDP) is an important research topic. Although some class-imbalance learning methods have been presented, there exists room for improvement. For cross-project SDP, we found that the class-imbalanced source usually leads to misclassification of defective instances. However, only one work has paid attention to this cross-project class-imbalance problem. Objective. We aim to provide effective solutions for both within-project and cross-project class-imbalance problems. Method. Subclass discriminant analysis (SDA), an effective feature learning method, is introduced to solve the problems. It can learn features with more powerful classification ability from original metrics. For within-project prediction, we improve SDA for achieving balanced subclasses and propose the improved SDA (ISDA) approach. For cross-project prediction, we employ the semi-supervised transfer component analysis (SSTCA) method to make the distributions of source and target data consistent, and propose the SSTCA+ISDA prediction approach. Results. Extensive experiments on four widely used datasets indicate that: 1) ISDA-based solution performs better than other state-of-the-art methods for within-project class-imbalance problem; 2) SSTCA+ISDA proposed for cross-project class-imbalance problem significantly outperforms related methods. Conclusion. Within-project and cross-project class-imbalance problems greatly affect prediction performance, and we provide a unified and effective prediction framework for both problems.
ISSN:0098-5589
1939-3520
DOI:10.1109/TSE.2016.2597849