Grey Relational Analysis Based k Nearest Neighbor Missing Data Imputation for Software Quality Datasets
Published in: | 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS), pp. 86-91 |
---|---|
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-08-2016 |
Summary: | Software quality estimation is important yet difficult in software engineering studies. Historical quality datasets are used to build classification models for estimating fault-proneness. However, missing values in these datasets severely degrade estimation ability and therefore cause inconclusive decision-making. Among single imputation approaches, k nearest neighbor (kNN) imputation is popular in empirical studies because of its relatively high accuracy. However, researchers are still seeking the optimal parameter setting for kNN imputation. In this study, a novel incomplete-instance kNN imputation approach based on grey relational analysis is built for software quality data. The proposed imputation is evaluated on four quality datasets under different simulated missingness scenarios. The empirical results show that the proposed approach is superior to traditional kNN imputation and mean imputation in most cases. Moreover, classification accuracy can be maintained or even improved when this approach is used in classification tasks. |
---|---|
DOI: | 10.1109/QRS.2016.20 |
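
The summary names the technique but this record does not give its formulas, so the following is only a rough sketch of how a grey relational analysis based kNN imputation could look. It is not the authors' implementation. It assumes the textbook grey relational coefficient with a distinguishing coefficient `rho = 0.5`, attributes already scaled to a common range, missing values encoded as `None`, and neighbors chosen from all other instances (complete or not) using only the attributes observed in both. The function names `grey_relational_grades` and `gra_knn_impute` are hypothetical.

```python
def grey_relational_grades(target, candidates, rho=0.5):
    """Return one grey relational grade per candidate (higher = more similar).

    Assumption: only attributes observed (not None) in both the target and
    the candidate contribute to the grade.
    """
    # Absolute differences on the jointly observed attributes.
    diffs = [
        [abs(t - c) for t, c in zip(target, cand)
         if t is not None and c is not None]
        for cand in candidates
    ]
    flat = [d for row in diffs for d in row]
    if not flat:
        return [0.0] * len(candidates)
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every compared value is identical: full similarity where comparable.
        return [1.0 if row else 0.0 for row in diffs]
    grades = []
    for row in diffs:
        if not row:
            grades.append(0.0)  # nothing in common to compare
            continue
        # Grey relational coefficient per attribute, averaged into a grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def gra_knn_impute(data, k=3, rho=0.5):
    """Fill each None with the mean of that attribute over the k instances
    with the highest grey relational grade that have the attribute observed."""
    imputed = [row[:] for row in data]
    for i, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        others = [r for idx, r in enumerate(data) if idx != i]
        grades = grey_relational_grades(row, others, rho)
        # Candidate indices ordered from most to least similar.
        ranked = sorted(range(len(others)), key=lambda idx: -grades[idx])
        for j in missing:
            donors = [others[idx][j] for idx in ranked
                      if others[idx][j] is not None][:k]
            if donors:  # leave the gap if no instance observes attribute j
                imputed[i][j] = sum(donors) / len(donors)
    return imputed
```

Called as `gra_knn_impute(rows, k=2)`, the function returns a copy of `rows` with the gaps filled and leaves the input untouched. Averaging the donors' values is one simple choice; a grade-weighted mean would be an equally plausible variant.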