Grey Relational Analysis Based k Nearest Neighbor Missing Data Imputation for Software Quality Datasets

Bibliographic Details
Published in: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS), pp. 86-91
Main Authors: Jianglin Huang, Hongyi Sun
Format: Conference Proceeding
Language: English
Published: IEEE 01-08-2016
Description
Summary: Software quality estimation is important yet difficult in software engineering studies. Historical quality datasets are used to build classification models for estimating fault-proneness. However, missing values in these datasets severely degrade estimation ability and can therefore lead to inconclusive decision-making. Among single imputation approaches, k nearest neighbor (kNN) imputation is popular in empirical studies because of its relatively high accuracy. However, the optimal parameter setting for kNN imputation remains an open question for researchers. In this study, a novel grey relational analysis-based, incomplete-instance kNN imputation method is developed for software quality data. An evaluation is conducted on four quality datasets under different simulated missingness scenarios to analyze the performance of the proposed imputation method. The empirical results show that the proposed approach is superior to traditional kNN imputation and mean imputation in most cases. Moreover, classification accuracy can be maintained or even improved when this approach is used in classification tasks.
DOI: 10.1109/QRS.2016.20
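
Illustrative sketch: The summary names grey relational analysis (GRA) as the similarity measure behind the kNN imputation but does not give the procedure. The following is a minimal sketch of one way such an imputation could look, not the authors' implementation. It assumes min-max scaled numeric attributes, the commonly used grey relational coefficient with distinguishing coefficient rho = 0.5, and a reading of "incomplete-instance" in which neighbors may themselves contain missing values as long as they have the target attribute observed. The function name `gra_knn_impute` and its parameters are hypothetical.

```python
import numpy as np

def gra_knn_impute(X, k=3, rho=0.5):
    """Fill NaNs using the k rows with the highest grey relational grade.

    Hypothetical sketch: assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    # Min-max scale each column (ignoring NaNs) so differences are comparable.
    lo = np.nanmin(X, axis=0)
    span = np.nanmax(X, axis=0) - lo
    span[span == 0] = 1.0
    Z = (X - lo) / span
    out = X.copy()
    n = X.shape[0]
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        # Absolute differences to every other row; NaN where either value is missing.
        delta = np.abs(Z - Z[i])
        delta[i] = np.nan  # a row is never its own neighbor
        valid = ~np.isnan(delta)
        grade = np.full(n, -np.inf)
        if valid.any():
            d_min, d_max = np.nanmin(delta), np.nanmax(delta)
            if d_max == 0:
                # All comparable values are identical: every coefficient is 1.
                coeff = np.where(valid, 1.0, 0.0)
            else:
                # Grey relational coefficient per attribute.
                coeff = (d_min + rho * d_max) / (delta + rho * d_max)
                coeff = np.where(valid, coeff, 0.0)
            counts = valid.sum(axis=1)
            has = counts > 0
            # Grey relational grade: mean coefficient over shared attributes.
            grade[has] = coeff[has].sum(axis=1) / counts[has]
        for j in np.where(np.isnan(X[i]))[0]:
            cand = np.where(~np.isnan(X[:, j]) & np.isfinite(grade))[0]
            if cand.size == 0:
                out[i, j] = np.nanmean(X[:, j])  # fall back to mean imputation
                continue
            top = cand[np.argsort(grade[cand])[::-1][:k]]
            out[i, j] = X[top, j].mean()
    return out

# Toy data (made up for illustration): the second row is missing its last value.
X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, np.nan],
              [5.0, 6.0, 7.0],
              [1.0, 2.0, 3.2]])
filled = gra_knn_impute(X, k=2)
print(filled)
```

In this sketch the missing entry is filled with the unweighted mean of the selected neighbors' values. Weighting neighbors by their grey relational grade would be a natural variation, and the choice of k is exactly the kind of parameter setting the summary describes as unresolved.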