An Empirical Study on Spectral Clustering-based Software Defect Detection

Bibliographic Details
Published in:	2021 8th International Conference on Dependable Systems and Their Applications (DSA) pp. 20 - 29
Main Authors:	Qing, Mingshuang, Ge, Xiuting, Hui, ZhanWei, Pan, Ya, Fan, Yong, Wang, Xiaojuan, Cao, Xu
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-08-2021
Subjects:	Clustering algorithms; Machine learning; Machine learning algorithms; Manuals; Measurement; similarity algorithms; Software algorithms; software defect detection; spectral clustering; Training

Description
Summary:	Software defect detection is essential in software development. Most existing approaches apply Supervised Machine Learning (SML) techniques to software defect detection. However, SML techniques require a large amount of manual labelling for model training, which is time-consuming and laborious. An alternative is to apply UnSupervised Machine Learning (USML) to software defect detection. USML techniques, which do not require labelled datasets, have already been applied to this task. Spectral clustering, one USML approach, has shown promising performance in software defect detection. The core of spectral clustering is its similarity algorithm, which computes the similarity between the metric values of software entities in order to detect software defects. Yet current studies on spectral clustering-based software defect detection models rarely consider how the choice of similarity algorithm affects defect detection results. To address this problem, we conduct an empirical study of the impact of similarity algorithms on spectral clustering-based software defect detection models. We compare three similarity algorithms: k-nearest neighbours, fully connected, and vector dot product. We conduct experiments on two real-world datasets, AEEEM and PROMISE, and the results show that the fully connected algorithm outperforms the other two algorithms in spectral clustering-based software defect detection.
ISSN:	2767-6684
DOI:	10.1109/DSA52907.2021.00012
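To make the role of the similarity algorithm concrete, the sketch below builds the three affinity matrices named in the summary (k-nearest neighbours, fully connected, vector dot product) and feeds each into the same spectral clustering pipeline. This is an illustrative sketch under stated assumptions, not the authors' implementation: the record does not give their exact formulation, so the Gaussian kernel width `sigma`, the neighbour count `k`, the symmetric normalised Laplacian (Ng-Jordan-Weiss variant), clamping negative dot products to zero, and the rule that the cluster with the larger mean metric values is labelled defective are all assumptions. All function names (`fully_connected`, `knn`, `dot_product`, `spectral_defects`) are hypothetical.

```python
import numpy as np


def zscore(X):
    """Standardise each metric column to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant metrics
    return (X - X.mean(axis=0)) / std


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows (software entities) of X."""
    sq = (X ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)


def fully_connected(X, sigma=1.0):
    """Connect every pair of entities with a Gaussian (RBF) weight."""
    W = np.exp(-pairwise_sq_dists(X) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def knn(X, k=5, sigma=1.0):
    """Keep Gaussian weights only between k-nearest neighbours, then symmetrise."""
    D = pairwise_sq_dists(X)
    np.fill_diagonal(D, np.inf)  # an entity is not its own neighbour
    W = np.zeros_like(D)
    cols = np.argsort(D, axis=1)[:, :k].ravel()  # k closest entities per row
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, cols] = np.exp(-D[rows, cols] / (2 * sigma ** 2))
    return np.maximum(W, W.T)  # i ~ j if either is a neighbour of the other


def dot_product(X):
    """Inner products of metric vectors; negative values clamped to 0 (assumption)."""
    W = np.maximum(X @ X.T, 0.0)
    np.fill_diagonal(W, 0.0)
    return W


def kmeans(Z, n_clusters, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_defects(X, similarity, n_clusters=2):
    """Spectral clustering with a pluggable similarity; 1 marks predicted-defective entities."""
    Xs = zscore(X)
    W = similarity(Xs)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)  # epsilon guards isolated nodes
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    U = vecs[:, :n_clusters]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)  # row-normalise
    labels = kmeans(U, n_clusters)
    # Assumed heuristic: the cluster with larger mean (z-scored) metrics is defective.
    scores = [Xs[labels == c].mean() if np.any(labels == c) else -np.inf
              for c in range(n_clusters)]
    return (labels == int(np.argmax(scores))).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic metrics: 10 "clean" entities with low values, 10 "defective" with high values.
    X = np.vstack([rng.normal(0.0, 0.3, (10, 4)), rng.normal(3.0, 0.3, (10, 4))])
    for name, sim in [("fully connected", fully_connected),
                      ("k-nearest neighbours", lambda Z: knn(Z, k=9)),
                      ("vector dot product", dot_product)]:
        print(name, spectral_defects(X, sim))
```

Only the `similarity` argument changes between runs; the Laplacian, eigen-decomposition and k-means steps are shared, which mirrors the study's design of isolating the similarity algorithm as the variable under test. On real AEEEM or PROMISE metrics the three choices can disagree, and the reported finding that the fully connected algorithm performs best cannot be reproduced from this toy data.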