Robust Parameter Optimisation of Noise-Tolerant Clustering for DENCLUE Using Differential Evolution

Clustering samples based on similarity remains a significant challenge, especially when the goal is to accurately capture the underlying data clusters of complex arbitrary shapes. Existing density-based clustering techniques are known to be best suited for capturing arbitrarily shaped clusters. Howe...

Full description

Saved in:
Bibliographic Details
Published in:Mathematics (Basel) Vol. 12; no. 21; p. 3367
Main Authors: Ajmal, Omer, Arshad, Humaira, Arshed, Muhammad Asad, Ahmed, Saeed, Mumtaz, Shahzad
Format: Journal Article
Language:English
Published: Basel MDPI AG 01-11-2024
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Clustering samples based on similarity remains a significant challenge, especially when the goal is to accurately capture the underlying data clusters of complex arbitrary shapes. Existing density-based clustering techniques are known to be best suited for capturing arbitrarily shaped clusters. However, a key limitation of these methods is the difficulty in automatically finding the optimal set of parameters adapted to dataset characteristics, which becomes even more challenging when the data contain inherent noise. In our recent work, we proposed a Differential Evolution-based DENsity CLUstEring (DE-DENCLUE) to optimise DENCLUE parameters. This study evaluates DE-DENCLUE for its robustness in finding accurate clusters in the presence of noise in the data. DE-DENCLUE performance is compared against three other density-based clustering algorithms—DPC based on weighted local density sequence and nearest neighbour assignment (DPCSA), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Variable Kernel Density Estimation–based DENCLUE (VDENCLUE)—across several datasets (i.e., synthetic and real). The study has consistently shown superior results for DE-DENCLUE compared to other models for most datasets with different noise levels. Clustering quality metrics such as the Silhouette Index (SI), Davies–Bouldin Index (DBI), Adjusted Rand Index (ARI), and Adjusted Mutual Information (AMI) consistently show superior SI, ARI, and AMI values across most datasets at different noise levels. However, in some cases regarding DBI, the DPCSA performed better. In conclusion, the proposed method offers a reliable and noise-resilient clustering solution for complex datasets.
ISSN:2227-7390
2227-7390
DOI:10.3390/math12213367