Robust Parameter Optimisation of Noise-Tolerant Clustering for DENCLUE Using Differential Evolution
Published in: | Mathematics (Basel) Vol. 12; no. 21; p. 3367 |
---|---|
Main Authors: | , , , , |
Format: | Journal Article |
Language: | English |
Published: | Basel, MDPI AG, 01-11-2024 |
Subjects: | |
Summary: | Clustering samples by similarity remains a significant challenge, especially when the goal is to accurately capture underlying clusters of complex, arbitrary shape. Density-based clustering techniques are known to be best suited to capturing arbitrarily shaped clusters. However, a key limitation of these methods is the difficulty of automatically finding the optimal set of parameters for a given dataset's characteristics, which becomes even more challenging when the data contain inherent noise. In our recent work, we proposed Differential Evolution-based DENsity CLUstEring (DE-DENCLUE) to optimise DENCLUE parameters. This study evaluates the robustness of DE-DENCLUE in finding accurate clusters in the presence of noise. DE-DENCLUE is compared against three other density-based clustering algorithms across several synthetic and real datasets: DPC based on weighted local density sequence and nearest neighbour assignment (DPCSA), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Variable Kernel Density Estimation–based DENCLUE (VDENCLUE). Clustering quality is assessed with metrics such as the Silhouette Index (SI), Davies–Bouldin Index (DBI), Adjusted Rand Index (ARI), and Adjusted Mutual Information (AMI). DE-DENCLUE achieves superior SI, ARI, and AMI values on most datasets at different noise levels, although DPCSA performs better on DBI in some cases. In conclusion, the proposed method offers a reliable and noise-resilient clustering solution for complex datasets. |
---|---|
ISSN: | 2227-7390 |
DOI: | 10.3390/math12213367 |