Improving K-Means Effectiveness and Efficiency with Initialization Estimates of Cluster Centroids
Published in: | 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC) pp. 1086 - 1091 |
---|---|
Main Authors: | , , , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE 07-10-2021 |
Subjects: | |
Summary: | K-Means is known both for its usefulness in finding clusters of related data and for its fragility with respect to initialization choices. This paper introduces an initialization method that is 95% more effective and 50% more efficient, and that could eliminate the need for multiple executions of K-Means to find a high-quality clustering. To initialize the centroids, it selects a multiple, m, of K real data points (mK candidates in total), computes (mK)^2 distances, and keeps only the K points selected by the maximum(minimum(distance)) criterion. A consequence of this technique is that an O(ln K) binary search can find the optimal K on 'linearly' separable clusters. The effectiveness claim applies to both separable and intertwined clusters, although the efficiency gain is lost on intertwined clusters. |
---|---|
DOI: | 10.1109/ICOSEC51865.2021.9591948 |
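
The initialization outlined in the summary can be sketched roughly as follows. This is only one plausible reading of the abstract, not the paper's published procedure: it samples mK candidate points, builds the (mK)^2 distance table, and then applies a greedy farthest-first (max-min) selection to keep K of them. The function name `maximin_init`, the default `m=3`, the random sampling of candidates, and the choice of the first candidate as the starting centroid are all assumptions made for illustration.

```python
import math
import random


def maximin_init(points, k, m=3, seed=0):
    """Pick k initial centroids from about m*k sampled candidates.

    Hypothetical sketch: greedy farthest-first (max-min) selection,
    which is one interpretation of "maximum(minimum(distance))".
    """
    rng = random.Random(seed)
    n_candidates = min(len(points), m * k)
    if k > n_candidates:
        raise ValueError("k cannot exceed the number of data points")
    # Candidates are real data points, as the summary requires.
    candidates = rng.sample(points, n_candidates)

    # All pairwise distances among the candidates: (m*k)^2 entries.
    dist = [[math.dist(p, q) for q in candidates] for p in candidates]

    # Assumption: start from the first sampled candidate.
    chosen = [0]
    # min_dist[i] = distance from candidate i to its nearest chosen point.
    min_dist = list(dist[0])
    while len(chosen) < k:
        # Keep the candidate whose nearest chosen point is farthest away.
        nxt = max(range(n_candidates), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]
    return [candidates[i] for i in chosen]
```

The returned points could then be passed as starting centroids to any standard K-Means routine. Note that `points` is assumed to be a list of equal-length numeric tuples, and that duplicate data points are not handled specially in this sketch.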