Improving K-Means Effectiveness and Efficiency with Initialization Estimates of Cluster Centroids

Bibliographic Details
Published in: 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), pp. 1086-1091
Main Authors: Ojha, Rajesh Kumar, Srivastava, Sandeep, Goyal, Mohit, Kumar, Lalan, Kumar, Amit, Prasad, Chitturi
Format: Conference Proceeding
Language: English
Published: IEEE 07-10-2021
Description
Summary: K-Means is known both for its usefulness in finding clusters of related data and for its fragility with respect to initialization choices. This paper introduces an initialization method that is 95% more effective and 50% more efficient, and that could eliminate the need for multiple executions of K-Means to find a high-quality clustering. To initialize the centroids, it selects a multiple, m, of K real data points, computes (mK)^2 distances, and keeps only the K points that satisfy maximum(minimum(distance)). As a consequence, this technique enables an O(ln K) binary search to find the optimal K on 'linearly' separable clusters. The effectiveness claim applies to both separable and intertwined clusters, although the efficiency gain is lost on intertwined clusters.
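The summary does not spell out the selection procedure, so the following is only a minimal sketch of one possible reading: sample m*K real data points, compute all pairwise distances within the sample, then greedily keep the candidate whose minimum distance to the already chosen points is largest (a farthest-point, or max-min, rule). The function name `maxmin_init`, the `seed` parameter, the default `m=3`, and the choice of the first sampled point as the starting centroid are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def maxmin_init(points, k, m=3, seed=0):
    """Pick k initial centroids from a random sample of m*k real data points.

    Hypothetical sketch: a greedy max-min (farthest-point) selection,
    which is one interpretation of the summary, not the authors' code.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(len(points), m * k))
    n = len(sample)

    # All pairwise distances within the sample: (mK)^2 entries.
    dist = [[math.dist(p, q) for q in sample] for p in sample]

    # Start from the first sampled point (an arbitrary choice in this sketch).
    chosen = [0]
    # min_dist[i] = distance from candidate i to its nearest chosen point.
    min_dist = dist[0][:]

    while len(chosen) < min(k, n):
        # Keep the candidate that maximizes the minimum distance.
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]

    return [sample[i] for i in chosen]
```

On well-separated data, a rule like this tends to return one starting centroid per cluster, because any point close to an already chosen centroid has a small minimum distance and is passed over. The returned points can then be handed to any standard K-Means implementation as its initial centroids.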
DOI: 10.1109/ICOSEC51865.2021.9591948