Enhancement of K-means clustering in big data based on equilibrium optimizer algorithm

Bibliographic Details
Published in:	Journal of intelligent systems Vol. 32; no. 1; pp. 99 - 106
Main Authors:	Al-kababchee, Sarah Ghanim Mahmood, Algamal, Zakariya Yahya, Qasim, Omar Saber
Format:	Journal Article
Language:	English
Published:	Berlin: De Gruyter (Walter de Gruyter GmbH), 16-02-2023
Subjects:	Algorithms; Big Data; Cluster analysis; Clustering; Data mining; equilibrium optimizer algorithm; feature selection; k-means; Machine learning; Optimization; penalized method; swarms; Unsupervised learning; Vector quantization

Description
Summary:	Data mining’s primary clustering method has several uses, including gene analysis. A set of unlabeled data is divided into clusters using data features in a clustering study, which is an unsupervised learning problem. Data in a cluster are more comparable to one another than to those in other groups. However, the number of clusters has a direct impact on how well the K-means algorithm performs. In order to find the best solutions for these real-world optimization issues, it is necessary to use techniques that properly explore the search spaces. In this research, an enhancement of K-means clustering is proposed by applying an equilibrium optimization approach. The suggested approach adjusts the number of clusters while simultaneously choosing the best attributes to find the optimal answer. The findings establish the usefulness of the suggested method in comparison to existing algorithms in terms of intra-cluster distances and Rand index based on five datasets. Through the results shown and a comparison of the proposed method with the rest of the traditional methods, it was found that the proposal is better in terms of the internal dimension of the elements within the same cluster, as well as the Rand index. In conclusion, the suggested technique can be successfully employed for data clustering and can offer significant support.
ISSN:	2191-026X, 0334-1860
DOI:	10.1515/jisys-2022-0230
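
Note:	The summary names two evaluation criteria, intra-cluster distance and the Rand index, without defining them. The sketch below shows one common textbook form of each, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the intra-cluster distance is assumed here to be the sum of Euclidean distances from each point to its cluster centroid, which may differ from the definition used in the article.

```python
from itertools import combinations
from math import dist


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def intra_cluster_distance(points, labels):
    """Sum of Euclidean distances from each point to its cluster centroid.

    Assumed definition for illustration; lower values mean tighter clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        # Centroid is the coordinate-wise mean of the cluster's members.
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(dist(point, centroid) for point in members)
    return total


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    reference = [0, 0, 1, 1]
    predicted = [1, 1, 0, 0]  # same partition, different label names
    print(rand_index(reference, predicted))
    print(intra_cluster_distance(points, predicted))
```

The Rand index depends only on the partition, not on the label names, so relabeled but identical clusterings score 1.0. A higher Rand index and a lower intra-cluster distance both indicate a better clustering in the sense used by the summary.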