Enhancement of K-means clustering in big data based on equilibrium optimizer algorithm

Bibliographic Details
Published in:	Journal of Intelligent Systems, Vol. 32, no. 1, pp. 99-106
Main Authors:	Al-kababchee, Sarah Ghanim Mahmood, Algamal, Zakariya Yahya, Qasim, Omar Saber
Format:	Journal Article
Language:	English
Published:	Berlin: De Gruyter (Walter de Gruyter GmbH), 2023-02-16
Subjects:	Algorithms; Big Data; Cluster analysis; Clustering; Data mining; equilibrium optimizer algorithm; feature selection; k-means; Machine learning; means; Optimization; penalized method; swarms; Unsupervised learning; Vector quantization

Abstract	Clustering, a primary data mining method, has several uses, including gene analysis. In a clustering study, which is an unsupervised learning problem, a set of unlabeled data is divided into clusters using the data features, so that data in the same cluster are more similar to one another than to data in other clusters. However, the number of clusters directly affects how well the K-means algorithm performs. Finding the best solutions to such real-world optimization problems requires techniques that explore the search space properly. In this research, an enhancement of K-means clustering is proposed by applying the equilibrium optimizer algorithm. The proposed approach adjusts the number of clusters while simultaneously selecting the best attributes (features). On five datasets, the proposed method was compared with existing traditional algorithms and performed better in terms of both intra-cluster distance (the distance between elements within the same cluster) and the Rand index. In conclusion, the proposed technique can be employed successfully for data clustering and offers significant support for such tasks.
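The abstract names the ingredients of the method (K-means, a candidate solution that fixes both the number of clusters and a feature subset, and evaluation by intra-cluster distance and Rand index) but gives no formulas. As a rough illustration only, the Python sketch below shows how one candidate could be decoded and scored. It is not the authors' implementation: the encoding in the hypothetical `decode` function (first coordinate mapped to k, remaining coordinates thresholded at 0.5 as feature switches), the sum-of-distances definition of intra-cluster distance, and the toy data are all assumptions, and the equilibrium optimizer update that would move candidates through the search space is omitted.

```python
import math
import random
from itertools import combinations


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


def intra_cluster_distance(points, labels, centroids):
    """Sum of distances from each point to its own centroid (assumed definition)."""
    return sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels))


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def decode(position, k_min=2, k_max=5):
    """Hypothetical encoding of one candidate: position[0] in [0, 1] picks k,
    position[1:] are feature switches (a feature is kept when its value > 0.5)."""
    k = k_min + round(position[0] * (k_max - k_min))
    switches = position[1:]
    mask = [x > 0.5 for x in switches]
    if not any(mask):  # never return an empty feature subset
        mask[max(range(len(switches)), key=switches.__getitem__)] = True
    return k, mask


def evaluate(position, data, true_labels, k_min=2, k_max=5):
    """Decode a candidate, cluster on the selected features, and score the result."""
    k, mask = decode(position, k_min, k_max)
    projected = [tuple(x for x, keep in zip(row, mask) if keep) for row in data]
    labels, centroids = kmeans(projected, k)
    return (k, mask,
            intra_cluster_distance(projected, labels, centroids),
            rand_index(true_labels, labels))


# Toy data (hypothetical): feature 0 separates two groups, feature 1 is noise.
TOY_DATA = [(0.0, 5.0), (0.2, 1.0), (0.1, 9.0), (10.0, 4.0), (10.2, 8.0), (9.9, 2.0)]
TOY_TRUTH = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(evaluate([0.0, 0.9, 0.1], TOY_DATA, TOY_TRUTH))  # keeps feature 0 only
    print(evaluate([0.0, 0.1, 0.9], TOY_DATA, TOY_TRUTH))  # keeps feature 1 only
```

On the toy data, the first candidate keeps only the informative feature and recovers the true grouping (Rand index 1.0), while the second keeps only the noise feature and scores lower. Note that a smaller feature subset or a larger k can shrink intra-cluster distance by itself, so a search objective built on it would need some correction; the record's subject terms mention a penalized method, but the paper's exact fitness function is not given here.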
Affiliations	1. Al-kababchee, Sarah Ghanim Mahmood (sarahghanim@uohamdaniya.edu.iq): Department of Mathematics, Education College, University of AL-Hamdaniya, 41019 Bartella, Iraq
	2. Algamal, Zakariya Yahya (ORCID 0000-0002-0229-7958; zakariya.algamal@uomosul.edu.iq): College of Engineering, University of Warith Al-Anbiyaa, 56001 Karbala, Iraq
	3. Qasim, Omar Saber (omar.saber@uomosul.edu.iq): Department of Mathematics, University of Mosul, 41002 Mosul, Iraq
CitedBy	DOI 10.1080/03610918.2023.2249271 (Crossref)
Copyright	2023. This work is published under http://creativecommons.org/licenses/by/4.0 (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI	10.1515/jisys-2022-0230
Discipline	Computer Science
EISSN	2191-026X
EndPage	106
ISSN	0334-1860 (print); 2191-026X (online)
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
License	This work is licensed under the Creative Commons Attribution 4.0 International License.
ORCID	0000-0002-0229-7958
OpenAccessLink	https://doaj.org/article/73c542ad0f8c439381067fd88e7c1fc3
PageCount	12
PublicationDate	2023-02-16
PublicationPlace	Berlin
PublicationTitle	Journal of Intelligent Systems
PublicationYear	2023
Publisher	De Gruyter (Walter de Gruyter GmbH)
StartPage	99
URI	http://www.degruyter.com/doi/10.1515/jisys-2022-0230; https://www.proquest.com/docview/2776989869; https://doaj.org/article/73c542ad0f8c439381067fd88e7c1fc3
Volume	32