Fuzzy model-based sparse clustering with multivariate t-mixtures

Bibliographic Details
Published in:	Applied Artificial Intelligence, Vol. 37, no. 1
Main Authors:	Ali, Wajid; Yang, Miin-Shen; Ali, Mehboob; Ud-Din, Saif
Format:	Journal Article
Language:	English
Published:	Philadelphia: Taylor & Francis (Taylor & Francis Ltd; Taylor & Francis Group), 31-12-2023
Subjects:	Algorithms; Clustering; Data points; Datasets; Environmental laboratories; Mixtures; Statistical analysis

Description
Summary:	Model-based clustering is a well-suited choice for modeling the distribution of a data set and uncovering its underlying structure with a mixture of probability distributions. Many extensions of model-based clustering algorithms have been proposed in the literature to improve results, but doing so remains a challenging and important research objective. Many of the proposed methods build on the EM algorithm and aim to overcome its sensitivity to initialization. However, these methods treat all feature (variable) components of the data points as equally important, and so cannot identify irrelevant feature components. In most cases, a data set contains some irrelevant features and outliers or noisy points, which degrade the performance of clustering algorithms. To overcome these issues, we propose a fuzzy model-based t-clustering algorithm that uses a mixture of t-distributions with $L_1$ regularization to identify and select relevant features. To demonstrate its novelty and usefulness, we apply the algorithm to artificial and real data sets. We further apply the proposed method to a soil data set collected, in collaboration with and with the assistance of the Environmental Laboratory of Karakoram International University (GB), from various locations in Gilgit-Baltistan, Pakistan. The comparison results validate the novelty and superiority of the proposed method on both the simulated and real data sets, as well as its effectiveness in addressing the weaknesses of existing methods.
ISSN:	0883-9514; 1087-6545
DOI:	10.1080/08839514.2023.2169299
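
Illustration:	The summary names two ingredients, a mixture of t-distributions (for robustness to outliers) and $L_1$ regularization (for feature selection). The sketch below is not the authors' algorithm; it only shows two standard building blocks that such methods commonly rely on, under stated assumptions: the per-point weight from the E-step of a multivariate t model, and the soft-thresholding operator that solves an $L_1$-penalized least-squares update. The function names `t_weights` and `soft_threshold`, the degrees of freedom `nu`, and the penalty `lam` are hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: shrinks values toward zero by lam and
    sets anything with magnitude <= lam exactly to zero (L1 shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def t_weights(X, mean, cov, nu):
    """Per-point weights (nu + d) / (nu + squared Mahalanobis distance),
    as used in the E-step of a multivariate t model. Points far from the
    mean receive small weights, which limits their influence."""
    d = X.shape[1]
    diff = X - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return (nu + d) / (nu + maha)

# Assumed toy data: 50 standard-normal points in 3 dimensions, one replaced
# by a gross outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [25.0, 25.0, 25.0]

w = t_weights(X, mean=np.zeros(3), cov=np.eye(3), nu=4.0)

# Assumed toy mean vector: the small first component is shrunk to exactly
# zero, which is how an L1 penalty can switch off an irrelevant feature.
shrunk = soft_threshold(np.array([0.05, -0.8, 1.5]), lam=0.1)
```

In this toy setting the outlier in row 0 receives a much smaller weight than every other point, and the first component of `shrunk` is exactly zero while the other two are only reduced in magnitude by `lam`.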