Fuzzy model-based sparse clustering with multivariate t-mixtures
Model-based clustering technique is an optimal choice for the distribution of data sets and to find the real structure using mixture of probability distributions. Many extensions of model-based clustering algorithms are available in the literature for getting most favorable results but still its cha...
Saved in:
Published in: | Applied artificial intelligence Vol. 37; no. 1 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: |
Philadelphia
Taylor & Francis
31-12-2023
Taylor & Francis Ltd Taylor & Francis Group |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Model-based clustering technique is an optimal choice for the distribution of data sets and to find the real structure using mixture of probability distributions. Many extensions of model-based clustering algorithms are available in the literature for getting most favorable results but still its challenging and important research objective for researchers. In the model-based clustering, many proposed methods are based on EM algorithm to overcome its sensitivity and initialization. However, these methods treat data points with feature (variable) components under equal importance, and so cannot distinguish the irrelevant feature components. In most of the cases, there exist some irrelevant features and outliers/noisy points in a data set, upsetting the performance of clustering algorithms. To overcome these issues, we propose a fuzzy model-based t-clustering algorithm using mixture of t-distribution with an
$${L_1}$$
L
1
regularization for the identification and selection of better features. In order to demonstrate its novelty and usefulness, we apply our algorithm on artificial and real data sets. We further used our proposed method on soil data set, which was collected in collaboration with and the assistance of Environmental laboratory Karakoram International University (GB) from various point/places of Gilgit Baltistan, Pakistan. The comparison results validate the novelty and superiority of our newly proposed method for both the simulated and real data sets as well as effectiveness in addressing the weaknesses of existing methods. |
---|---|
ISSN: | 0883-9514 1087-6545 |
DOI: | 10.1080/08839514.2023.2169299 |