Model-Based Clustering with Nested Gaussian Clusters
Published in: | Journal of classification Vol. 41; no. 1; pp. 39 - 64 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | New York, Springer US, 01-03-2024, Springer Nature B.V |
Subjects: | |
Summary: | A dataset may exhibit multiple class labels for each observation; sometimes, these class labels manifest in a hierarchical structure. A textbook analogy would be that a book can be labelled as statistics as well as the encompassing label of non-fiction. To capture this behaviour in a model-based clustering context, we describe a model formulation and estimation procedure for performing clustering with nested Gaussian clusters in orthogonal intrinsic variable subspaces. We elucidate a two-stage clustering model, whereby the observed manifest variables are assumed to be a rotation of intrinsic primary and secondary clustering subspaces with additional noise subspaces. In a hierarchical sense, secondary clusters are presumed to be subclusters of primary clusters and so share Gaussian cluster parameters in the primary cluster subspace. An estimation procedure using the expectation-maximization algorithm is provided, with model selection via Bayesian information criterion. Real-world datasets are evaluated under the proposed model. |
---|---|
ISSN: | 0176-4268 1432-1343 |
DOI: | 10.1007/s00357-023-09453-z |
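
The two-stage structure described in the summary can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' estimation procedure: the function name `simulate_nested_gaussian`, the subspace dimensions, the cluster means, and the identity covariances are all hypothetical choices made for illustration. It draws intrinsic variables in a primary subspace, a secondary subspace, and a noise subspace, then applies a random orthogonal rotation to obtain the manifest variables. Secondary clusters within the same primary cluster share their Gaussian parameters in the primary subspace.

```python
import numpy as np

def simulate_nested_gaussian(n=600, seed=0):
    """Simulate manifest data from a nested Gaussian cluster structure.

    All dimensions, means, and covariances here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical subspace dimensions: primary, secondary, noise.
    d1, d2, d0 = 2, 2, 2
    p = d1 + d2 + d0

    # Two primary clusters, separated only in the primary subspace.
    primary_means = np.array([[-4.0, 0.0], [4.0, 0.0]])

    # Each primary cluster has two secondary subclusters,
    # separated only in the secondary subspace.
    secondary_means = np.array(
        [[[-3, 0], [3, 0]],
         [[0, -3], [0, 3]]],
        dtype=float,
    )

    primary = rng.integers(0, 2, size=n)    # primary cluster label
    secondary = rng.integers(0, 2, size=n)  # subcluster label within primary

    # The primary-subspace coordinates depend on the primary label only,
    # so subclusters share parameters there.
    z1 = primary_means[primary] + rng.normal(size=(n, d1))
    z2 = secondary_means[primary, secondary] + rng.normal(size=(n, d2))
    z0 = rng.normal(size=(n, d0))           # noise subspace carries no cluster signal
    z = np.hstack([z1, z2, z0])

    # Random orthogonal matrix from a QR decomposition
    # (orthogonal, but not necessarily a proper rotation with determinant +1).
    q, _ = np.linalg.qr(rng.normal(size=(p, p)))
    x = z @ q.T                             # observed manifest variables
    return x, primary, secondary, q

x, primary, secondary, q = simulate_nested_gaussian()
z_recovered = x @ q                         # undo the rotation when q is known
```

In an actual fit, the rotation and the subspace dimensions are unknown and would have to be estimated, which the summary says is done with the expectation-maximization algorithm and model selection by Bayesian information criterion.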