Model-Based Clustering with Nested Gaussian Clusters

Bibliographic Details
Published in: Journal of Classification, Vol. 41, no. 1, pp. 39-64
Main Authors: Hou-Liu, Jason, Browne, Ryan P.
Format: Journal Article
Language: English
Published: New York, Springer US (Springer Nature B.V.), 01-03-2024
Description
Summary: A dataset may exhibit multiple class labels for each observation; sometimes, these class labels manifest in a hierarchical structure. A textbook analogy would be that a book can be labelled as statistics as well as the encompassing label of non-fiction. To capture this behaviour in a model-based clustering context, we describe a model formulation and estimation procedure for performing clustering with nested Gaussian clusters in orthogonal intrinsic variable subspaces. We elucidate a two-stage clustering model, whereby the observed manifest variables are assumed to be a rotation of intrinsic primary and secondary clustering subspaces with additional noise subspaces. In a hierarchical sense, secondary clusters are presumed to be subclusters of primary clusters and so share Gaussian cluster parameters in the primary cluster subspace. An estimation procedure using the expectation-maximization algorithm is provided, with model selection via Bayesian information criterion. Real-world datasets are evaluated under the proposed model.
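The data-generating structure described in the summary can be illustrated with a small simulation. The sketch below is not the authors' code or estimation procedure; it only draws data under one reading of the abstract. The function name, all parameter names, the identity covariances, and the randomly drawn means are assumptions made for illustration. Subclusters of the same primary cluster share a mean in the primary subspace and differ only in the secondary subspace, and the observed variables are an orthogonal rotation of the intrinsic ones.

```python
import numpy as np


def simulate_nested_clusters(n=600, n_primary=2, n_secondary=2,
                             d_primary=2, d_secondary=1, d_noise=1, seed=0):
    """Draw data from a toy nested-cluster model (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    p = d_primary + d_secondary + d_noise

    # Hierarchical labels: each observation has a primary cluster and,
    # within it, a secondary subcluster.
    primary = rng.integers(n_primary, size=n)
    secondary = rng.integers(n_secondary, size=n)

    # Primary-subspace means depend on the primary label only, so all
    # subclusters of a primary cluster share them.
    primary_means = rng.normal(scale=5.0, size=(n_primary, d_primary))
    # Secondary-subspace means depend on both labels.
    secondary_means = rng.normal(
        scale=5.0, size=(n_primary, n_secondary, d_secondary))

    # Intrinsic variables; identity covariances are assumed for simplicity.
    z_primary = primary_means[primary] + rng.normal(size=(n, d_primary))
    z_secondary = (secondary_means[primary, secondary]
                   + rng.normal(size=(n, d_secondary)))
    z_noise = rng.normal(size=(n, d_noise))  # carries no cluster structure
    z = np.hstack([z_primary, z_secondary, z_noise])

    # Random orthogonal matrix from a QR decomposition; the manifest
    # variables are a rotation of the intrinsic ones.
    q, _ = np.linalg.qr(rng.normal(size=(p, p)))
    x = z @ q.T
    return x, primary, secondary, q


x, primary, secondary, q = simulate_nested_clusters()
```

Because `q` is orthogonal, `x @ q` recovers the intrinsic coordinates in this toy setting. In the setting the summary describes, the rotation and the cluster parameters are unknown and would have to be estimated, for example by the expectation-maximization procedure the article provides.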
ISSN: 0176-4268, 1432-1343
DOI: 10.1007/s00357-023-09453-z