Consistency of Graphical Model-based Clustering: Robust Clustering using Bayesian Spanning Forest
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 27-09-2024 |
Subjects: | |
Summary: | For statistical inference on clustering, the mixture model-based framework is
very popular. On the one hand, the model-based framework is convenient for
producing probabilistic estimates of cluster assignments and uncertainty. On
the other hand, the specification of a mixture model is fraught with the danger
of misspecification that could lead to inconsistent clustering estimates.
Graphical model-based clustering takes a different model specification
strategy, in which the likelihood treats the data as arising dependently from a
disjoint union of component graphs. To counter the large uncertainty of the
graph, recent work on Bayesian spanning forest proposes using the integrated
posterior of the node partition (marginalized over the latent edge
distribution) to produce probabilistic estimates for clustering. Despite the
strong empirical performance, it is not yet known whether the clustering
estimator is consistent, especially when the data-generating mechanism is
different from the specified graphical model. This article gives a positive
answer in the asymptotic regime: when the data arise from an unknown mixture
distribution, under mild conditions, the posterior concentrates on the
ground-truth partition, producing correct clustering estimates including the
number of clusters. This theoretical result is an encouraging development for
the robust clustering literature, demonstrating the use of graphical models as
a robust alternative to mixture models in model-based clustering. |
---|---|
DOI: | 10.48550/arxiv.2409.19129 |
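The summary describes a likelihood built from a disjoint union of component graphs, with the latent edges integrated out so that only the node partition remains. One standard tool for summing over every spanning tree of a cluster in closed form is the weighted matrix-tree theorem, and the Python sketch below uses it to score a few candidate partitions. It is a toy under stated assumptions, not the authors' model: the Gaussian edge kernel, the diffuse Gaussian root density, the uniform weighting of trees and roots, the absence of any partition prior, and the names `log_cluster_score`, `sigma` and `root_sd` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy model for illustration only; NOT the paper's likelihood.


def normal_pdf(x, mean, sd):
    """Univariate normal density; works elementwise on NumPy arrays."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def log_spanning_tree_sum(weights):
    """log of the sum, over all spanning trees, of the product of edge weights.

    Weighted matrix-tree theorem: that sum equals the determinant of the
    graph Laplacian with any one row and the matching column deleted.
    """
    n = weights.shape[0]
    if n == 1:
        return 0.0  # one node: a single (empty) spanning tree with product 1
    laplacian = np.diag(weights.sum(axis=1)) - weights
    sign, logdet = np.linalg.slogdet(laplacian[1:, 1:])
    if sign <= 0:
        raise ValueError("cluster graph is disconnected or numerically degenerate")
    return logdet


def log_cluster_score(x, sigma=0.5, root_sd=10.0):
    """Toy integrated score of one cluster (assumed model, see lead-in).

    Edge kernel: N(x_j; x_i, sigma^2), symmetric in i and j.
    Root density: N(x_r; 0, root_sd^2), summed over every possible root.
    """
    weights = normal_pdf(x[:, None], x[None, :], sigma)
    np.fill_diagonal(weights, 0.0)  # no self-loops
    root_mass = normal_pdf(x, 0.0, root_sd).sum()
    return np.log(root_mass) + log_spanning_tree_sum(weights)


def log_partition_score(x, partition, **kwargs):
    """Sum of cluster scores; `partition` is a list of index lists."""
    return sum(log_cluster_score(x[np.asarray(block)], **kwargs) for block in partition)


# Two well-separated groups of 1-D points (made-up illustrative data).
x = np.array([0.0, 0.2, 0.4, 3.0, 3.2, 3.4])
candidates = {
    "two groups": [[0, 1, 2], [3, 4, 5]],
    "one cluster": [[0, 1, 2, 3, 4, 5]],
    "singletons": [[i] for i in range(6)],
}
scores = {name: log_partition_score(x, p) for name, p in candidates.items()}
for name, s in scores.items():
    print(f"{name:12s} {s:8.3f}")
```

With these made-up points and settings, the two-group partition receives the highest score of the three candidates; changing `sigma`, `root_sd` or the spacing of the points can change the ranking, because this toy score has no calibrated prior. `numpy.linalg.slogdet` is used instead of a plain determinant so that products of many small edge weights do not underflow.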