Shannon's entropy of partitions determined by hierarchical clustering trees in asymmetry and dimension identification
Published in: | Communications in statistics. Simulation and computation Vol. 51; no. 10; pp. 5954 - 5966 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Philadelphia: Taylor & Francis (Taylor & Francis Ltd), 03-10-2022 |
Subjects: | |
Summary: | In the multivariate statistics community, it is commonly acknowledged that, among hierarchical clustering tree (HCT) procedures, the single linkage rule for inter-cluster distance tends to produce trees that are significantly more asymmetric than those obtained using other rules, such as complete linkage. We consider the use of Shannon's entropy of the partitions determined by HCTs as a measure of the asymmetry of the clustering trees. In a different direction, our simulations show an unexpected relationship between Shannon's entropy of partitions and the dimension of the data. Based on this observation, a procedure for intrinsic dimension identification based on entropy of partitions is proposed and studied. A theoretical result is established for the dimension identification method stating that, locally, for continuous data on a d-dimensional manifold, the entropy of partitions behaves as if the local data were uniformly sampled from the unit ball of
Evaluation on simulated examples shows that the proposed method compares favorably with other procedures for dimension identification available in the literature. |
---|---|
ISSN: | 0361-0918, 1532-4141 |
DOI: | 10.1080/03610918.2020.1788586 |