A Family of Mixture Models for Biclustering
Main Authors: | Tu, Wangshu; Subedi, Sanjeena |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 10-09-2020 |
Subjects: | |
Abstract | Biclustering simultaneously clusters the observations and the
variables of a data set when no group structure is known \textit{a priori}. It
is increasingly used in fields such as bioinformatics and text analytics.
Previously, biclustering was introduced into a model-based clustering framework
through a structure similar to a mixture of factor analyzers. In such models,
the observed variables $\mathbf{X}$ are modelled using a latent variable
$\mathbf{U}$ that is assumed to follow $N(\mathbf{0}, \mathbf{I})$. Clustering
of the variables is introduced by constraining the entries of the factor
loading matrix to be 0 or 1, which results in block-diagonal covariance
matrices. However, this approach is overly restrictive because the off-diagonal
elements within the blocks of the covariance matrices can only be 1, which can
lead to an unsatisfactory model fit on complex data. Here, the latent variable
$\mathbf{U}$ is instead assumed to follow $N(\mathbf{0}, \mathbf{T})$, where
$\mathbf{T}$ is a diagonal matrix. This ensures that the off-diagonal terms in
the blocks of the covariance matrices are non-zero and not restricted to be 1,
which leads to a superior model fit on complex data. A family of models is
developed by imposing constraints on the components of the covariance matrix.
For parameter estimation, an alternating expectation conditional maximization
(AECM) algorithm is used. Finally, the proposed method is illustrated using
simulated and real datasets. |
---|---|
Author | Subedi, Sanjeena; Tu, Wangshu |
Author_xml | – sequence: 1 givenname: Wangshu surname: Tu fullname: Tu, Wangshu – sequence: 2 givenname: Sanjeena surname: Subedi fullname: Subedi, Sanjeena |
BackLink | https://doi.org/10.48550/arXiv.2009.05098 |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | EPD GOX |
DOI | 10.48550/arxiv.2009.05098 |
DatabaseName | arXiv Statistics arXiv.org |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 2009_05098 |
GroupedDBID | EPD GOX |
ID | FETCH-LOGICAL-a678-ac4103fd84c16eea4838f1ea1d3537e32d3ef59e959db3c6c364aed81724e5da3 |
IEDL.DBID | GOX |
IngestDate | Mon Jan 08 05:46:17 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-a678-ac4103fd84c16eea4838f1ea1d3537e32d3ef59e959db3c6c364aed81724e5da3 |
OpenAccessLink | https://arxiv.org/abs/2009.05098 |
ParticipantIDs | arxiv_primary_2009_05098 |
PublicationCentury | 2000 |
PublicationDate | 2020-09-10 |
PublicationDateYYYYMMDD | 2020-09-10 |
PublicationDate_xml | – month: 09 year: 2020 text: 2020-09-10 day: 10 |
PublicationDecade | 2020 |
PublicationYear | 2020 |
Score | 1.7821069 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Statistics - Computation; Statistics - Methodology |
Title | A Family of Mixture Models for Biclustering |
URI | https://arxiv.org/abs/2009.05098 |
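
The covariance structure described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the model equations, so the code below assumes the usual factor-analyzer form, in which the covariance of the observed variables is the loading matrix times the latent covariance times the transposed loading matrix, plus a diagonal noise term. The function name `implied_covariance`, the variable names, and all numbers are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def implied_covariance(loadings, latent_var, noise_var):
    """Covariance of X = loadings @ U + e under the assumed factor form.

    Assumes U ~ N(0, diag(latent_var)) and e ~ N(0, diag(noise_var)),
    with U and e independent (an assumption, not taken from the abstract).
    """
    loadings = np.asarray(loadings, dtype=float)
    return loadings @ np.diag(latent_var) @ loadings.T + np.diag(noise_var)


# Hypothetical binary loading matrix: 5 variables in 2 variable groups,
# each row has a single 1 marking the group of that variable.
loadings = np.array([
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])
noise_var = np.full(5, 0.5)  # illustrative diagonal noise variances

# Latent covariance I: within-block off-diagonal entries are forced to 1.
cov_identity = implied_covariance(loadings, [1.0, 1.0], noise_var)

# Diagonal latent covariance T: each block gets its own off-diagonal value.
cov_diagonal = implied_covariance(loadings, [2.5, 0.3], noise_var)

print(cov_identity)
print(cov_diagonal)
```

In both cases the entries between variables in different groups are zero, so the matrix stays block diagonal. Only the within-block off-diagonal values change: they equal 1 under the identity latent covariance and equal the corresponding diagonal entry of the latent covariance otherwise.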