A Family of Mixture Models for Biclustering

Bibliographic Details
Main Authors: Tu, Wangshu; Subedi, Sanjeena
Format: Journal Article
Language: English
Published: 10 September 2020
Abstract: Biclustering is used to cluster observations and variables simultaneously when no group structure is known a priori. It is increasingly used in bioinformatics, text analytics, and other fields. Biclustering has previously been introduced in a model-based clustering framework by utilizing a structure similar to a mixture of factor analyzers. In such models, the observed variables $\mathbf{X}$ are modelled using a latent variable $\mathbf{U}$ that is assumed to follow $N(\mathbf{0}, \mathbf{I})$. Clustering of variables is introduced by constraining the entries of the factor loading matrix to be 0 or 1, which results in block-diagonal covariance matrices. However, this approach is overly restrictive because the off-diagonal elements within the blocks of the covariance matrices can only be 1, which can lead to unsatisfactory model fit on complex data. Here, the latent variable $\mathbf{U}$ is assumed to follow $N(\mathbf{0}, \mathbf{T})$, where $\mathbf{T}$ is a diagonal matrix. This ensures that the off-diagonal terms in the blocks of the covariance matrices are non-zero and not restricted to be 1, which leads to superior model fit on complex data. A family of models is developed by imposing constraints on the components of the covariance matrix. For parameter estimation, an alternating expectation conditional maximization (AECM) algorithm is used. Finally, the proposed method is illustrated using simulated and real datasets.
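To make the covariance argument concrete, the sketch below computes the covariance implied for a single mixture component. It assumes the usual factor-analyzer form $\mathbf{X} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\mathbf{U} + \boldsymbol{\epsilon}$ with a diagonal error covariance $\boldsymbol{\Psi}$, so that $\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\mathbf{T}\boldsymbol{\Lambda}^\top + \boldsymbol{\Psi}$. The binary loading matrix, the cluster labels, the function names and all parameter values are hypothetical illustrations, not the authors' code or the paper's exact parameterization, and the AECM estimation step is not shown.

```python
from typing import List

Matrix = List[List[float]]


def binary_loading(labels: List[int], q: int) -> Matrix:
    """Return a p x q loading matrix with exactly one 1 per row.

    labels[j] is the (hypothetical) variable-cluster index of variable j,
    so row j has a 1 in column labels[j] and 0 elsewhere.
    """
    return [[1.0 if k == lab else 0.0 for k in range(q)] for lab in labels]


def covariance(lam: Matrix, t_diag: List[float], psi_diag: List[float]) -> Matrix:
    """Compute Sigma = Lambda T Lambda^T + Psi for diagonal T and Psi (assumed form)."""
    p, q = len(lam), len(t_diag)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # (Lambda T Lambda^T)_{ij} = sum_k Lambda_ik * T_kk * Lambda_jk
            s = sum(lam[i][k] * t_diag[k] * lam[j][k] for k in range(q))
            if i == j:
                s += psi_diag[i]  # Psi only contributes to the diagonal
            sigma[i][j] = s
    return sigma


# Hypothetical example: 5 variables split into 2 variable clusters.
labels = [0, 0, 1, 1, 1]
lam = binary_loading(labels, q=2)
psi = [0.5] * 5  # hypothetical error variances

# Earlier structure: U ~ N(0, I), so within-block covariances are all 1.
sigma_identity = covariance(lam, [1.0, 1.0], psi)
# Relaxed structure: U ~ N(0, T) with a hypothetical T = diag(2.0, 0.25).
sigma_diag = covariance(lam, [2.0, 0.25], psi)

for row in sigma_diag:
    print(row)
```

With $\mathbf{T} = \mathbf{I}$, every within-block off-diagonal entry equals 1. With $\mathbf{T} = \mathrm{diag}(2, 0.25)$, the two blocks have off-diagonal entries 2.0 and 0.25 respectively, while entries between different variable clusters remain 0, so the block-diagonal structure is preserved.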
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI: 10.48550/arXiv.2009.05098
Open access: Yes
Peer reviewed: No
Full text: https://arxiv.org/abs/2009.05098
Resource type: preprint
Subjects: Statistics - Computation; Statistics - Methodology