Optimal Permutation Recovery in Permuted Monotone Matrix Model

Motivated by recent research on quantifying bacterial growth dynamics based on genome assemblies, we consider a permuted monotone matrix model , where the rows represent different samples, the columns represent contigs in genome assemblies and the elements represent log-read counts after preprocessi...

Full description

Saved in:
Bibliographic Details
Published in:Journal of the American Statistical Association Vol. 116; no. 535; pp. 1358 - 1372
Main Authors: Ma, Rong, Tony Cai, T., Li, Hongzhe
Format: Journal Article
Language:English
Published: United States Taylor & Francis 2021
Taylor & Francis Ltd
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Motivated by recent research on quantifying bacterial growth dynamics based on genome assemblies, we consider a permuted monotone matrix model , where the rows represent different samples, the columns represent contigs in genome assemblies and the elements represent log-read counts after preprocessing steps and Guanine-Cytosine (GC) adjustment. In this model, Θ is an unknown mean matrix with monotone entries for each row, Π is a permutation matrix that permutes the columns of Θ, and Z is a noise matrix. This article studies the problem of estimation/recovery of Π given the observed noisy matrix Y. We propose an estimator based on the best linear projection, which is shown to be minimax rate-optimal for both exact recovery, as measured by the 0-1 loss, and partial recovery, as quantified by the normalized Kendall's tau distance. Simulation studies demonstrate the superior empirical performance of the proposed estimator over alternative methods. We demonstrate the methods using a synthetic metagenomics dataset of 45 closely related bacterial species and a real metagenomic dataset to compare the bacterial growth dynamics between the responders and the nonresponders of the IBD patients after 8 weeks of treatment. Supplementary materials for this article are available online.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
Hongzhe Li is Professor of Biostatistics and Statistics, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.
T. Tony Cai is Daniel H. Silberberg Professor of Statistics, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
ISSN:0162-1459
1537-274X
DOI:10.1080/01621459.2020.1713794