Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models

High-dimensional logistic regression is widely used in analyzing data with binary outcomes. In this article, global testing and large-scale multiple testing for the regression coefficients are considered in both single- and two-regression settings. A test statistic for testing the global null hypoth...

Full description

Saved in:
Bibliographic Details
Published in:Journal of the American Statistical Association Vol. 116; no. 534; pp. 984 - 998
Main Authors: Ma, Rong, Tony Cai, T., Li, Hongzhe
Format: Journal Article
Language:English
Published: United States Taylor & Francis 03-04-2021
Taylor & Francis Ltd
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:High-dimensional logistic regression is widely used in analyzing data with binary outcomes. In this article, global testing and large-scale multiple testing for the regression coefficients are considered in both single- and two-regression settings. A test statistic for testing the global null hypothesis is constructed using a generalized low-dimensional projection for bias correction and its asymptotic null distribution is derived. A lower bound for the global testing is established, which shows that the proposed test is asymptotically minimax optimal over some sparsity range. For testing the individual coefficients simultaneously, multiple testing procedures are proposed and shown to control the false discovery rate and falsely discovered variables asymptotically. Simulation studies are carried out to examine the numerical performance of the proposed tests and their superiority over existing methods. The testing procedures are also illustrated by analyzing a dataset of a metabolomics study that investigates the association between fecal metabolites and pediatric Crohn's disease and the effects of treatment on such associations. Supplementary materials for this article are available online.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
Hongzhe Li is Professor of Biostatistics and Statistics, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.
T. Tony Cai is Daniel H. Silberberg Professor of Statistics, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104.
Rong Ma is PhD Candidate, Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104.
ISSN:0162-1459
1537-274X
DOI:10.1080/01621459.2019.1699421