Quantitative Assessment of Tissue Biomarkers and Construction of a Model to Predict Outcome in Breast Cancer Using Multiple Imputation

John W. Emerson1, Marisa Dolled-Filhart2, Lyndsay Harris3, David L. Rimm2 and David P. Tuck2. 1Department of Statistics, Yale University, New Haven, Connecticut 06520. 2Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510. 3Medical Oncology, Yale University School...

Bibliographic Details
Published in: Cancer Informatics Vol. 2009; no. 7; pp. 29-40
Main Authors: Emerson, John W., Dolled-Filhart, Marisa, Harris, Lyndsay, Rimm, David L., Tuck, David P.
Format: Journal Article
Language: English
Published: London, England SAGE Publishing 01-01-2009
SAGE Publications
Sage Publications Ltd
Description
Summary: John W. Emerson1, Marisa Dolled-Filhart2, Lyndsay Harris3, David L. Rimm2 and David P. Tuck2. 1Department of Statistics, Yale University, New Haven, Connecticut 06520. 2Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510. 3Medical Oncology, Yale University School of Medicine, New Haven, Connecticut 06510.

Abstract: Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the loss of data points because of unevaluable staining, core loss, or the lack of tumor in the histospot. This paper presents a novel approach to these common problems in the context of a tissue protein biomarker analysis in a cohort of patients with breast cancer. Our analysis develops techniques based on multiple imputation to address the missing value problem. We first select markers using a training cohort, identifying a small subset of protein expression levels that are most useful in predicting patient survival. The best model is obtained by including both protein markers (including COX6C, GATA3, NAT1, and ESR1) and lymph node status. The use of either lymph node status or the four protein expression levels provides similar improvements in goodness-of-fit, with both significantly better than a baseline clinical model. Using the same multiple imputation strategy, we then validate the results out-of-sample on a larger independent cohort. Our approach of integrating multiple imputation with each stage of the analysis serves as an example that may be replicated or adapted in future studies with missing values.
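Illustrative note: the abstract does not spell out the imputation model or how results are combined across imputed datasets, so the following is only a minimal sketch of the general idea of multiple imputation, not the authors' procedure. It assumes a single numeric variable with missing entries marked as `None`, fills them with random draws from a normal distribution fitted to the observed values (a simplified stand-in for a proper imputation model), and pools the per-dataset estimates with Rubin's rules, a standard pooling technique. The function names `impute_once`, `pool_rubin`, and `mi_mean` are hypothetical.

```python
import random
import statistics


def impute_once(values, rng):
    """Return a copy of `values` with each None replaced by a random draw.

    Assumption: draws come from a normal distribution fitted to the
    observed entries. This is a simplified stand-in for a real
    imputation model and needs at least two observed values.
    """
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [rng.gauss(mu, sd) if v is None else v for v in values]


def pool_rubin(estimates, variances):
    """Combine m per-dataset estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    pooled = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled, total


def mi_mean(values, m=5, seed=0):
    """Estimate the mean of `values` from m imputed datasets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        filled = impute_once(values, rng)
        estimates.append(statistics.mean(filled))
        # Variance of the sample mean within this imputed dataset.
        variances.append(statistics.variance(filled) / len(filled))
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    # Made-up toy values, not data from the study.
    marker = [1.2, None, 0.7, 2.1, None, 1.5, 0.9]
    estimate, variance = mi_mean(marker, m=10)
    print(f"pooled mean = {estimate:.3f}, total variance = {variance:.4f}")
```

In a study like the one summarized here, the per-dataset estimate would be a model coefficient or goodness-of-fit measure rather than a simple mean, but the structure (impute several times, analyze each completed dataset, pool) would stay the same.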
ISSN: 1176-9351
DOI: 10.4137/CIN.S911