Estimation of diagnostic test accuracy: A “Rule of Three” for data with repeated observations but without a gold standard

Bibliographic Details
Published in: Statistics in Medicine Vol. 40; no. 22; pp. 4815-4829
Main Author: Walter, Stephen D.
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc., 30 September 2021
Wiley Subscription Services, Inc.
Description
Summary: This article considers how to estimate the accuracy of a diagnostic test when there are repeated observations, but without the availability of a gold standard or reference test. We identify conditions under which the structure of the observed data is rich enough to provide sufficient degrees of freedom, such that a suitable latent class model can be fitted with identifiable accuracy parameters. We show that a Rule of Three applies, specifying that accuracy can be evaluated as long as there are at least three observations per individual with the given test. This rule also applies if the three observations arise from combinations of different test methods, or from a sequential design in which individuals are tested for a maximum number of times with the same test but stopping if a positive (or negative) result occurs. The rule pertains to tests having an arbitrary number of response categories. Accuracy is evaluated by parameters reflecting rates of misclassification among the response categories, and the model also provides estimates of the underlying distribution of the true disease state. These ideas are illustrated by data from two medical studies. Issues discussed include the advantages and disadvantages of analyzing the response variable as binary or multinomial, as well as the feasibility of testing goodness of fit when the model incorporates a large number of parameters. Comparisons are possible between models that do or do not assume equal accuracy rates for the observations, and between models where certain misclassification parameters are or are not assumed to be zero.
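The degrees-of-freedom argument in the summary can be illustrated with a simple counting sketch. It is not taken from the article: it assumes the same test is repeated with equal accuracy on every occasion (so only the count of each response matters), that the true disease state has as many categories as the test response, and that every misclassification rate is free. The function names are hypothetical. Having at least as many degrees of freedom as parameters is a necessary condition for identifiability, not a guarantee of it.

```python
from math import comb
from typing import Optional


def degrees_of_freedom(n_obs: int, n_categories: int) -> int:
    """Free cell probabilities in the observed data.

    Assumption: replicates are exchangeable, so each individual's data
    reduce to a multiset of n_obs responses over n_categories values.
    """
    return comb(n_obs + n_categories - 1, n_categories - 1) - 1


def n_parameters(n_categories: int) -> int:
    """Latent class parameters under the stated assumptions.

    (K - 1) probabilities for the true state, plus K * (K - 1)
    misclassification rates (one free row of K - 1 per true state).
    """
    k = n_categories
    return (k - 1) + k * (k - 1)


def min_observations(n_categories: int, max_obs: int = 10) -> Optional[int]:
    """Smallest number of replicates whose degrees of freedom cover the parameters."""
    for r in range(1, max_obs + 1):
        if degrees_of_freedom(r, n_categories) >= n_parameters(n_categories):
            return r
    return None


if __name__ == "__main__":
    for k in range(2, 6):
        print(k, n_parameters(k), min_observations(k))
```

Under these assumptions the count gives three replicates as the minimum for every number of response categories tried here (2 to 5), which is consistent with the rule stated in the summary. For a binary test, three replicates yield exactly 3 degrees of freedom for 3 parameters (prevalence, sensitivity, specificity), so the model is saturated and leaves nothing over for testing goodness of fit.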
ISSN: 0277-6715, 1097-0258
DOI: 10.1002/sim.9097