Binary and multi-category ratings in a laboratory observer performance study: A comparison

Bibliographic Details
Published in:	Medical physics (Lancaster) Vol. 35; no. 10; pp. 4404 - 4409
Main Authors:	Gur, David; Bandos, Andriy I.; King, Jill L.; Klym, Amy H.; Cohen, Cathy S.; Hakim, Christiane M.; Hardesty, Lara A.; Ganott, Marie A.; Perrin, Ronald L.; Poller, William R.; Shah, Ratan; Sumkin, Jules H.; Wallace, Luisa P.; Rockette, Howard E.
Format:	Journal Article
Language:	English
Published:	United States: American Association of Physicists in Medicine, 01-10-2008
Subjects:	binary operating point; Breast Neoplasms - diagnostic imaging; Breast Neoplasms - epidemiology; Cancer; Computer software; Data analysis; Female; Humans; Laboratories; Laboratory procedures; mammography; Mammography - statistics & numerical data; Medical imaging; Number theory; observer performance; Observer Variation; Other topics in biological and medical physics (restricted to new topics in section 87); Pathology; patient diagnosis; Pennsylvania - epidemiology; Radiation Therapy Physics; Radiographic Image Interpretation, Computer-Assisted - methods; Radiographic Image Interpretation, Computer-Assisted - utilization; Radiologists; Reproducibility of Results; ROC curves; screening mammography; sensitivity analysis; Sensitivity and Specificity; Sequence analysis; Task Performance and Analysis

Description
Summary:	The authors investigated radiologists' performance during retrospective interpretation of screening mammograms when using a binary decision (whether or not to recall a woman for additional procedures) and compared it with their receiver operating characteristic (ROC) type performance curves obtained with a semi-continuous rating scale. Under an Institutional Review Board-approved protocol, nine experienced radiologists independently rated an enriched set of 155 examinations that they had not personally read in the clinic, mixed with other enriched sets of examinations that they had individually read in the clinic, using both a screening BI-RADS rating scale (recall/not recall) and a semi-continuous ROC-type rating scale (0 to 100). The vertical distance between the empirical ROC curve and the binary operating point, namely the difference in sensitivity levels at the same specificity level, was computed for each reader. The vertical distance averaged over all readers was used to assess the proximity of the performance levels under the binary and ROC-type rating scales. There does not appear to be any systematic tendency of the readers towards better performance with either of the two rating approaches: four readers performed better using the semi-continuous rating scale, four readers performed better with the binary scale, and one reader had the point exactly on the empirical ROC curve. Only one of the nine readers had a binary "operating point" that was statistically distant from the same reader's empirical ROC curve. Reader-specific differences ranged from −0.046 to 0.128, with an average width of the corresponding 95% confidence intervals of 0.2 and p-values for individual readers ranging from 0.050 to 0.966. On average, radiologists performed similarly when using the two rating scales, in that the average distance between an individual reader's binary operating point and that reader's ROC curve was close to zero.
The 95% confidence interval for the fixed-reader average (0.016) was (−0.0206, 0.0631) (two-sided p-value 0.35). In conclusion, the authors found that in retrospective observer performance studies the use of a binary response or a semi-continuous rating scale led to consistent results in terms of performance as measured by sensitivity-specificity operating points.
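The vertical-distance measure described in the summary can be illustrated with a minimal Python sketch. It is not the authors' software. The function names, the toy data, the linear interpolation between empirical ROC points, and the sign convention (ROC sensitivity minus binary sensitivity, so a negative value means the binary point lies above the curve) are all assumptions made for illustration. The record does not specify any of them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false-positive fraction, sensitivity)


def empirical_roc(ratings: Sequence[float], truth: Sequence[int]) -> List[Point]:
    """Empirical ROC points from (0, 0) to (1, 1); truth is 1 for cancer, 0 otherwise."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest rating down; a case counts as
    # positive when its rating is at or above the threshold.
    for thr in sorted(set(ratings), reverse=True):
        tp = sum(1 for r, t in zip(ratings, truth) if t == 1 and r >= thr)
        fp = sum(1 for r, t in zip(ratings, truth) if t == 0 and r >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_sensitivity_at(points: List[Point], fpr: float) -> float:
    """ROC sensitivity at a given false-positive fraction (1 - specificity).

    Assumption: linear interpolation between neighbouring points; on a
    vertical segment the upper end is used.
    """
    candidates = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 > x0:
                candidates.append(y0 + (fpr - x0) * (y1 - y0) / (x1 - x0))
            else:
                candidates.append(max(y0, y1))
    return max(candidates)


def vertical_distance(ratings: Sequence[float], recalls: Sequence[int],
                      truth: Sequence[int]) -> float:
    """ROC sensitivity minus binary sensitivity at the binary point's specificity."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    tpr_bin = sum(1 for d, t in zip(recalls, truth) if t == 1 and d == 1) / n_pos
    fpr_bin = sum(1 for d, t in zip(recalls, truth) if t == 0 and d == 1) / n_neg
    return roc_sensitivity_at(empirical_roc(ratings, truth), fpr_bin) - tpr_bin


# Hypothetical toy data for one reader (not from the study).
truth = [1, 1, 0, 0, 1, 1, 0, 0]            # 1 = cancer, 0 = no cancer
ratings = [90, 80, 70, 70, 60, 40, 30, 20]  # semi-continuous 0-100 ratings
recalls = [1, 1, 1, 0, 1, 0, 0, 0]          # binary recall / no-recall decisions
distance = vertical_distance(ratings, recalls, truth)
print(distance)
```

Averaging such per-reader distances over all readers would give the summary quantity the abstract reports. The confidence intervals and p-values in the abstract need a variance estimate, which this sketch does not attempt.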
Bibliography:	Address for correspondence: University of Pittsburgh, Department of Radiology, Imaging Research, 3362 Fifth Avenue, Pittsburgh, PA 15213-3180. Telephone: 412-641-2513; Fax: 412-641-2582; Electronic mail: gurd@upmc.edu
ISSN:	0094-2405, 2473-4209
DOI:	10.1118/1.2977766