Deep Learning to Distinguish Recalled but Benign Mammography Images in Breast Cancer Screening

Bibliographic Details
Published in: Clinical Cancer Research, Vol. 24, No. 23, pp. 5902-5909
Main Authors: Aboutalib, Sarah S; Mohamed, Aly A; Berg, Wendie A; Zuley, Margarita L; Sumkin, Jules H; Wu, Shandong
Format: Journal Article
Language: English
Published: United States, 01-12-2018
Description
Summary: False positives in digital mammography screening lead to high recall rates, resulting in unnecessary medical procedures for patients and increased health care costs. This study aimed to investigate deep learning methods to distinguish recalled but benign mammography images from negative exams and those with malignancy.

Deep learning convolutional neural network (CNN) models were constructed to classify mammography images into malignant (breast cancer), negative (breast cancer free), and recalled-benign categories. A total of 14,860 images of 3,715 patients from two independent mammography datasets, a Full-Field Digital Mammography dataset (FFDM) and a digitized film dataset, the Digital Database for Screening Mammography (DDSM), were used in various settings for training and testing the CNN models. Receiver operating characteristic (ROC) curves were generated, and the area under the curve (AUC) was calculated as the metric of classification accuracy.

Training and testing using only the FFDM dataset resulted in AUCs ranging from 0.70 to 0.81. When the DDSM dataset was used, AUCs ranged from 0.77 to 0.96. When the datasets were combined for training and testing, AUCs ranged from 0.76 to 0.91. When pretrained on a large nonmedical dataset and on DDSM, the models showed consistent improvements in AUC ranging from 0.02 to 0.05 (all P < 0.05), compared with pretraining only on the nonmedical dataset.

This study demonstrates that automatic deep learning CNN methods can identify nuanced mammographic imaging features to distinguish recalled-benign images from malignant and negative cases, which may lead to a computerized clinical toolkit to help reduce false recalls.
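The record does not specify the network architecture or training details, but the general workflow it describes (fine-tuning a CNN pretrained on a large nonmedical dataset for the three-class task, then evaluating with ROC/AUC) can be sketched as follows. This is a minimal illustrative sketch only: the ResNet-18 backbone, folder layout, hyperparameters, and number of epochs are assumptions, not the authors' actual setup.

```python
# Illustrative sketch: fine-tune an ImageNet-pretrained CNN for a three-class
# mammography task (malignant / negative / recalled-benign) and report
# one-vs-rest AUC per class. Backbone, paths, and hyperparameters are
# hypothetical, chosen only to make the example self-contained.
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.metrics import roc_auc_score

NUM_CLASSES = 3  # malignant, negative, recalled-benign

# ImageNet-pretrained backbone (transfer learning); replace the classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder layout: mammo/{train,test}/<class_name>/*.png
train_set = datasets.ImageFolder("mammo/train", transform=preprocess)
test_set = datasets.ImageFolder("mammo/test", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):  # illustrative number of epochs
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Evaluation: collect softmax scores, then compute one-vs-rest ROC AUC per class.
model.eval()
scores, targets = [], []
with torch.no_grad():
    for images, labels in test_loader:
        probs = torch.softmax(model(images.to(device)), dim=1).cpu().numpy()
        scores.append(probs)
        targets.append(labels.numpy())
scores = np.concatenate(scores)
targets = np.concatenate(targets)

for idx, name in enumerate(train_set.classes):
    auc = roc_auc_score((targets == idx).astype(int), scores[:, idx])
    print(f"AUC ({name} vs. rest): {auc:.2f}")
```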
ISSN: 1078-0432; 1557-3265
DOI: 10.1158/1078-0432.CCR-18-1115