Generalizable Inter-Institutional Classification of Abnormal Chest Radiographs Using Efficient Convolutional Neural Networks

Our objective is to evaluate the effectiveness of efficient convolutional neural networks (CNNs) for abnormality detection in chest radiographs and investigate the generalizability of our models on data from independent sources. We used the National Institutes of Health ChestX-ray14 (NIH-CXR) and th...

Full description

Saved in:
Bibliographic Details
Published in:Journal of digital imaging Vol. 32; no. 5; pp. 888 - 896
Main Authors: Pan, Ian, Agarwal, Saurabh, Merck, Derek
Format: Journal Article
Language:English
Published: Cham Springer International Publishing 01-10-2019
Springer Nature B.V
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Our objective is to evaluate the effectiveness of efficient convolutional neural networks (CNNs) for abnormality detection in chest radiographs and investigate the generalizability of our models on data from independent sources. We used the National Institutes of Health ChestX-ray14 (NIH-CXR) and the Rhode Island Hospital chest radiograph (RIH-CXR) datasets in this study. Both datasets were split into training, validation, and test sets. The DenseNet and MobileNetV2 CNN architectures were used to train models on each dataset to classify chest radiographs into normal or abnormal categories; models trained on NIH-CXR were designed to also predict the presence of 14 different pathological findings. Models were evaluated on both NIH-CXR and RIH-CXR test sets based on the area under the receiver operating characteristic curve (AUROC). DenseNet and MobileNetV2 models achieved AUROCs of 0.900 and 0.893 for normal versus abnormal classification on NIH-CXR and AUROCs of 0.960 and 0.951 on RIH-CXR. For the 14 pathological findings in NIH-CXR, MobileNetV2 achieved an AUROC within 0.03 of DenseNet for each finding, with an average difference of 0.01. When externally validated on independently collected data (e.g., RIH-CXR-trained models on NIH-CXR), model AUROCs decreased by 3.6–5.2% relative to their locally trained counterparts. MobileNetV2 achieved comparable performance to DenseNet in our analysis, demonstrating the efficacy of efficient CNNs for chest radiograph abnormality detection. In addition, models were able to generalize to external data albeit with performance decreases that should be taken into consideration when applying models on data from different institutions.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:0897-1889
1618-727X
1618-727X
DOI:10.1007/s10278-019-00180-9