Comparison of Commercial AI Software Performance for Radiograph Lung Nodule Detection and Bone Age Prediction


Bibliographic Details
Published in: Radiology Vol. 310; no. 1; p. e230981
Main Authors: van Leeuwen, Kicky G, Schalekamp, Steven, Rutten, Matthieu J C M, Huisman, Merel, Schaefer-Prokop, Cornelia M, de Rooij, Maarten, van Ginneken, Bram, Maresch, Bas, Geurts, Bram H J, van Dijke, Cornelius F, Laupman-Koedam, Emmeline, Hulleman, Enzo V, Verhoeff, Eric L, Meys, Evelyne M J, Mohamed Hoesein, Firdaus A A, Ter Brugge, Floor M, van Hoorn, Francois, van der Wel, Frank, van den Berk, Inge A H, Luyendijk, Jacqueline M, Meakin, James, Habets, Jesse, Verbeke, Jonathan I M L, Nederend, Joost, Meys, Karlijn M E, Deden, Laura N, Langezaal, Lucianne C M, Nasrollah, Mahtab, Meij, Marleen, Boomsma, Martijn F, Vermeulen, Matthijs, Vestering, Myrthe M, Vijlbrief, Onno, Algra, Paul, Algra, Selma, Bollen, Stijn M, Samson, Tijs, von Brucken Fock, Yntor H G
Format: Journal Article
Language: English
Published: United States 01-01-2024
Description
Summary: Background Multiple commercial artificial intelligence (AI) products exist for assessing radiographs; however, comparable performance data for these algorithms are limited. Purpose To perform an independent, stand-alone validation of commercially available AI products for bone age prediction based on hand radiographs and lung nodule detection on chest radiographs. Materials and Methods This retrospective study was carried out as part of Project AIR. Nine of 17 eligible AI products were validated on data from seven Dutch hospitals. For bone age prediction, the root mean square error (RMSE) and Pearson correlation coefficient were computed. The reference standard was set by three to five expert readers. For lung nodule detection, the area under the receiver operating characteristic curve (AUC) was computed. The reference standard was set by a chest radiologist based on CT. Randomized subsets of hand (n = 95) and chest (n = 140) radiographs were read by 14 and 17 human readers, respectively, with varying experience. Results Two bone age prediction algorithms were tested on hand radiographs (from January 2017 to January 2022) in 326 patients (mean age, 10 years ± 4 [SD]; 173 female patients) and correlated strongly with the reference standard (r = 0.99; P < .001 for both). No difference in RMSE was observed between algorithms (0.63 years [95% CI: 0.58, 0.69] and 0.57 years [95% CI: 0.52, 0.61]) and readers (0.68 years [95% CI: 0.64, 0.73]). Seven lung nodule detection algorithms were validated on chest radiographs (from January 2012 to May 2022) in 386 patients (mean age, 64 years ± 11; 223 male patients). Compared with readers (mean AUC, 0.81 [95% CI: 0.77, 0.85]), four algorithms performed better (AUC range, 0.86-0.93; P value range, <.001 to .04).
Conclusions Compared with human readers, four AI algorithms for detecting lung nodules on chest radiographs showed improved performance, whereas the remaining algorithms tested showed no evidence of a difference in performance. © RSNA, 2024. See also the editorial by Omoumi and Richiardi in this issue.
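The three metrics named in the summary (RMSE, Pearson correlation coefficient, and AUC) can be sketched in plain Python as below. This is an illustrative sketch only and not the study's analysis code: the function names are hypothetical, the AUC uses the rank-based (Mann-Whitney) formulation as one common way to compute it, and the confidence intervals and P values reported in the study are not reproduced here.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predictions and a reference standard."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (label 1)
    receives a higher score than a randomly chosen negative case (label 0),
    counting ties as one half.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

In this sketch, `rmse` and `pearson_r` would take an algorithm's bone age estimates and the expert reference ages in years, and `auc` would take per-radiograph nodule scores with binary reference labels. The pairwise AUC is quadratic in the number of cases, which is acceptable at the scale of a few hundred radiographs.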
ISSN: 1527-1315
DOI: 10.1148/radiol.230981