Artificial intelligence (AI) models for the ultrasonographic diagnosis of liver tumors and comparison of diagnostic accuracies between AI and human experts

Bibliographic Details
Published in: Journal of Gastroenterology, Vol. 57, no. 4, pp. 309-321
Main Authors: Nishida, Naoshi; Yamakawa, Makoto; Shiina, Tsuyoshi; Mekada, Yoshito; Nishida, Mutsumi; Sakamoto, Naoya; Nishimura, Takashi; Iijima, Hiroko; Hirai, Toshiko; Takahashi, Ken; Sato, Masaya; Tateishi, Ryosuke; Ogawa, Masahiro; Mori, Hideaki; Kitano, Masayuki; Toyoda, Hidenori; Ogawa, Chikara; Kudo, Masatoshi
Format: Journal Article
Language: English
Published: Singapore Springer Singapore 01-04-2022
Springer
Springer Nature B.V
Description
Summary:
Background: Ultrasonography (US) is widely used for the diagnosis of liver tumors. However, the accuracy of the diagnosis largely depends on the visual perception of humans. Hence, we aimed to construct artificial intelligence (AI) models for the diagnosis of liver tumors in US.
Methods: We constructed three AI models based on still B-mode images: model-1 using 24,675 images, model-2 using 57,145 images, and model-3 using 70,950 images. A convolutional neural network was used to train the US images. The four-class liver tumor discrimination by AI, namely, cysts, hemangiomas, hepatocellular carcinoma, and metastatic tumors, was examined. The accuracy of the AI diagnosis was evaluated using tenfold cross-validation. The diagnostic performances of the AI models and human experts were also compared using an independent test cohort of video images.
Results: The diagnostic accuracies of model-1, model-2, and model-3 in the four tumor types are 86.8%, 91.0%, and 91.1%, whereas those for malignant tumor are 91.3%, 94.3%, and 94.3%, respectively. In the independent comparison of the AIs and physicians, the percentages of correct diagnoses (accuracies) by the AIs are 80.0%, 81.8%, and 89.1% in model-1, model-2, and model-3, respectively. Meanwhile, the median percentages of correct diagnoses are 67.3% (range 63.6%–69.1%) and 47.3% (45.5%–47.3%) by human experts and non-experts, respectively.
Conclusion: The performance of the AI models surpassed that of human experts in the four-class discrimination and benign and malignant discrimination of liver tumors. Thus, the AI models can help prevent human errors in US diagnosis.
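The evaluation named in the summary (tenfold cross-validation, four-class accuracy, and benign versus malignant accuracy) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the class label strings, the toy labels, and the choice to split by individual sample are all assumptions made for illustration; the record does not say how the folds were formed. The benign/malignant grouping assumes cysts and hemangiomas are benign, and hepatocellular carcinoma and metastatic tumors are malignant.

```python
import random

# Hypothetical label strings for the four tumor classes named in the summary.
MALIGNANT = {"hcc", "metastasis"}  # "cyst" and "hemangioma" are treated as benign


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: samples are split individually after one shuffle; the
    study may have grouped images differently (e.g. by patient).
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_binary(labels):
    """Collapse four-class labels into benign vs. malignant."""
    return ["malignant" if lab in MALIGNANT else "benign" for lab in labels]


# Toy labels (invented for illustration, not study data).
y_true = ["cyst", "hcc", "hemangioma", "metastasis"]
y_pred = ["cyst", "metastasis", "hemangioma", "metastasis"]

four_class_acc = accuracy(y_true, y_pred)                    # 0.75
binary_acc = accuracy(to_binary(y_true), to_binary(y_pred))  # 1.0
```

In the toy data, confusing hepatocellular carcinoma with a metastatic tumor counts as an error in the four-class score but not in the benign/malignant score. Under the grouping assumed above, this is one way the malignant-tumor accuracies in the summary can exceed the four-class accuracies.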
ISSN: 0944-1174, 1435-5922
DOI: 10.1007/s00535-022-01849-9