Artificial intelligence (AI) models for the ultrasonographic diagnosis of liver tumors and comparison of diagnostic accuracies between AI and human experts
Published in: | Journal of gastroenterology Vol. 57; no. 4; pp. 309 - 321 |
---|---|
Format: | Journal Article |
Language: | English |
Published: | Singapore: Springer Singapore; Springer; Springer Nature B.V, 01-04-2022 |
Summary: | Background
Ultrasonography (US) is widely used for the diagnosis of liver tumors. However, the accuracy of the diagnosis largely depends on the visual perception of humans. Hence, we aimed to construct artificial intelligence (AI) models for the diagnosis of liver tumors in US.
Methods
We constructed three AI models based on still B-mode images: model-1 using 24,675 images, model-2 using 57,145 images, and model-3 using 70,950 images. A convolutional neural network was trained on the US images. Each model performed a four-class discrimination of liver tumors: cysts, hemangiomas, hepatocellular carcinoma, and metastatic tumors. The accuracy of the AI diagnosis was evaluated using tenfold cross-validation. The diagnostic performances of the AI models and human experts were also compared using an independent test cohort of video images.
Results
The diagnostic accuracies of model-1, model-2, and model-3 for the four tumor types were 86.8%, 91.0%, and 91.1%, whereas those for malignant tumors were 91.3%, 94.3%, and 94.3%, respectively. In the independent comparison of the AI models and physicians, the percentages of correct diagnoses (accuracies) by the AI models were 80.0%, 81.8%, and 89.1% for model-1, model-2, and model-3, respectively. Meanwhile, the median percentages of correct diagnoses were 67.3% (range 63.6%–69.1%) for human experts and 47.3% (range 45.5%–47.3%) for non-experts.
Conclusion
The performance of the AI models surpassed that of human experts in both the four-class discrimination and the benign-versus-malignant discrimination of liver tumors. Thus, the AI models can help prevent human errors in US diagnosis. |
ISSN: | 0944-1174 1435-5922 |
DOI: | 10.1007/s00535-022-01849-9 |
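
The evaluation described in the summary (tenfold cross-validation, four-class accuracy, and accuracy for malignancy) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the shuffling scheme, and the toy labels are assumptions made for illustration, and the benign/malignant grouping assumes that cysts and hemangiomas count as benign while hepatocellular carcinoma and metastatic tumors count as malignant.

```python
import random

# Assumed grouping for illustration: the two malignant classes among the four.
MALIGNANT = {"hepatocellular carcinoma", "metastatic tumor"}


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold serves
    as the held-out test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def to_binary(labels):
    """Collapse four-class labels into benign versus malignant."""
    return ["malignant" if label in MALIGNANT else "benign" for label in labels]


# Toy labels (hypothetical, not data from the study).
y_true = ["cyst", "hemangioma", "hepatocellular carcinoma", "metastatic tumor"]
y_pred = ["cyst", "cyst", "metastatic tumor", "metastatic tumor"]

four_class_acc = accuracy(y_true, y_pred)
binary_acc = accuracy(to_binary(y_true), to_binary(y_pred))
```

In the toy labels, two of the four predictions name the wrong tumor type but stay on the correct side of the benign/malignant boundary, so the binary accuracy is higher than the four-class accuracy. The same effect is one plausible reason the reported malignancy accuracies exceed the four-class ones.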