Towards a guideline for evaluation metrics in medical image segmentation

Bibliographic Details
Published in:	BMC research notes Vol. 15; no. 1; p. 210
Main Authors:	Müller, Dominik, Soto-Rey, Iñaki, Kramer, Frank
Format:	Journal Article
Language:	English
Published:	England, BioMed Central Ltd, 20-06-2022
Subjects:	Accuracy; Algorithms; Artificial Intelligence; Automation; Benchmarking; Bias; Biomedical image segmentation; Semantic segmentation; Medical Image Analysis; Computer-aided medical diagnosis; Deep learning; Diagnostic imaging; Evaluation; Guideline; Image processing; Image Processing, Computer-Assisted - methods; Image segmentation; Methods; Performance assessment; Practice guidelines (Medicine); Reproducibility; Reproducibility of Results; ROC Curve; Segmentation; Statistics; Germany

Description
Summary:	In the last decade, research on artificial intelligence has seen rapid growth with deep learning models, especially in the field of medical image segmentation. Various studies have demonstrated that these models have powerful prediction capabilities and achieve results similar to those of clinicians. However, recent studies revealed that the evaluation in image segmentation studies lacks reliable model performance assessment and shows statistical bias caused by incorrect metric implementation or usage. Thus, this work provides an overview and interpretation guide for the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard, Sensitivity, Specificity, Rand index, ROC curves, Cohen's Kappa, and Hausdorff distance. Furthermore, common issues such as class imbalance and statistical as well as interpretation biases in evaluation are discussed. In summary, we propose a guideline for standardized medical image segmentation evaluation to improve evaluation quality, reproducibility, and comparability in the research field.
ISSN:	1756-0500
DOI:	10.1186/s13104-022-06096-y
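
Illustrative note: several of the metrics named in the summary (Dice similarity coefficient, Jaccard, Sensitivity, Specificity) can be computed from the four confusion-matrix counts of a binary mask. The following is a minimal sketch using the standard textbook definitions, not code taken from the article; the function names `confusion_counts` and `overlap_metrics`, the NaN convention for undefined ratios, and the added `accuracy` entry are assumptions made for illustration only.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Return (tp, fp, fn, tn) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")
    tp = int(np.sum(pred & truth))    # foreground predicted as foreground
    fp = int(np.sum(pred & ~truth))   # background predicted as foreground
    fn = int(np.sum(~pred & truth))   # foreground predicted as background
    tn = int(np.sum(~pred & ~truth))  # background predicted as background
    return tp, fp, fn, tn


def _ratio(num, den):
    # Assumption: an undefined ratio (zero denominator) is reported as NaN
    # rather than silently mapped to 0 or 1.
    return num / den if den else float("nan")


def overlap_metrics(pred, truth):
    """Standard pixel-wise metrics for one binary segmentation."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical 10x10 image whose region of interest covers only 4 pixels.
truth = np.zeros((10, 10), dtype=bool)
truth[4:6, 4:6] = True

# A prediction that finds nothing at all.
empty_pred = np.zeros((10, 10), dtype=bool)

for name, value in overlap_metrics(empty_pred, truth).items():
    print(f"{name}: {value:.3f}")
```

In this toy case the empty prediction still scores high on specificity and accuracy because true negatives dominate the image, while Dice, Jaccard, and sensitivity are all zero. This is one way to picture the class-imbalance issue the summary mentions.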