Independent evaluation of a multi-view multi-task convolutional neural network breast cancer classification model using Finnish mammography screening data
Published in: | Computers in biology and medicine Vol. 161; p. 107023 |
---|---|
Main Authors: | , , , , , |
Format: | Journal Article |
Language: | English |
Published: | United States: Elsevier Ltd, 01-07-2023 |
Subjects: | |
Summary: | The development of deep convolutional neural networks for breast cancer classification has taken significant steps towards clinical adoption. It is unclear, however, how the models perform on unseen data and what is required to adapt them to different demographic populations. In this retrospective study, we adopt an openly available pre-trained multi-view mammography breast cancer classification model and evaluate it on an independent Finnish dataset.
Transfer learning was used: the pre-trained model was finetuned with 8,829 examinations from the Finnish dataset (4,321 normal, 362 malignant and 4,146 benign examinations). A holdout set of 2,208 examinations from the same Finnish dataset (1,082 normal, 70 malignant and 1,056 benign examinations) was used for evaluation. Performance was also evaluated on a manually annotated malignant suspect subset. Receiver Operating Characteristic (ROC) and Precision–Recall curves were used as performance measures.
The Area Under ROC [95% CI] values for malignancy classification obtained with the finetuned model on the entire holdout set were 0.82 [0.76, 0.87], 0.84 [0.77, 0.89], 0.85 [0.79, 0.90], and 0.83 [0.76, 0.89] for the R-MLO, L-MLO, R-CC and L-CC views, respectively. Performance on the malignant suspect subset was slightly better. On the auxiliary benign classification task, performance remained low.
The results indicate that the model also performs well in an out-of-distribution setting. Finetuning allowed the model to adapt to some of the underlying local demographics. Future research should concentrate on identifying breast cancer subgroups that adversely affect performance, as this is a requirement for increasing the model’s readiness level for a clinical setting.
• There is an overwhelming reading workload from routine mammography screening.
• This reading workload can be reduced with artificial intelligence.
• Developing reliable intelligent analytics requires a large amount of rich data.
• Transfer learning can adapt classification models to different demographics.
• A strong pre-trained model is crucial for finetuning with single-center datasets. |
---|---|
ISSN: | 0010-4825 1879-0534 |
DOI: | 10.1016/j.compbiomed.2023.107023 |
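
The Summary reports the Area Under ROC with a 95% confidence interval for each view, but this record does not say how the intervals were computed. As an illustration only, the sketch below computes a rank-based AUC and a percentile bootstrap interval; the bootstrap choice, the function names `roc_auc` and `bootstrap_auc_ci`, and the toy labels and scores are all assumptions, not the authors' method or data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval (an assumed method)."""
    point = roc_auc(labels, scores)  # raises early if a class is missing
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


# Hypothetical toy data: 1 = malignant, 0 = not malignant.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.50, 0.90, 0.15]
auc, ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500)
print(f"AUC = {auc:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

The pairwise comparison is quadratic in the number of examinations, which is acceptable for a toy example; for a holdout set of a few thousand examinations, a sort-based AUC would be the more practical choice.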