Independent evaluation of a multi-view multi-task convolutional neural network breast cancer classification model using Finnish mammography screening data

Bibliographic Details
Published in:	Computers in biology and medicine Vol. 161; p. 107023
Main Authors:	Isosalo, A., Inkinen, S.I., Turunen, T., Ipatti, P.S., Reponen, J., Nieminen, M.T.
Format:	Journal Article
Language:	English
Published:	United States: Elsevier Ltd, 01-07-2023
Subjects:	Artificial neural networks; Breast cancer; Breast radiology; Classification; Computer vision; Datasets; Demographics; Demography; DICOM; Malignancy; Mammography; Neural networks; Performance evaluation; Population studies; Screening; Subgroups; Transfer learning

Description
Summary:	Development of deep convolutional neural networks for breast cancer classification has taken significant steps towards clinical adoption. It is unclear, however, how the models perform on unseen data and what is required to adapt them to different demographic populations. In this retrospective study, we adopt an openly available pre-trained multi-view mammography breast cancer classification model and evaluate it on an independent Finnish dataset. Using transfer learning, the pre-trained model was finetuned with 8,829 examinations from the Finnish dataset (4,321 normal, 362 malignant and 4,146 benign examinations). A holdout dataset of 2,208 examinations from the Finnish dataset (1,082 normal, 70 malignant and 1,056 benign examinations) was used in the evaluation. The performance was also evaluated on a manually annotated malignant suspect subset. Receiver Operating Characteristic (ROC) and Precision–Recall curves were used as performance measures. The Area Under ROC [95% CI] values for malignancy classification obtained with the finetuned model on the entire holdout set were 0.82 [0.76, 0.87], 0.84 [0.77, 0.89], 0.85 [0.79, 0.90], and 0.83 [0.76, 0.89] for the R-MLO, L-MLO, R-CC and L-CC views, respectively. Performance on the malignant suspect subset was slightly better. On the auxiliary benign classification task, performance remained low. The results indicate that the model also performs well in an out-of-distribution setting. Finetuning allowed the model to adapt to some of the underlying local demographics. Future research should concentrate on identifying breast cancer subgroups that adversely affect performance, as this is a requirement for increasing the model’s readiness level for a clinical setting.
• There is an overwhelming reading workload from routine mammography screening.
• This reading workload can be reduced with artificial intelligence.
• Developing reliable intelligent analytics requires a large amount of rich data.
• Transfer learning can adapt classification models to different demographics.
• A strong pre-trained model is crucial for finetuning with single-center datasets.
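
The summary reports Area Under ROC values with 95% confidence intervals for each view, but this record does not say how the intervals were computed. As an illustration only, the sketch below assumes a percentile bootstrap over examinations, which is one common choice. The function names `roc_auc` and `bootstrap_auc_ci`, the number of resamples, and the toy labels and scores are all hypothetical and are not taken from the study.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    labels: list of 0/1 ints (1 = malignant), scores: list of model outputs.
    Tied scores receive their average rank, so a tie counts as half a win.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap CI (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return roc_auc(labels, scores), lower, upper


if __name__ == "__main__":
    # Synthetic toy data, not the Finnish holdout set.
    toy_rng = random.Random(42)
    toy_labels = [1] * 20 + [0] * 180
    toy_scores = [toy_rng.gauss(1.0 if y else 0.0, 1.0) for y in toy_labels]
    auc, lower, upper = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500)
    print(f"AUC {auc:.2f} [{lower:.2f}, {upper:.2f}]")
```

With few positive cases (the holdout set has 70 malignant examinations out of 2,208), resampled intervals of this kind tend to be fairly wide, which is consistent with the interval widths quoted in the summary.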
ISSN:	0010-4825 1879-0534
DOI:	10.1016/j.compbiomed.2023.107023