Machine learning can reliably predict malignancy of breast lesions based on clinical and ultrasonographic features

To establish a reliable machine learning model to predict malignancy in breast lesions identified by ultrasound (US) and optimize the negative predictive value to minimize unnecessary biopsies. We included clinical and ultrasonographic attributes from 1526 breast lesions classified as BI-RADS 3, 4a,...

Full description

Saved in:

Bibliographic Details
Published in:	Breast cancer research and treatment
Main Authors:	Buzatto, I P C, Recife, S A, Miguel, L, Bonini, R M, Onari, N, Faim, A L P A, Silvestre, L, Carlotti, D P, Fröhlich, A, Tiezzi, D G
Format:	Journal Article
Language:	English
Published:	Netherlands 13-07-2024
Subjects:	Artificial intelligence Breast ultrasound Breast biopsy Machine learning Prediction
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	To establish a reliable machine learning model to predict malignancy in breast lesions identified by ultrasound (US) and optimize the negative predictive value to minimize unnecessary biopsies. We included clinical and ultrasonographic attributes from 1526 breast lesions classified as BI-RADS 3, 4a, 4b, 4c, 5, and 6 that underwent US-guided breast biopsy in four institutions. We selected the most informative attributes to train nine machine learning models, ensemble models and models with tuned threshold to make inferences about the diagnosis of BI-RADS 4a and 4b lesions (validation dataset). We tested the performance of the final model with 403 new suspicious lesions. The most informative attributes were shape, margin, orientation and size of the lesions, the resistance index of the internal vessel, the age of the patient and the presence of a palpable lump. The highest mean negative predictive value (NPV) was achieved with the K-Nearest Neighbors algorithm (97.9%). Making ensembles did not improve the performance. Tuning the threshold did improve the performance of the models and we chose the algorithm XGBoost with the tuned threshold as the final one. The tested performance of the final model was: NPV 98.1%, false negative 1.9%, positive predictive value 77.1%, false positive 22.9%. Applying this final model, we would have missed 2 of the 231 malignant lesions of the test dataset (0.8%). Machine learning can help physicians predict malignancy in suspicious breast lesions identified by the US. Our final model would be able to avoid 60.4% of the biopsies in benign lesions missing less than 1% of the cancer cases.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	0167-6806 1573-7217 1573-7217
DOI:	10.1007/s10549-024-07429-0