Automated Identification and Measurement Extraction of Pancreatic Cystic Lesions from Free-Text Radiology Reports Using Natural Language Processing

Bibliographic Details
Published in: Radiology. Artificial intelligence Vol. 4; no. 2; p. e210092
Main Authors: Yamashita, Rikiya, Bird, Kristen, Cheung, Philip Yue-Cheng, Decker, Johannes Hugo, Flory, Marta Nicole, Goff, Daniel, Morimoto, Linda Nayeli, Shon, Andy, Wentland, Andrew Louis, Rubin, Daniel L, Desser, Terry S
Format: Journal Article
Language: English
Published: United States: Radiological Society of North America, 01-03-2022
Description
Summary: To automatically identify a cohort of patients with pancreatic cystic lesions (PCLs) and extract PCL measurements from historical CT and MRI reports using natural language processing (NLP) and a question answering system. Institutional review board approval was obtained for this retrospective Health Insurance Portability and Accountability Act-compliant study, and the requirement to obtain informed consent was waived. A cohort of free-text CT and MRI reports generated between January 1991 and July 2019 that covered the pancreatic region was identified. A PCL identification model was developed by modifying a rule-based information extraction model; measurement extraction was performed using a state-of-the-art question answering system. The system's performance was evaluated against radiologists' annotations. For this study, 430 426 free-text radiology reports from 199 783 unique patients were identified. The NLP model for identifying PCLs was applied to 1000 test samples. The interobserver agreement between the model and two radiologists was almost perfect (Fleiss κ = 0.951), and the false-positive rate and true-positive rate were 3.0% and 98.2%, respectively, against the consensus of the radiologists' annotations as ground truth. The overall accuracy and Lin concordance correlation coefficient for measurement extraction were 0.958 and 0.874, respectively, against radiologists' annotations as ground truth. An NLP-based system was developed that identifies patients with PCLs and extracts measurements from a large single-institution archive of free-text radiology reports. This approach may prove valuable to study the natural history and potential risks of PCLs and can be applied to many other use cases. Keywords: Informatics, Abdomen/GI, Pancreas, Cysts, Computer Applications-General (Informatics), Named Entity Recognition. © RSNA, 2022. See also commentary by Horii in this issue.
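The summary reports a true-positive rate, a false-positive rate, and a Lin concordance correlation coefficient. As an illustration of how such metrics are commonly computed, here is a minimal Python sketch. It is not the authors' code: the function names `rates` and `lin_ccc` are hypothetical, the inputs are toy values rather than study data, and the concordance formula assumes the usual population (1/n) variance and covariance.

```python
from statistics import fmean


def rates(y_true, y_pred):
    """Return (true_positive_rate, false_positive_rate) for binary labels.

    y_true: reference labels (for example, radiologist consensus), 1 or 0.
    y_pred: model labels, 1 or 0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp / (tp + fn), fp / (fp + tn)


def lin_ccc(x, y):
    """Lin concordance correlation coefficient between two measurement lists.

    Uses population (1/n) variance and covariance:
        ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    """
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return 2 * cov / (vx + vy + (mx - my) ** 2)


if __name__ == "__main__":
    # Toy labels only; these are not values from the study.
    tpr, fpr = rates([1, 1, 0, 0], [1, 0, 1, 0])
    print(f"TPR={tpr:.3f} FPR={fpr:.3f}")

    # Toy measurements in millimetres: extracted vs. annotated sizes.
    print(f"CCC={lin_ccc([10.0, 20.0, 30.0], [11.0, 21.0, 31.0]):.3f}")
```

Unlike the Pearson correlation, the concordance coefficient penalizes a systematic offset between the two sets of measurements, which is why it suits comparing extracted sizes with annotated sizes.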
Author contributions: Guarantors of integrity of entire study, R.Y., J.H.D., D.G., T.S.D.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, R.Y., T.S.D.; clinical studies, K.B., P.Y.C.C., J.H.D., M.N.F., D.G., A.S., A.L.W., T.S.D.; statistical analysis, R.Y.; and manuscript editing, R.Y., P.Y.C.C., J.H.D., M.N.F., A.L.W., D.L.R., T.S.D.
ISSN: 2638-6100
DOI: 10.1148/ryai.210092