Accuracy of a Proprietary Large Language Model in Labeling Obstetric Incident Reports
Using the data collected through incident reporting systems is challenging, as it is a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early ide...
Saved in:
Published in: | Joint Commission journal on quality and patient safety Vol. 50; no. 12; pp. 877 - 881 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: |
Netherlands
Elsevier Inc
06-08-2024
|
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Using the data collected through incident reporting systems is challenging, as it is a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early identification of opportunities to prevent patient harm. This study assessed the capability of a proprietary LLM (GPT-3.5) to automatically label a cross-sectional sample of real-world obstetric incident reports.
A sample of 370 incident reports submitted to inpatient obstetric units between December 2022 and May 2023 was extracted. Human-annotated labels were assigned by a clinician reviewer and considered gold standard. The LLM was prompted to label incident reports relying solely on its pretrained knowledge and information included in the prompt. Primary outcomes assessed were sensitivity, specificity, positive predictive value, and negative predictive value. A secondary outcome assessed the human-perceived quality of the model's justification for the label(s) applied.
The LLM demonstrated the ability to label incident reports with high sensitivity and specificity. The model applied a total of 79 labels compared to the reviewer's 49 labels. Overall sensitivity for the model was 85.7%, and specificity was 97.9%. Positive and negative predictive values were 53.2% and 99.6%, respectively. For 60.8% of labels, the reviewer approved of the model's justification for applying the label.
The proprietary LLM demonstrated the ability to label obstetric incident reports with high sensitivity and specificity. LLMs offer the potential to enable more efficient use of data from incident reporting systems. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 1553-7250 1938-131X 1938-131X |
DOI: | 10.1016/j.jcjq.2024.08.001 |