Are Generative Pretrained Transformer 4 Responses to Developmental Dysplasia of the Hip Clinical Scenarios Universal? An International Review

There is increasing interest in applying artificial intelligence chatbots like generative pretrained transformer 4 (GPT-4) in the medical field. This study aimed to explore the universality of GPT-4 responses to simulated clinical scenarios of developmental dysplasia of the hip (DDH) across diverse...

Full description

Saved in:
Bibliographic Details
Published in:Journal of pediatric orthopaedics Vol. 44; no. 6; pp. e504 - e511
Main Authors: Luo, Shaoting, Canavese, Federico, Aroojis, Alaric, Andreacchio, Antonio, Anticevic, Darko, Bouchard, Maryse, Castaneda, Pablo, De Rosa, Vincenzo, Fiogbe, Michel Armand, Frick, Steven L, Hui, James H, Johari, Ashok N, Loro, Antonio, Lyu, Xuemin, Matsushita, Masaki, Omeroglu, Hakan, Roye, David P, Shah, Maulin M, Yong, Bicheng, Li, Lianyong
Format: Journal Article
Language:English
Published: United States 01-07-2024
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:There is increasing interest in applying artificial intelligence chatbots like generative pretrained transformer 4 (GPT-4) in the medical field. This study aimed to explore the universality of GPT-4 responses to simulated clinical scenarios of developmental dysplasia of the hip (DDH) across diverse global settings. Seventeen international experts with more than 15 years of experience in pediatric orthopaedics were selected for the evaluation panel. Eight simulated DDH clinical scenarios were created, covering 4 key areas: (1) initial evaluation and diagnosis, (2) initial examination and treatment, (3) nursing care and follow-up, and (4) prognosis and rehabilitation planning. Each scenario was completed independently in a new GPT-4 session. Interrater reliability was assessed using Fleiss kappa, and the quality, relevance, and applicability of GPT-4 responses were analyzed using median scores and interquartile ranges. Following scoring, experts met in ZOOM sessions to generate Regional Consensus Assessment Scores, which were intended to represent a consistent regional assessment of the use of the GPT-4 in pediatric orthopaedic care. GPT-4's responses to the 8 clinical DDH scenarios received performance scores ranging from 44.3% to 98.9% of the 88-point maximum. The Fleiss kappa statistic of 0.113 ( P = 0.001) indicated low agreement among experts in their ratings. When assessing the responses' quality, relevance, and applicability, the median scores were 3, with interquartile ranges of 3 to 4, 3 to 4, and 2 to 3, respectively. Significant differences were noted in the prognosis and rehabilitation domain scores ( P < 0.05 for all). Regional consensus scores were 75 for Africa, 74 for Asia, 73 for India, 80 for Europe, and 65 for North America, with the Kruskal-Wallis test highlighting significant disparities between these regions ( P = 0.034). This study demonstrates the promise of GPT-4 in pediatric orthopaedic care, particularly in supporting preliminary DDH assessments and guiding treatment strategies for specialist care. However, effective integration of GPT-4 into clinical practice will require adaptation to specific regional health care contexts, highlighting the importance of a nuanced approach to health technology adaptation. Level IV.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
ObjectType-Review-3
content type line 23
ISSN:0271-6798
1539-2570
1539-2570
DOI:10.1097/BPO.0000000000002682