Comparative performance of intensive care mortality prediction models based on manually curated versus automatically extracted electronic health record data



Bibliographic Details
Published in: International journal of medical informatics (Shannon, Ireland) Vol. 188; p. 105477
Main Authors: Jagesar, A.R., Otten, M., Dam, T.A., Biesheuvel, L.A., Dongelmans, D.A., Brinkman, S., Thoral, P.J., François-Lavet, V., Girbes, A.R.J., de Keizer, N.F., de Grooth, H.J.S., Elbers, P.W.G.
Format: Journal Article
Language: English
Published: Ireland, Elsevier B.V., 01-08-2024
Description
Summary:
• Commonly applied mortality prediction models rely on variables traditionally requiring manual curation.
• The advent of EHRs has facilitated the automated collection of many patient variables.
• Classical approaches to mortality prediction perform equally to novel machine learning methods that leverage the EHR.
• Mortality prediction using automatically extracted EHR data provides a feasible and practical alternative to classical approaches, facilitating subsequent audit and feedback.

Benchmarking intensive care units for audit and feedback is frequently based on comparing actual mortality versus predicted mortality. Traditionally, mortality prediction models rely on a limited number of input variables and significant manual data entry and curation. Using automatically extracted electronic health record data may be a promising alternative. However, adequate data on comparative performance between these approaches is currently lacking.

The AmsterdamUMCdb intensive care database was used to construct a baseline APACHE IV in-hospital mortality model based on data typically available through manual data curation. Subsequently, new in-hospital mortality models were systematically developed and evaluated. New models differed with respect to the extent of automatic variable extraction, classification method, recalibration usage and the size of collection window.

A total of 13 models were developed based on data from 5,077 admissions divided into a train (80%) and test (20%) cohort. Adding variables or extending collection windows only marginally improved discrimination and calibration. An XGBoost model using only automatically extracted variables, and therefore no acute or chronic diagnoses, was the best performing automated model with an AUC of 0.89 and a Brier score of 0.10.

Performance of intensive care mortality prediction models based on manually curated versus automatically extracted electronic health record data is similar. Importantly, our results suggest that variables typically requiring manual curation, such as diagnosis at admission and comorbidities, may not be necessary for accurate mortality prediction. These proof-of-concept results require replication using multi-centre data.
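
For readers unfamiliar with the two metrics quoted in the summary, the sketch below shows how an AUC (discrimination) and a Brier score (overall accuracy of the predicted probabilities) can be computed from outcomes and predicted risks. It is a minimal illustration in plain Python, not the authors' code: the function names `auc` and `brier_score` and the eight example admissions are assumptions made up for this example and have no connection to the AmsterdamUMCdb data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen non-survivor received a
    higher predicted risk than a randomly chosen survivor; ties count as half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died in hospital) and predicted risks,
# invented purely for illustration.
outcomes = [0, 0, 1, 0, 1, 0, 1, 0]
risks = [0.05, 0.20, 0.70, 0.10, 0.40, 0.60, 0.90, 0.30]

print(f"AUC: {auc(outcomes, risks):.2f}")
print(f"Brier score: {brier_score(outcomes, risks):.3f}")
```

A higher AUC means better separation of survivors from non-survivors (0.5 is chance, 1.0 is perfect), while a lower Brier score means predicted risks lie closer to the observed outcomes (0 is perfect). The pairwise loop is quadratic in the number of admissions, so a rank-based implementation would be preferable for cohorts of realistic size.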
ISSN: 1386-5056, 1872-8243
DOI: 10.1016/j.ijmedinf.2024.105477