Comparing the accuracy of four machine learning models in predicting type 2 diabetes onset within the Chinese population: a retrospective study

Bibliographic Details
Published in: Journal of International Medical Research, Vol. 52; no. 6; p. 3000605241253786
Main Authors: Liu, Hongzhou; Dong, Song; Yang, Hua; Wang, Linlin; Liu, Jia; Du, Yangfan; Liu, Jing; Lyu, Zhaohui; Wang, Yuhan; Jiang, Li; Yu, Shasha; Fu, Xiaomin
Format: Journal Article
Language: English
Published: London, England SAGE Publications 01-06-2024
Sage Publications Ltd
SAGE Publishing
Description
Summary: Objective: To evaluate the effectiveness of machine learning (ML) models in predicting 5-year type 2 diabetes mellitus (T2DM) risk within the Chinese population by retrospectively analyzing annual health checkup records. Methods: We included 46,247 patients (32,372 in the training set and 13,875 in the validation set) from a national health checkup center database. Univariate and multivariate Cox analyses were performed to identify factors influencing T2DM risk. Extreme Gradient Boosting (XGBoost), support vector machine (SVM), logistic regression (LR), and random forest (RF) models were trained to predict 5-year T2DM risk. Model performance was analyzed using receiver operating characteristic (ROC) curves for discrimination and calibration plots for prediction accuracy. Results: Key variables included fasting plasma glucose, age, and sedentary time. The LR model showed good accuracy, with areas under the ROC curve (AUCs) of 0.914 and 0.913 in the training and validation sets, respectively; the RF model exhibited favorable AUCs of 0.998 and 0.838, respectively. In calibration analysis, the LR model displayed good fit for low-risk patients; the RF model exhibited satisfactory fit for low- and high-risk patients. Conclusions: LR and RF models can effectively predict T2DM risk in the Chinese population. These models may help identify high-risk patients and guide interventions to prevent complications and disabilities.
Bibliography: These authors contributed equally to this work.
ISSN: 0300-0605
1473-2300
DOI: 10.1177/03000605241253786