Comparing the accuracy of four machine learning models in predicting type 2 diabetes onset within the Chinese population: a retrospective study
Published in: | Journal of international medical research Vol. 52; no. 6; p. 3000605241253786 |
---|---|
Format: | Journal Article |
Language: | English |
Published: | London, England: SAGE Publications, 01-06-2024 |
Summary: | Objective
To evaluate the effectiveness of machine learning (ML) models in predicting 5-year type 2 diabetes mellitus (T2DM) risk within the Chinese population by retrospectively analyzing annual health checkup records.
Methods
We included 46,247 patients (32,372 and 13,875 in training and validation sets, respectively) from a national health checkup center database. Univariate and multivariate Cox analyses were performed to identify factors influencing T2DM risk. Extreme Gradient Boosting (XGBoost), support vector machine (SVM), logistic regression (LR), and random forest (RF) models were trained to predict 5-year T2DM risk. Model performances were analyzed using receiver operating characteristic (ROC) curves for discrimination and calibration plots for prediction accuracy.
Results
Key variables included fasting plasma glucose, age, and sedentary time. The LR model showed good accuracy, with areas under the ROC curve (AUCs) of 0.914 and 0.913 in the training and validation sets, respectively; the RF model exhibited favorable AUCs of 0.998 and 0.838, respectively. In calibration analysis, the LR model displayed good fit for low-risk patients; the RF model exhibited satisfactory fit for low- and high-risk patients.
Conclusions
LR and RF models can effectively predict T2DM risk in the Chinese population. These models may help identify high-risk patients and guide interventions to prevent complications and disabilities. |
---|---|
Bibliography: | These authors contributed equally to this work. |
ISSN: | 0300-0605 1473-2300 |
DOI: | 10.1177/03000605241253786 |
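
The summary reports discrimination as the area under the ROC curve (AUC) in the training and validation sets. As a minimal sketch of what that number measures, and not the authors' code, the AUC can be computed as the probability that a randomly chosen patient who developed T2DM receives a higher predicted risk than a randomly chosen patient who did not. The function name `roc_auc` and the toy labels and scores below are hypothetical illustrations, not data from the study.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison (ties count as 0.5).

    labels: iterable of 0/1 outcomes (1 = developed T2DM in this toy example)
    scores: iterable of predicted risks, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six patients, three of whom developed T2DM.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Computing this value separately on the training and validation sets allows the kind of comparison the summary reports for each model. The pairwise loop is quadratic in the number of patients, so for a cohort of tens of thousands a rank-based implementation from an established library would be the practical choice.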