425 Artificial Intelligence Approaches to Type 2 Diabetes Risk Prediction and Exploration of Predictive Factors
Published in: International Journal of Epidemiology, Vol. 50, no. Supplement_1
Format: Journal Article
Language: English
Published: 01-09-2021
Abstract
Background
Major barriers exist to incorporating artificial intelligence into epidemiology, particularly in interpreting the data. We therefore applied two highly interpretable machine-learning methods, Random Forest (RF) and Sparse Logistic Regression (SLR), to a large-scale health check-up dataset and examined the advantages of building prediction models with them.
Methods
This study included 392,791 participants who underwent health check-ups in Japan between 1999 and 2018. Participants who were receiving diabetes treatment or had an HbA1c level of 6.5% or higher were excluded. The outcome (objective) variable was onset of type 2 diabetes within five years. Each prediction model was built from 26 health-status items measured over three consecutive years. We compared the predictive power of three analytical methods: RF, SLR, and, as a conventional method, multivariate stepwise logistic regression (MSLR). Variable Importance (VI) was calculated in the RF analysis, and Standard Regression Coefficients (SRC) were calculated in the SLR and MSLR analyses.
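The abstract does not say how the SLR model was fitted. "Sparse" logistic regression usually means an L1 (lasso) penalty, which drives the coefficients of uninformative items to exactly zero. Fitting on z-scored inputs puts the coefficients on a common scale, loosely comparable to standardized regression coefficients. The sketch below is a minimal illustration under those assumptions. It uses proximal gradient descent (ISTA) in plain Python. The penalty `lam`, step size `lr`, iteration count, and the synthetic three-item data are all hypothetical and are not the study's settings.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Z-score every column so fitted coefficients are comparable across items."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # "or 1.0" guards against division by zero for constant columns
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def soft_threshold(v, t):
    """Proximal step for t*|w|: shrink v toward 0, setting it to exactly 0 if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_sparse_logistic(X, y, lam=0.05, lr=0.1, n_iter=1000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|) on standardized features;
    the intercept b is left unpenalized. lam, lr, n_iter are illustrative defaults.
    """
    Z = standardize(X)
    n, p = len(Z), len(Z[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, target in zip(Z, y):
            err = sigmoid(b + sum(wj * zj for wj, zj in zip(w, row))) - target
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return b, w


# Synthetic demo (hypothetical data): item 0 drives onset, items 1-2 are pure noise.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [rng.gauss(0, 1) for _ in range(3)]
    X.append(row)
    y.append(1 if rng.random() < sigmoid(2.5 * row[0] - 1.0) else 0)

b, w = fit_sparse_logistic(X, y)
print("standardized coefficients:", [round(c, 3) for c in w])
```

On this synthetic data, the noise items end up with coefficients at or near zero and the informative item keeps a large positive one. That kind of built-in variable selection is what makes SLR coefficients easy to read.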
Results
Predictive accuracy was highest for the SLR model (AUC 0.955), followed by the RF model (AUC 0.949) and the MSLR model (AUC 0.939). In the RF model, blood glucose, HbA1c, height, red blood cells, and aspartate transaminase had the highest variable importance. In the SLR model, HbA1c, blood glucose, systolic blood pressure, HDL cholesterol, and age had the highest SRCs.
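An AUC such as 0.955 is the probability that a randomly chosen participant who developed diabetes receives a higher risk score than a randomly chosen participant who did not. The abstract also does not specify how VI was computed for the RF model. Permutation importance (the drop in AUC when one item's values are shuffled) is one common definition, and impurity-based importance is another. This sketch assumes permutation importance. The fixed `risk_score` function is a hypothetical stand-in for a fitted random forest, and the data are synthetic.

```python
import math
import random


def auc(scores, labels):
    """ROC AUC: probability that a random case outscores a random non-case
    (ties count 1/2), i.e. the normalized Mann-Whitney U statistic. O(n^2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Mean drop in AUC when one column is shuffled, breaking its link to the outcome."""
    rng = random.Random(seed)
    base = auc([score_fn(row) for row in X], y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - auc([score_fn(r) for r in X_perm], y))
        importances.append(sum(drops) / n_repeats)
    return base, importances


# Hypothetical fitted model: a fixed risk score that ignores item 2 entirely.
def risk_score(row):
    return 2.0 * row[0] + 0.5 * row[1]


# Synthetic cohort (hypothetical data) generated from the same linear predictor.
rng = random.Random(1)
X, y = [], []
for _ in range(300):
    row = [rng.gauss(0, 1) for _ in range(3)]
    X.append(row)
    p_event = 1.0 / (1.0 + math.exp(-(2.0 * row[0] + 0.5 * row[1] - 1.0)))
    y.append(1 if rng.random() < p_event else 0)

base_auc, vi = permutation_importance(risk_score, X, y)
print(f"AUC={base_auc:.3f}", [round(v, 3) for v in vi])
```

Item 2 receives an importance of exactly zero because the scoring function never reads it. The pairwise AUC loop is O(n²) for clarity. At the scale of this cohort, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.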
Conclusions
Machine-learning techniques enable more accurate diabetes risk prediction than existing methods and suggest new ways of identifying associated predictors.
Key messages
Applying machine-learning methods to health check-up data achieves high accuracy in predicting type 2 diabetes while maintaining data interpretability.
ISSN: 0300-5771, 1464-3685
DOI: 10.1093/ije/dyab168.515