425 Artificial Intelligence Approaches to Type 2 Diabetes Risk Prediction and Exploration of Predictive Factors

Bibliographic Details
Published in: International Journal of Epidemiology, Vol. 50, no. Supplement_1
Main Authors: Ooka, Tadao; Yokomichi, Hiroshi; Yamagata, Zentaro
Format: Journal Article
Language: English
Published: 01-09-2021
Description
Summary: Abstract

Background: Major barriers exist in incorporating artificial intelligence into epidemiology, particularly in data interpretation. We therefore applied two highly interpretable machine-learning methods, Random Forest (RF) and Sparse Logistic Regression (SLR), to a large-scale health check-up dataset and examined the advantages of building prediction models with them.

Methods: This study involved 392,791 participants who underwent health check-ups in Japan from 1999 to 2018. Participants who were receiving diabetes treatment or had an HbA1c level of 6.5% or higher were excluded. The objective variable was type 2 diabetes onset over five years. Each prediction model was built from 26 health status items measured over three consecutive years. We compared the predictive power of three analytical methods: RF, SLR, and multivariate stepwise logistic regression (MSLR) as the conventional method. Variable Importance (VI) was calculated in the RF analysis, and Standard Regression Coefficients (SRC) were calculated in the SLR and MSLR analyses.

Results: Predictive accuracy was highest in the SLR model (AUC: 0.955), followed by the RF model (AUC: 0.949) and then the MSLR model (AUC: 0.939). In the RF model, blood glucose, HbA1c, height, red blood cells, and aspartate transaminase showed higher predictive power. In the SLR model, HbA1c, blood glucose, systolic blood pressure, HDL-cholesterol, and age had higher SRC.

Conclusions: Machine-learning techniques enable more accurate diabetes risk predictions than existing methods and suggest new ways of identifying associated predictors.

Key messages: Applying machine-learning methods to health check-up data achieves high accuracy in predicting type 2 diabetes while maintaining data interpretability.
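The three models in this summary are compared by AUC (area under the ROC curve). As a minimal illustrative sketch, and not the authors' code, AUC can be read as the probability that a randomly chosen participant who developed diabetes receives a higher predicted risk than a randomly chosen participant who did not, with ties counted as one half. The function name `auc` and the toy labels and scores below are hypothetical and are not taken from the study.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 outcomes (1 = onset, 0 = no onset).
    scores: predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four participants, two of whom developed diabetes.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

This pairwise form is quadratic in the number of participants, so for a cohort of the size described here a rank-based or library implementation would be the practical choice; the sketch only shows what the reported numbers measure.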
ISSN: 0300-5771, 1464-3685
DOI: 10.1093/ije/dyab168.515