Using machine-learning approaches to predict non-participation in a nationwide general health check-up scheme

Bibliographic Details
Published in:	Computer methods and programs in biomedicine Vol. 163; pp. 39 - 46
Main Authors:	Shimoda, Akihiro, Ichikawa, Daisuke, Oyama, Hiroshi
Format:	Journal Article
Language:	English
Published:	Ireland Elsevier B.V. 01-09-2018
Subjects:	Health check-up, Machine-learning, Prediction, Segmentation

Description
Summary:	
•Our predictive models applying machine-learning methods were able to identify non-participants more precisely than the heuristic method.
•The present study revealed the important variables for predicting participation in general health check-ups.
•The knowledge added by the present study will improve appropriate targeting of non-participants.

Since the launch of a nationwide general health check-up and instruction program in Japan in 2008, interest in formulating an effective and efficient strategy to improve the participation rate has been growing. The aim of this study was to develop and evaluate models that identify those who are unlikely to undergo general health check-ups. We used machine-learning methods to select intervention targets more efficiently.

We used information from a local government database in Japan. The study population included 7290 individuals aged 40–74 years who underwent at least one general health check-up between 2012 and 2015. We developed four predictive models based on the extreme gradient boosting (XGBoost), random forest (RF), support vector machines (SVMs), and logistic regression (LR) algorithms, and compared the areas under the curves (AUCs) of the models with that of the heuristic method (which presumes that individuals who underwent a general health check-up in the previous year will do so again in the following year).

The AUCs for the XGBoost, RF, SVMs, LR, and heuristic models/method were 0.829 (95% confidence interval [CI]: 0.806–0.853), 0.821 (95% CI: 0.797–0.845), 0.812 (95% CI: 0.787–0.837), 0.816 (95% CI: 0.791–0.841), and 0.683 (95% CI: 0.657–0.708), respectively. The XGBoost model exhibited the best AUC, and its performance was significantly better than that of SVMs (p = 0.034), LR (p = 0.017), and the heuristic method (p < 0.001). However, the performance of XGBoost did not differ significantly from that of RF (p = 0.229).

Predictive models using machine-learning techniques outperformed the existing heuristic method when used to predict participation in a general health check-up system by eligible participants.
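
The comparison described in the summary can be illustrated with a minimal sketch. It is not the authors' code: the data are invented toy values, `model_scores` stands in for the output of any trained classifier, and the names `auc` and `heuristic_scores` are hypothetical. The sketch scores the heuristic (attended last year, so assumed to attend again) as a binary predictor of non-participation and computes the AUC as the probability that a randomly chosen non-participant receives a higher score than a randomly chosen participant.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def heuristic_scores(attended_last_year):
    """Heuristic from the summary: whoever attended last year is presumed
    to attend again, so the non-participation score is 1 - attended."""
    return [1 - a for a in attended_last_year]


# Toy data, invented for illustration only (not from the study).
attended_last_year = [1, 1, 1, 0, 0, 1, 0, 0]  # 1 = attended the previous year
non_participant = [0, 0, 1, 1, 1, 0, 0, 1]     # 1 = did not attend this year
# Hypothetical predicted probabilities of non-participation from some model.
model_scores = [0.1, 0.2, 0.6, 0.9, 0.8, 0.3, 0.7, 0.95]

heuristic_auc = auc(non_participant, heuristic_scores(attended_last_year))
model_auc = auc(non_participant, model_scores)
print(f"heuristic AUC = {heuristic_auc:.3f}, model AUC = {model_auc:.3f}")
```

Because the heuristic yields only two distinct scores, many pairs are tied, which limits how well it can rank individuals; a model that outputs graded probabilities can order people within each group as well.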
ISSN:	0169-2607 1872-7565
DOI:	10.1016/j.cmpb.2018.05.032