Robustness of XGBoost Algorithm to Missing Features for Binary Classification of Medical Data
Published in: | 2024 23rd International Symposium INFOTEH-JAHORINA (INFOTEH) pp. 1 - 6 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 20-03-2024 |
Subjects: | |
Summary: | The ability of the Extreme Gradient Boosting (XGBoost) algorithm to classify subjects with different types of breast cancer, and subjects with and without heart disease, is explored by artificially introducing fractions of missing values into two diverse medical datasets. Substitution of missing values by the mean and the median is also considered, as is the influence of the learning rate on the XGBoost model. Our results indicate that the XGBoost algorithm can handle missing data internally, and that applying simple imputation yields only slightly better or slightly worse results (differences below 2.5%). Further, the learning rate hyperparameter did not display a major influence on classifier performance. As expected, XGBoost reaches different evaluation scores for the two datasets, while the F1-score is not severely affected by missing values: with 90% of values missing, it dropped to 94.1% for the breast cancer data and to 73% for the heart failure data. We would argue that XGBoost with lower learning rates may be a good choice for classification of medical data, especially in cases when missing values are inevitable. |
ISSN: | 2767-9470 |
DOI: | 10.1109/INFOTEH60418.2024.10495929 |
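The summary describes two data-preparation steps: removing a fraction of feature values at random, and filling the gaps with a column mean or median. A minimal sketch of those two steps follows, using only the Python standard library. It is not the authors' code; the function names `mask_fraction` and `impute`, the toy matrix, and the fallback value for an all-missing column are assumptions made for illustration, and the classifier itself is not invoked.

```python
import math
import random
import statistics


def mask_fraction(rows, fraction, seed=0):
    """Return a copy of `rows` with round(fraction * n_cells) cells set to NaN."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(fraction * len(cells))
    masked = [list(row) for row in rows]
    # Sample distinct cells so the requested count of values is removed.
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = math.nan
    return masked


def impute(rows, strategy="mean"):
    """Fill NaN cells column by column with the mean or median of observed values."""
    stat = statistics.mean if strategy == "mean" else statistics.median
    filled = [list(row) for row in rows]
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: a column with no observed values is filled with 0.0.
        fill = stat(observed) if observed else 0.0
        for row in filled:
            if math.isnan(row[j]):
                row[j] = fill
    return filled


# Hypothetical toy feature matrix (4 samples, 2 features).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
X_missing = mask_fraction(X, 0.25, seed=42)   # 2 of the 8 cells become NaN
X_mean = impute(X_missing, "mean")
X_median = impute(X_missing, "median")
```

In a comparison like the one summarized above, `X_missing` would stand for the input to a model that handles missing values internally (the XGBoost Python package treats NaN as missing by default), while `X_mean` and `X_median` would stand for the simple-imputation baselines.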