An Improved Machine Learning-Based Employees Attrition Prediction Framework with Emphasis on Feature Selection

Bibliographic Details
Published in:	Mathematics (Basel) Vol. 9; no. 11; p. 1226
Main Authors:	Najafi-Zangeneh, Saeed; Shams-Gharneh, Naser; Arjomandi-Nezhad, Ali; Hashemkhani Zolfani, Sarfaraz
Format:	Journal Article
Language:	English
Published:	Basel: MDPI AG, 01-06-2021
Subjects:	Algorithms; attrition prediction; bootstrap; Datasets; Discriminant analysis; Employees; Feature selection; human resource management; Human resources; logistic regression; Machine learning; Mean square errors; Methods; Parameters; Post-production processing; Principal components analysis; Regression models; Standard deviation; Training

Description
Summary:	Companies always seek ways to make their professional employees stay with them to reduce extra recruiting and training costs. Predicting whether a particular employee may leave or not will help the company to make preventive decisions. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula. Therefore, machine learning approaches are the best tools for this aim. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction. An IBM HR dataset is chosen as the case study. Since there are several features in the dataset, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage. This method is implemented for the IBM HR dataset. The coefficient of each feature in the logistic regression model shows the importance of the feature in attrition prediction. The results show improvement in the F1-score performance measure due to the “max-out” feature selection method. Finally, the validity of parameters is checked by training the model for multiple bootstrap datasets. Then, the average and standard deviation of parameters are analyzed to check the confidence value of the model’s parameters and their stability. The small standard deviation of parameters indicates that the model is stable and is more likely to generalize well.
ISSN:	2227-7390
DOI:	10.3390/math9111226
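
The bootstrap stability check described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a small synthetic dataset instead of the IBM HR data, a plain gradient-descent logistic regression written with the Python standard library, and hypothetical helper names (`fit_logistic`, `bootstrap_coefficients`, `make_toy_data`). It refits the model on resampled copies of the data and reports the mean and standard deviation of each coefficient; a small standard deviation suggests a stable parameter.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent.

    Returns the weights as [bias, w1, ..., wd].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over the samples and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def bootstrap_coefficients(X, y, n_boot=30, seed=0):
    """Refit on n_boot resamples (drawn with replacement).

    Returns per-coefficient means and standard deviations.
    """
    rng = random.Random(seed)
    n = len(X)
    fits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    columns = list(zip(*fits))  # one tuple of estimates per coefficient
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds


def make_toy_data(n=200, seed=1):
    """Synthetic stand-in data (an assumption, not the IBM HR dataset).

    Only the first feature influences the label; the second is noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x1, x2])
        y.append(1 if rng.random() < sigmoid(2.0 * x1) else 0)
    return X, y


X, y = make_toy_data()
means, stds = bootstrap_coefficients(X, y)
for name, m, s in zip(["bias", "feature_1", "feature_2"], means, stds):
    print(f"{name}: mean={m:.3f}, std={s:.3f}")
```

In this toy setup the informative feature should receive a clearly larger average coefficient than the noise feature, which mirrors how the summary reads coefficient magnitude as feature importance. With real data, features would typically be standardized first so that coefficients are comparable.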