An Evaluation of Four Resampling Methods Used in Machine Learning Classification

Bibliographic Details
Published in: IEEE Intelligent Systems, Vol. 36, No. 3, pp. 51-57
Main Author: Nakatsu, Robbie T.
Format: Journal Article
Language:English
Published: Los Alamitos: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-05-2021
Description
Summary: This article investigates resampling methods used to evaluate the performance of machine learning classification algorithms. It compares four key resampling methods: 1) Monte Carlo resampling, 2) the Bootstrap Method, 3) k-fold Cross Validation, and 4) Repeated k-fold Cross Validation. Two classification algorithms, Support Vector Machines and Random Forests, are applied to three datasets. Nine variations of the four resampling methods are used to tune parameters of the two classification algorithms on each of the three datasets. Performance is defined by how well the resampling method chooses a parameter value that fits the data well. A main finding is that Repeated k-fold Cross Validation, overall, outperforms the other resampling methods in selecting the best-fit parameter value across the three datasets.
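The winning method in the study, Repeated k-fold Cross Validation, partitions the data into k folds, scores each fold while training on the rest, then reshuffles and repeats the whole procedure several times before averaging. The sketch below illustrates that idea in plain Python on a toy thresholding "classifier"; the classifier, its threshold parameter, and the data are hypothetical stand-ins (the article itself tunes SVM and Random Forest parameters on three real datasets, none of which are reproduced here).

```python
import random

def repeated_kfold_indices(n, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs: k-fold CV repeated `repeats`
    times, reshuffling the example order before each repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
        for i in range(k):
            test = folds[i]
            train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            yield train, test

# Toy "classifier": predict class 1 if x > threshold. The threshold plays
# the role of a tunable hyperparameter; it needs no fitting, so the train
# indices go unused in this illustration.
def fold_accuracy(threshold, xs, ys, test_idx):
    return sum((xs[i] > threshold) == ys[i] for i in test_idx) / len(test_idx)

xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Score each candidate parameter value by its mean accuracy over all
# repeated folds, then pick the best-scoring one.
scores = {}
for thr in (0.25, 0.5, 0.75):
    accs = [fold_accuracy(thr, xs, ys, test)
            for _, test in repeated_kfold_indices(len(xs), k=4, repeats=5)]
    scores[thr] = sum(accs) / len(accs)

best = max(scores, key=scores.get)
print(best)  # 0.5 separates the two classes perfectly on this toy data
```

Averaging over repeats is what distinguishes this from plain k-fold Cross Validation: each repetition uses a different random partition, which reduces the variance of the accuracy estimate that drives the parameter choice.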
ISSN: 1541-1672
EISSN: 1941-1294
DOI: 10.1109/MIS.2020.2978066