Parallel Implementation of Random Forest using MPI

Bibliographic Details
Published in: 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI), pp. 518-522
Main Authors: Allam, Zyad Atef; Shajera, Fatehan Tael; Majed, Reem Jamal; Alqaddoumi, Abdulla
Format: Conference Proceeding
Language: English
Published: IEEE, 25-10-2023
Description
Summary: Computer vision is a rapidly growing field in machine learning, with use cases and implementations across many industries. Powerful classifiers can be trained to recognize visual patterns and to detect people and objects. However, because these models take images as input, they are often trained with very complex architectures, high numbers of epochs, and very large datasets, which leads to long training times, especially on weaker machines. This study investigated the use of parallel processing to speed up machine learning model training. The Message Passing Interface (MPI) is used to parallelize the training of a Random Forest, achieving speed-ups of up to 5 and an efficiency of up to 1.1. The study concludes that, when dealing with big and complex data and building complex models, parallel processing can significantly reduce training time with minimal loss in accuracy.
DOI: 10.1109/ICDABI60145.2023.10629534
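
Implementation Sketch (illustrative)
The abstract describes the parallelization only at a high level, and the paper's actual code is not reproduced here. The sketch below shows one common way to express tree-level parallelism for Random Forest with mpi4py and scikit-learn, which are assumptions rather than details from the paper: each MPI rank grows an independent sub-forest, and the root rank combines the ranks' class-probability estimates by soft voting. The dataset (load_digits), the total tree count, and the combination step are illustrative only.

# Minimal sketch (not the authors' code): each rank trains part of the forest.
import numpy as np
from mpi4py import MPI
from sklearn.datasets import load_digits            # stand-in dataset (assumption)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

TOTAL_TREES = 120                                    # illustrative ensemble size
trees_per_rank = TOTAL_TREES // size                 # partition the trees, not the data

# Every rank loads the same data and uses the same deterministic split.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Each rank grows an independent sub-forest with its own random seed.
sub_forest = RandomForestClassifier(n_estimators=trees_per_rank, random_state=rank)
sub_forest.fit(X_train, y_train)

# The root gathers per-rank class probabilities and averages them (soft voting),
# which is equivalent to predicting with the combined ensemble of trees.
local_proba = sub_forest.predict_proba(X_test)
all_proba = comm.gather(local_proba, root=0)

if rank == 0:
    avg_proba = np.mean(all_proba, axis=0)
    y_pred = sub_forest.classes_[np.argmax(avg_proba, axis=1)]
    print(f"trees: {trees_per_rank * size}, accuracy: {np.mean(y_pred == y_test):.3f}")

Run with, for example, mpiexec -n 4 python rf_mpi.py. Because the trees are independent, the ranks do not communicate during fitting, which is why efficiencies at or near 1 (such as the 1.1 reported in the abstract) are plausible for this kind of workload.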