Analysis of Fat Big Data Using Factor Models and Penalization Techniques: A Monte Carlo Simulation and Application

Bibliographic Details
Published in:	Axioms Vol. 13; no. 7; p. 418
Main Authors:	Khan, Faridoon; Albalawi, Olayan
Format:	Journal Article
Language:	English
Published:	Basel: MDPI AG, 01-07-2024
Subjects:	Accuracy; Autocorrelation; Big Data; Datasets; Econometrics; factor models; fat big data; Fines & penalties; Forecasting; inflation; Machine learning; machine learning techniques; Macroeconomics; Methods; Monte Carlo experiments; Monte Carlo method; Monte Carlo simulation; Network topologies; Neural networks; Regression analysis; Root-mean-square errors; Simulation methods; Statistical analysis; Time series; Variables; Pakistan

Description
Summary:	This article assesses the predictive accuracy of factor models based on Partial Least Squares (PLS) and Principal Component Analysis (PCA) in comparison with Autometrics and penalization techniques. The simulation exercise examines three scenarios by introducing multicollinearity, heteroscedasticity, and autocorrelation, and varies the number of predictors and the sample size to observe their effects. Model accuracy is evaluated using the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). In the presence of severe multicollinearity, the PLS-based factor approach outperforms the competing methods. Autometrics achieves the lowest RMSE and MAE values across all levels of heteroscedasticity and provides better forecasts under low and moderate autocorrelation, whereas Elastic Smoothly Clipped Absolute Deviation (E-SCAD) forecasts well under severe autocorrelation. In addition to the simulation, we apply the methods to a popular Pakistani macroeconomic dataset containing 79 monthly variables from January 2013 to December 2020. The competing approaches perform differently than on the simulated datasets: the PLS factor approach outperforms its competitors in forecasting, with lower RMSE and MAE. This suggests that the actual dataset likely exhibits a high degree of multicollinearity.
ISSN:	2075-1680
DOI:	10.3390/axioms13070418
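
Note on the accuracy measures: the summary ranks the competing methods by RMSE and MAE. As a minimal sketch of how these two measures are usually computed from a set of forecasts (the function names and the sample values below are hypothetical illustrations, not taken from the article or its data):

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared forecast error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute forecast errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical observed values and forecasts, for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
forecast = [1.0, 5.0, 6.0, 10.0]

rmse_value = rmse(actual, forecast)
mae_value = mae(actual, forecast)
print(f"RMSE = {rmse_value:.4f}, MAE = {mae_value:.4f}")
```

Because RMSE squares each error before averaging, it penalizes large forecast misses more heavily than MAE does, which is one reason a comparison of forecasting methods may report both.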