Liu regression after random forest for prediction and modeling in high dimension

Bibliographic Details
Published in: Journal of Chemometrics, Vol. 36, no. 4
Main Authors: Arashi, Mohammad; Lukman, Adewale F.; Algamal, Zakariya Y.
Format: Journal Article
Language: English
Published: Chichester: Wiley Subscription Services, Inc., 01-04-2022
Description
Summary: In the modern era, advanced technology gives us access to data with many features, so feature engineering has become a vital task in data analysis. One of the challenges in model estimation is combating multicollinearity in high‐dimensional problems where the number of features (p) exceeds the number of samples (n). We propose a novel yet simple strategy for estimating the regression parameters in a high‐dimensional regime in the presence of multicollinearity. The proposed approach combines the desirable properties of the random forest with the simple structure of a class of linear unified estimators. We give a fast and straightforward algorithm for estimating the regression coefficients when p > n and multicollinearity exists. Numerical investigation shows the superior performance of the method in terms of test mean squared error. The technique is also applied to melting chemical data, where we carry out estimation over 4885 features and discuss its advantages.
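The summary does not spell out the algorithm. The following is therefore only a minimal sketch of the classical Liu (1993) estimator that gives the method its name, and it is not the authors' procedure. For a design matrix X with full column rank, the estimator is β̂_d = (XᵀX + I)⁻¹(Xᵀy + d·β̂_LS) with 0 ≤ d ≤ 1. Here β̂_LS is the least-squares estimate. The sketch assumes that a preceding step, such as a random-forest importance ranking, has already reduced the p > n features to k < n columns. That screening step, the function name `liu_estimator`, and the simulated data are all hypothetical illustrations.

```python
import numpy as np


def liu_estimator(X, y, d):
    """Classical Liu (1993) estimator: (X'X + I)^{-1} (X'y + d * beta_ls).

    Hypothetical helper. It assumes X already has full column rank (k < n),
    e.g. after some feature-screening step; it does not itself handle p > n.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    k = X.shape[1]
    xtx = X.T @ X
    xty = X.T @ y
    beta_ls = np.linalg.solve(xtx, xty)  # ordinary least-squares estimate
    # Shrink toward zero: d = 1 recovers OLS, d = 0 gives ridge with penalty 1.
    return np.linalg.solve(xtx + np.eye(k), xty + d * beta_ls)


# Simulated, nearly collinear design (illustrative data only).
rng = np.random.default_rng(0)
n, k = 50, 5
z = rng.normal(size=(n, 1))
X = z + 0.01 * rng.normal(size=(n, k))  # columns differ only by small noise
beta_true = np.ones(k)
y = X @ beta_true + rng.normal(size=n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_liu = liu_estimator(X, y, d=0.5)
```

In the eigenbasis of XᵀX, each OLS component is scaled by (λ + d)/(λ + 1) ≤ 1. As a result, the Liu estimate never has a larger norm than the OLS estimate. The shrinkage is strongest along the near-zero eigenvalues that multicollinearity creates.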
Funding information: Ferdowsi University of Mashhad, Grant/Award Number: N.2/56535
ISSN: 0886-9383; 1099-128X
DOI: 10.1002/cem.3393