Liu regression after random forest for prediction and modeling in high dimension
Published in: | Journal of chemometrics Vol. 36; no. 4 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: | Chichester: Wiley Subscription Services, Inc, 01-04-2022 |
Summary: | In the modern era, using advanced technology, we have access to data with many features, and therefore, feature engineering has become a vital task in data analysis. One of the challenges in model estimation is to combat multicollinearity in high‐dimensional data problems where the number of features (p) exceeds the number of samples n. We propose a novel, yet simple, strategy to estimate the regression parameters in a high‐dimensional regime in the presence of multicollinearity. The proposed approach enjoys the good properties of the random forest and the simple structure of a class of linear unified estimators. We give a fast and straightforward algorithm to estimate the regression coefficients when p > n and multicollinearity exist. Numerical investigation reveals the superior performance of the method in test mean squared error. The technique is also applied to melting chemical data, where we conducted an estimation among 4885 features and discussed advantages. |
Bibliography: | Funding information: Ferdowsi University of Mashhad, Grant/Award Number: N.2/56535 |
ISSN: | 0886-9383, 1099-128X |
DOI: | 10.1002/cem.3393 |