Robust Linear Model Selection Based on Least Angle Regression

Bibliographic Details
Published in: Journal of the American Statistical Association, Vol. 102, No. 480, pp. 1289-1299
Main Authors: Khan, Jafar A, Van Aelst, Stefan, Zamar, Ruben H
Format: Journal Article
Language: English
Published: Alexandria, VA: Taylor & Francis, December 2007
Publishers: American Statistical Association; Taylor & Francis Ltd
Description
Summary: In this article we consider the problem of building a linear prediction model when the number of candidate predictors is large and the data possibly contain anomalies that are difficult to visualize and clean. We want to predict the nonoutlying cases; therefore, we need a method that is simultaneously robust and scalable. We consider the stepwise least angle regression (LARS) algorithm, which is computationally very efficient but sensitive to outliers. We introduce two different approaches to robustify LARS. The plug-in approach replaces the classical correlations in LARS by robust correlation estimates. The cleaning approach first transforms the data set by shrinking the outliers toward the bulk of the data (which we call multivariate Winsorization) and then applies LARS to the transformed data. We show that the plug-in approach is time-efficient and scalable, and that the bootstrap can be used to stabilize its results. We recommend using bootstrapped robustified LARS to sequence a number of candidate predictors to form a reduced set from which a more refined model can be selected.
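Because LARS on standardized variables needs only the correlations among the predictors and between each predictor and the response, the plug-in idea can be sketched by estimating those correlations robustly and feeding them to a correlation-only version of LARS. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The robust correlation here is a simple stand-in (median/MAD standardization followed by clipping at an assumed cutoff c = 2.5, i.e. univariate Winsorization), not the estimators used in the paper, and every function name (`pearson`, `winsorize`, `robust_corr`, `solve`, `lars_order`) is hypothetical. The sketch returns only the order in which predictors enter, which is what the summary proposes using to form a reduced candidate set. The bootstrap stabilization step is not shown.

```python
import math
import statistics


def pearson(a, b):
    """Classical (non-robust) Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb) if saa > 0 and sbb > 0 else 0.0


def winsorize(col, c=2.5):
    """Standardize with median/MAD, then clip to [-c, c].

    Assumption: plain univariate Winsorization, used only as a stand-in
    for the robust correlation estimates described in the paper.
    """
    med = statistics.median(col)
    mad = 1.4826 * statistics.median([abs(v - med) for v in col])
    scale = mad if mad > 0 else 1.0  # guard against a zero MAD
    return [max(-c, min(c, (v - med) / scale)) for v in col]


def robust_corr(x_cols, y):
    """Plug-in step: correlations among predictors (R) and with y (r)."""
    z = [winsorize(col) for col in x_cols]
    zy = winsorize(y)
    R = [[pearson(zi, zj) for zj in z] for zi in z]
    r = [pearson(zi, zy) for zi in z]
    return R, r


def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for k in range(col, n + 1):
                M[i][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def lars_order(R, r, steps):
    """Order in which LARS adds predictors, computed from correlations only."""
    p = len(r)
    c = list(r)  # current correlation of each predictor with the residual
    active = [max(range(p), key=lambda j: abs(c[j]))]
    while len(active) < min(steps, p):
        s = [1.0 if c[k] >= 0 else -1.0 for k in active]
        C = abs(c[active[0]])  # all active predictors share this |correlation|
        # Sign-adjusted correlation matrix of the active predictors
        G = [[s[a] * s[b] * R[ka][kb] for b, kb in enumerate(active)]
             for a, ka in enumerate(active)]
        g1 = solve(G, [1.0] * len(active))  # G^{-1} 1
        A = 1.0 / math.sqrt(sum(g1))
        w = [A * v for v in g1]  # weights of the equiangular direction
        # a[j]: correlation of predictor j with the equiangular direction
        a = [sum(R[j][k] * s[i] * w[i] for i, k in enumerate(active))
             for j in range(p)]
        # Smallest positive step at which an inactive predictor ties the active ones
        gamma, nxt = math.inf, None
        for j in range(p):
            if j in active:
                continue
            for num, den in ((C - c[j], A - a[j]), (C + c[j], A + a[j])):
                if abs(den) > 1e-12 and 1e-12 < num / den < gamma:
                    gamma, nxt = num / den, j
        if nxt is None:
            break
        c = [c[j] - gamma * a[j] for j in range(p)]
        active.append(nxt)
    return active
```

With an identity correlation matrix among predictors, this ordering reduces to sorting predictors by |r|; it is the correlation among predictors that makes the equiangular LARS steps differ from a simple marginal ranking. Because only `robust_corr` touches the raw data, swapping in a different robust correlation estimator leaves `lars_order` unchanged, which is the point of the plug-in approach.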
ISSN: 0162-1459 (print); 1537-274X (online)
DOI: 10.1198/016214507000000950