Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation

Bibliographic Details
Published in: Journal of Chemometrics, Vol. 29, no. 10, pp. 528-536
Main Authors: Farrés, Mireia, Platikanov, Stefan, Tsakovski, Stefan, Tauler, Romà
Format: Journal Article
Language: English
Published: Chichester Blackwell Publishing Ltd 01-10-2015
Wiley Subscription Services, Inc
Description
Summary: This study compares two variable selection methods for partial least squares regression (PLSR): the variable importance in projection (VIP) method and the selectivity ratio (SR) method. Three data sets were analysed: (a) physicochemical water quality parameters related to sensorial data, (b) gas chromatography–mass spectrometry (GC‐MS) chemical (organic compound) profiles from fossil sea sediment samples related to sea surface temperature (SST) changes, and (c) exposed genes of Daphnia magna female samples related to their total offspring production. Correlation coefficients (r), significance levels (p‐values) and interpretation of the underlying experimental phenomena were used to identify the best variable selection approach in each case. For the water quality data set, the SR method was more accurate for sensorial prediction. For the climate data set, when raw total ion current (TIC) GC‐MS chromatograms were considered, variables selected by the VIP method were easier to interpret than those selected by the SR method. However, when only some chromatographic peak areas (concentrations) were considered, the SR method was more efficient for prediction, while the VIP method selected the most relevant variables for interpreting SST changes. Finally, for the transcriptomic data set, the SR method was again found to be more reliable for prediction. Copyright © 2015 John Wiley & Sons, Ltd.

VIP and SR variable selection methods in partial least squares regression were compared in three different data sets. The VIP method was more reliable than the SR method for raw large chromatographic data sets, but for other types of preprocessed or transformed data sets both methods efficiently detected the most relevant variables. The SR method was more accurate for prediction purposes, and the VIP method was more reliable for the interpretation of the underlying experimental phenomena.
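
The summary names the two selection criteria without giving their formulas. As a rough illustration only, the sketch below computes both using commonly cited definitions: VIP as the weight-based score whose squares average to one, and SR as the ratio of explained to residual variance after target projection. It is not the authors' code. The NIPALS routine, the function names, the synthetic data and the choice of two components are all assumptions made for this example.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit single-response PLS by NIPALS; X and y must already be centered."""
    X, y = X.copy(), y.copy()
    n, p = X.shape
    W = np.zeros((p, n_components))  # weight vectors
    P = np.zeros((p, n_components))  # X loadings
    T = np.zeros((n, n_components))  # scores
    q = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p_a = X.T @ t / tt
        q_a = y @ t / tt
        X -= np.outer(t, p_a)        # deflate X
        y -= q_a * t                 # deflate y
        W[:, a], P[:, a], T[:, a], q[a] = w, p_a, t, q_a
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector
    return W, T, q, b

def vip(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja / ||w_a||)^2 / sum_a SS_a)."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)  # y variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)
    return np.sqrt(p * (w_norm ** 2 @ ss) / ss.sum())

def selectivity_ratio(X, b):
    """Explained / residual variance per variable after target projection."""
    t_tp = X @ b / np.linalg.norm(b)       # target-projected scores
    p_tp = X.T @ t_tp / (t_tp @ t_tp)      # target-projected loadings
    X_tp = np.outer(t_tp, p_tp)            # part of X explained by the target
    resid = X - X_tp
    return np.sum(X_tp ** 2, axis=0) / np.sum(resid ** 2, axis=0)

# Synthetic data (assumption): only the first two variables drive y.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(60, 8))
y_raw = 2.0 * X_raw[:, 0] - X_raw[:, 1] + 0.1 * rng.normal(size=60)
Xc = X_raw - X_raw.mean(axis=0)
yc = y_raw - y_raw.mean()

W, T, q, b = pls1_nipals(Xc, yc, n_components=2)
vip_scores = vip(W, T, q)
sr = selectivity_ratio(Xc, b)
print("VIP:", np.round(vip_scores, 2))
print("SR: ", np.round(sr, 2))
```

A common rule of thumb keeps variables with VIP above 1. SR thresholds are often derived from an F-test. Neither cutoff is stated in this record, so both are left out of the sketch.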
ISSN: 0886-9383, 1099-128X
DOI: 10.1002/cem.2736