Variable Selection in Partial Least Squares Methods: overview and recent developments

Abstract : Recent developments in technology make it possible to collect large amounts of data from various sources. Moreover, many real-world applications require studying relations among several groups of variables. The analysis of landscape matrices, i.e. matrices having more columns (variables, p) than rows (observations, n), is a challenging task in several domains. Two kinds of problems arise when dealing with high-dimensional data sets characterized by landscape matrices. The first is computational and numerical. The second is the difficulty of assessing and understanding the results. Dimension reduction appears to address both problems. Here we should distinguish between feature selection and feature extraction. Feature selection refers to variable selection, while feature extraction aims to transform the data from a high-dimensional space to a low-dimensional space. Partial Least Squares (PLS) methods are classical feature extraction tools that work in the case of high-dimensional data sets. Since PLS methods do not require matrix inversion or diagonalization, they allow us to solve the computational problems. However, interpreting the results remains hard when dealing with very high-dimensional data sets. Moreover, Chun & Keles (2010) recently showed that asymptotic consistency of the PLS regression estimator for the univariate case does not hold under the very large p and small n paradigm. Interest is therefore increasing in developing new PLS methods that act, at the same time, as a feature extraction tool and a feature selection method. The first attempt to perform variable selection in the univariate PLS Regression framework was presented by Bastien et al. in 2005. More recently, Le Cao et al. (2008) and Chun & Keles (2010) proposed two different approaches to include variable selection in PLS Regression, both based on L1 penalization (Tibshirani, 1996). In our work, we investigate all these approaches and discuss their pros and cons. Moreover, a new version of the PLS Path Modeling algorithm including variable selection will be presented.
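To make the idea of combining feature extraction with feature selection concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a univariate response, column-centered data, and a soft-thresholding rule applied to the first PLS weight vector, in the spirit of the L1-penalized approaches cited above. The function names and the tuning parameter `eta` are hypothetical choices made for this illustration.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_pls_weights(X, y, eta):
    """First-component weight vector with soft-thresholding (illustrative only).

    X: list of n rows, each a list of p column-centered values.
    y: list of n centered responses.
    eta: assumed tuning parameter in [0, 1); the threshold is
         eta * max|X^T y|, so eta = 0 keeps the ordinary PLS direction.
    """
    n, p = len(X), len(X[0])
    # X^T y needs one dot product per variable: no inversion or diagonalization.
    scores = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    lam = eta * max(abs(s) for s in scores)
    shrunk = [soft_threshold(s, lam) for s in scores]
    norm = math.sqrt(sum(s * s for s in shrunk))
    if norm == 0.0:
        return shrunk  # every score was zero, so nothing to normalize
    return [s / norm for s in shrunk]


# A small landscape matrix: n = 3 observations, p = 5 variables (made-up data).
X = [
    [1.0, 0.1, -0.5, 0.0, 2.0],
    [0.0, -0.2, 1.0, 0.1, -1.0],
    [-1.0, 0.1, -0.5, -0.1, -1.0],
]
y = [1.0, 0.0, -1.0]

weights = sparse_pls_weights(X, y, eta=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected variables:", selected)
```

Variables whose weight is exactly zero drop out of the component, so the same step performs extraction (building the component) and selection (discarding variables). Larger values of `eta` discard more variables.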
https://hal-supelec.archives-ouvertes.fr/hal-00529791
Contributor : Karine El Rassi
Submitted on : Tuesday, October 26, 2010 - 3:44:28 PM
Last modification on : Thursday, March 29, 2018 - 11:06:05 AM
Long-term archiving on : Thursday, January 27, 2011 - 2:58:57 AM

File

ISBIS2010_Trinchera_et_al.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00529791, version 1

Citation

Laura Trinchera, Edith Le Floch, Arthur Tenenhaus. Variable Selection in Partial Least Squares Methods: overview and recent developments. International Symposium on Business and Industrial Statistics (ISBI'10), Jul 2010, Portoroz, Slovenia. pp.102. ⟨hal-00529791⟩
