Open Access
PLS for Big Data: A unified parallel algorithm for regularised group PLS
Pierre Lafaye de Micheaux, Benoît Liquet, Matthew Sutton
Statist. Surv. 13: 119-149 (2019). DOI: 10.1214/19-SS125

Abstract

Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scaled to big data sets. The bigsgPLS R package implements our unified algorithm and is available at https://github.com/matt-sutton/bigsgPLS.
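
The link between the SVD and penalised least squares mentioned in the abstract can be illustrated with a small sketch. It is not the bigsgPLS implementation, and it does not reproduce the paper's unified algorithm; it only shows the general idea, under the following assumptions: the matrix is taken to be the cross-product M = X^T Y of two data blocks (no deflation step), the penalty is a plain Lasso penalty, and the first pair of weight vectors is found by alternating soft-thresholded updates. The function names (`soft_threshold`, `cross_product`, `sparse_rank_one`) and the parameters `lam_u`, `lam_v` are hypothetical and chosen for this example only.

```python
import math


def soft_threshold(z, lam):
    """Lasso soft-thresholding, elementwise: sign(z) * max(|z| - lam, 0)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in z]


def normalise(z):
    """Scale z to unit Euclidean norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z] if norm > 0 else list(z)


def cross_product(X, Y):
    """Return M = X^T Y for X (n x p) and Y (n x q), both given as lists of rows."""
    p, q = len(X[0]), len(Y[0])
    return [[sum(xr[j] * yr[k] for xr, yr in zip(X, Y)) for k in range(q)]
            for j in range(p)]


def sparse_rank_one(M, lam_u=0.0, lam_v=0.0, n_iter=100):
    """Alternating updates for a sparse rank-one approximation of M.

    With lam_u = lam_v = 0 this is power iteration for the leading pair of
    singular vectors. With positive penalties, each update is the solution
    of a Lasso-penalised least squares problem in one vector while the
    other is held fixed, followed by normalisation.
    """
    p, q = len(M), len(M[0])
    # Assumption: a constant starting vector, which must not be orthogonal
    # to the leading right singular vector of M.
    v = normalise([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        Mv = [sum(M[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalise(soft_threshold(Mv, lam_u))
        Mtu = [sum(M[j][k] * u[j] for j in range(p)) for k in range(q)]
        v = normalise(soft_threshold(Mtu, lam_v))
    return u, v


# Toy data (made up for illustration): 4 observations, 3 X-variables, 2 Y-variables.
X = [[1.0, 0.2, 0.9], [-1.0, 0.1, -1.1], [0.5, -0.3, 0.4], [-0.5, 0.0, -0.2]]
Y = [[2.0, 0.1], [-2.1, 0.0], [0.9, -0.1], [-0.8, 0.0]]
u, v = sparse_rank_one(cross_product(X, Y), lam_u=0.5)
print("X weights:", u)
print("Y weights:", v)
```

Entries of `u` that are exactly zero correspond to X-variables dropped by the penalty, which is how sparse PLS performs variable selection while estimating the weight vectors.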

Citation

Pierre Lafaye de Micheaux, Benoît Liquet, Matthew Sutton. "PLS for Big Data: A unified parallel algorithm for regularised group PLS." Statist. Surv. 13: 119-149, 2019. https://doi.org/10.1214/19-SS125

Information

Received: 1 August 2018; Published: 2019
First available in Project Euclid: 2 September 2019

zbMATH: 07104724
MathSciNet: MR3998928
Digital Object Identifier: 10.1214/19-SS125

Subjects:
Primary: 62-02, 62J99

Keywords: high dimensional data, Lasso penalties, Partial least squares, Singular value decomposition
