- Statist. Surv.
- Volume 13 (2019), 119-149.
PLS for Big Data: A unified parallel algorithm for regularised group PLS
Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scalable to big data sets. The bigsgPLS R package implements our unified algorithm and is available at https://github.com/matt-sutton/bigsgPLS.
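The link between the SVD and penalised least squares mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from the bigsgPLS package. It assumes the common formulation in which the first pair of sparse PLS weight vectors comes from a penalised rank-one approximation of the cross-product matrix M = X^T Y, computed by alternating soft-thresholded power iterations. All function names and the penalty parameters `lam_u` and `lam_v` are hypothetical, and deflation, group penalties and the parallel, chunked computation discussed in the paper are left out.

```python
import math


def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def cross_product(X, Y):
    """Return X^T Y for X (n x p) and Y (n x q), both lists of rows."""
    Xt, Yt = transpose(X), transpose(Y)
    return [[sum(a * b for a, b in zip(xc, yc)) for yc in Yt] for xc in Xt]


def soft_threshold(w, lam):
    """Lasso soft-thresholding: shrink each entry towards zero by lam."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in w]


def normalise(w):
    """Scale w to unit Euclidean norm (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w] if norm > 0 else list(w)


def sparse_pls_weights(X, Y, lam_u=0.0, lam_v=0.0, n_iter=200, tol=1e-10):
    """First pair of (sparse) weight vectors (u, v) for X and Y.

    With lam_u = lam_v = 0 this is plain power iteration for the leading
    singular vectors of M = X^T Y. Positive penalties zero out small weights.
    The penalties act on the unscaled vectors M v and M^T u, so sensible
    values depend on the scale of the data (an assumption of this sketch).
    """
    M = cross_product(X, Y)          # p x q cross-product matrix
    Mt = transpose(M)
    v = normalise([1.0] * len(Mt))   # simple start; assumed not orthogonal to the solution
    u = [0.0] * len(M)
    for _ in range(n_iter):
        u_new = normalise(soft_threshold(matvec(M, v), lam_u))
        v_new = normalise(soft_threshold(matvec(Mt, u_new), lam_v))
        change = max(abs(a - b) for a, b in zip(v_new, v))
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v


# Toy data (made up for illustration): 4 observations, 3 X-variables, 2 Y-variables.
X = [[1.0, 0.1, 0.0], [-1.0, -0.1, 0.0], [2.0, 0.0, 0.1], [-2.0, 0.0, -0.1]]
Y = [[1.0, 0.5], [-1.0, -0.5], [2.0, 1.0], [-2.0, -1.0]]

u_dense, v_dense = sparse_pls_weights(X, Y)               # no penalty
u_sparse, v_sparse = sparse_pls_weights(X, Y, lam_u=1.0)  # penalise the X-weights
print(u_dense, v_dense)
print(u_sparse, v_sparse)
```

In this toy example the first X-variable dominates the cross-product matrix, so a positive `lam_u` sets the weights of the other two X-variables exactly to zero, which is the variable-selection effect the abstract describes.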
Received: August 2018
First available in Project Euclid: 2 September 2019
Lafaye de Micheaux, Pierre; Liquet, Benoît; Sutton, Matthew. PLS for Big Data: A unified parallel algorithm for regularised group PLS. Statist. Surv. 13 (2019), 119--149. doi:10.1214/19-SS125. https://projecteuclid.org/euclid.ssu/1567411220