Sparse partial least squares (SPLS) is widely used in applied sciences as a method that performs dimension reduction and variable selection simultaneously in linear regression. Several implementations of SPLS have been derived, among which the SPLS proposed in Chun and Keleş (J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 (2010) 3–25) is very popular and highly cited. However, for all of these implementations, the theoretical properties of SPLS are largely unknown. In this paper, we propose a new version of SPLS, called the envelope-based SPLS, using a connection between envelope models and partial least squares (PLS). We establish the consistency, oracle property and asymptotic normality of the envelope-based SPLS estimator. Both the large-sample and the high-dimensional scenarios are considered. We also develop the envelope-based SPLS estimator in the context of generalized linear models and discuss its theoretical properties, including consistency, the oracle property and the asymptotic distribution. Numerical experiments and examples show that the envelope-based SPLS estimator has better variable selection and prediction performance than the SPLS estimator of Chun and Keleş (J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 (2010) 3–25).
Ann. Statist. 48(1): 161–182 (February 2020). DOI: 10.1214/18-AOS1796
Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Wiley Series in Probability and Statistics. Wiley Interscience, Hoboken, NJ. 1281.62022
Chen, L. and Huang, J. Z. (2012). Sparse reduced-rank regression for simultaneous dimension reduction and variable selection. J. Amer. Statist. Assoc. 107 1533–1545. 1258.62075 10.1080/01621459.2012.734178
Chen, X., Zou, C. and Cook, R. D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Statist. 38 3696–3723. 1204.62107 10.1214/10-AOS826 euclid.aos/1291126970
Chun, H. and Keleş, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 3–25. 1411.62184 10.1111/j.1467-9868.2009.00723.x
Chung, D. and Keleş, S. (2010). Sparse partial least squares classification for high dimensional data. Stat. Appl. Genet. Mol. Biol. 9 Art. 17, 32. 1304.92041 10.2202/1544-6115.1492
Cook, R. D., Forzani, L. and Su, Z. (2016). A note on fast envelope estimation. J. Multivariate Anal. 150 42–54. 1345.62082 10.1016/j.jmva.2016.05.006
Cook, R. D., Helland, I. S. and Su, Z. (2013). Envelopes and partial least squares regression. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 851–877. 1411.62137 10.1111/rssb.12018
Cook, R. D., Li, B. and Chiaromonte, F. (2010). Envelope models for parsimonious and efficient multivariate linear regression. Statist. Sinica 20 927–1010. 1259.62059
Cook, R. D. and Zhang, X. (2015). Foundations for envelope models and methods. J. Amer. Statist. Assoc. 110 599–611. 1390.62131 10.1080/01621459.2014.983235
Diaconis, P. and Freedman, D. (1984). Asymptotics of graphical projection pursuit. Ann. Statist. 12 793–815. 0559.62002 10.1214/aos/1176346703 euclid.aos/1176346703
Huang, X., Pan, W., Park, S., Han, X., Miller, L. W. and Hall, J. (2004). Modeling the relationship between LVAD support time and gene expression changes in the human heart by penalized partial least squares. Bioinformatics 20 888–894.
Karabulut, E. M. and Ibrikci, T. (2014). Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing. J. Med. Syst. 38 1–9.
Khare, K., Oh, S.-Y. and Rajaratnam, B. (2015). A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees. J. R. Stat. Soc. Ser. B. Stat. Methodol. 77 803–825. 1414.62183 10.1111/rssb.12088
Khare, K., Pal, S. and Su, Z. (2017). A Bayesian approach for envelope models. Ann. Statist. 45 196–222. 1367.62174 10.1214/16-AOS1449 euclid.aos/1487667621
Lê Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating omics data. Stat. Appl. Genet. Mol. Biol. 7 Art. 35, 31. 1276.62061
Lee, D., Lee, W., Lee, Y. and Pawitan, Y. (2011). Sparse partial least-squares regression and its applications to high-throughput data analysis. Chemom. Intell. Lab. Syst. 109 1–8.
Ma, Y. and Zhu, L. (2013). Efficiency loss and the linearity condition in dimension reduction. Biometrika 100 371–383. 1284.62262 10.1093/biomet/ass075
Marx, B. D. (1996). Iteratively reweighted partial least squares estimation for generalized linear regression. Technometrics 38 374–381. 0902.62081 10.1080/00401706.1996.10484549
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Monographs on Statistics and Applied Probability. CRC Press, London. 0744.62098
Park, P. J., Tian, L. and Kohane, I. S. (2002). Linking gene expression data with patient survival times using partial least squares. Bioinformatics 18 S120–S127.
Peng, J., Wang, P., Zhou, N. and Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. J. Amer. Statist. Assoc. 104 735–746. 1388.62046 10.1198/jasa.2009.0126
Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515. 1320.62135 10.1214/08-EJS176
Su, Z. and Cook, R. D. (2011). Partial envelopes for efficient estimation in multivariate linear regression. Biometrika 98 133–146. 1214.62062 10.1093/biomet/asq063
Su, Z. and Cook, D. (2012). Inner envelopes: Efficient estimation in multivariate linear regression. Biometrika 99 687–702. 06085163 10.1093/biomet/ass024
Su, Z., Zhu, G., Chen, X. and Yang, Y. (2016). Sparse envelope model: Efficient estimation and response variable selection in multivariate linear regression. Biometrika 103 579–593. 07072139 10.1093/biomet/asw036
Wold, H. (1966). Estimation of principal components and related models by iterative least squares. In Multivariate Analysis (Proc. Internat. Sympos., Dayton, Ohio, 1965) 391–420. Academic Press, New York.
Wold, H. (1975). Path Models with Latent Variables: The NIPALS Approach. Academic Press, Cambridge. 0331.62058
Zhang, T. and Zou, H. (2014). Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika 101 103–120. 1285.62063 10.1093/biomet/ast059
Zhu, G. and Su, Z. (2019). Supplement to “Envelope-based sparse partial least squares.” https://doi.org/10.1214/18-AOS1796SUPP.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429. 1171.62326 10.1198/016214506000000735