The Annals of Statistics

The sparse Laplacian shrinkage estimator for high-dimensional regression

Jian Huang, Shuangge Ma, Hongzhe Li, and Cun-Hui Zhang

Abstract

We propose a new penalized method for variable selection and estimation that explicitly incorporates the correlation patterns among predictors. This method is based on a combination of the minimax concave penalty and Laplacian quadratic associated with a graph as the penalty function. We call it the sparse Laplacian shrinkage (SLS) method. The SLS uses the minimax concave penalty for encouraging sparsity and Laplacian quadratic penalty for promoting smoothness among coefficients associated with the correlated predictors. The SLS has a generalized grouping property with respect to the graph represented by the Laplacian quadratic. We show that the SLS possesses an oracle property in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability. This result holds in sparse, high-dimensional settings with p ≫ n under reasonable conditions. We derive a coordinate descent algorithm for computing the SLS estimates. Simulation studies are conducted to evaluate the performance of the SLS method and a real data example is used to illustrate its application.
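
As a rough illustration of how the two penalty terms described in the abstract combine, the following Python sketch evaluates a minimax concave penalty (MCP) on each coefficient plus a Laplacian quadratic on a signed graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`mcp`, `laplacian_quadratic`, `sls_penalty`), the symbols `lam1`, `lam2`, `gamma`, the adjacency matrix `A`, and the factor 1/2 on the quadratic term are choices made here for illustration, and the exact scaling in the paper may differ. The coordinate descent algorithm is not shown.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty, applied elementwise to |t|.

    Equals lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam and the
    constant gamma*lam^2/2 beyond that point.
    """
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def laplacian_quadratic(b, A):
    """Quadratic form b' L b with L = D - A.

    A is assumed to be a symmetric signed adjacency matrix with zero
    diagonal, and D is the diagonal matrix of row sums of |A|. For such
    A this equals sum over j<k of |a_jk| * (b_j - sign(a_jk) * b_k)^2.
    """
    b = np.asarray(b, dtype=float)
    A = np.asarray(A, dtype=float)
    L = np.diag(np.abs(A).sum(axis=1)) - A
    return float(b @ L @ b)

def sls_penalty(b, A, lam1, lam2, gamma):
    """Illustrative combined penalty: MCP for sparsity, Laplacian for smoothness."""
    return float(mcp(b, lam1, gamma).sum()) + 0.5 * lam2 * laplacian_quadratic(b, A)

# Hypothetical toy graph: predictors 0 and 1 are linked, predictor 2 is isolated.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

same = sls_penalty([1.0, 1.0, 0.0], A, lam1=1.0, lam2=1.0, gamma=3.0)
opposite = sls_penalty([1.0, -1.0, 0.0], A, lam1=1.0, lam2=1.0, gamma=3.0)
print(same, opposite)
```

In this toy setting the MCP term is identical for both coefficient vectors, so the difference between `same` and `opposite` comes entirely from the Laplacian quadratic, which charges nothing when positively linked coefficients agree and grows when they differ.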

Article information

Source
Ann. Statist., Volume 39, Number 4 (2011), 2021-2046.

Dates
First available in Project Euclid: 24 August 2011

Permanent link to this document
https://projecteuclid.org/euclid.aos/1314190622

Digital Object Identifier
doi:10.1214/11-AOS897

Mathematical Reviews number (MathSciNet)
MR2893860

Zentralblatt MATH identifier
1227.62049

Subjects
Primary: 62J05: Linear regression 62J07: Ridge regression; shrinkage estimators
Secondary: 62H20: Measures of association (correlation, canonical correlation, etc.) 60F12

Keywords
Graphical structure; minimax concave penalty; penalized regression; high-dimensional data; variable selection; oracle property

Citation

Huang, Jian; Ma, Shuangge; Li, Hongzhe; Zhang, Cun-Hui. The sparse Laplacian shrinkage estimator for high-dimensional regression. Ann. Statist. 39 (2011), no. 4, 2021–2046. doi:10.1214/11-AOS897. https://projecteuclid.org/euclid.aos/1314190622

