The Annals of Statistics

Some sharp performance bounds for least squares regression with L1 regularization

Tong Zhang

Abstract

We derive sharp performance bounds for least squares regression with L1 regularization, from the perspectives of parameter estimation accuracy and feature selection quality. The main result proved for L1 regularization extends a similar result in [Ann. Statist. 35 (2007) 2313–2351] for the Dantzig selector, and it gives an affirmative answer to an open question in [Ann. Statist. 35 (2007) 2358–2364]. Moreover, the result leads to an extended view of feature selection that allows less restrictive conditions than some recent work. Based on these theoretical insights, a novel two-stage L1-regularization procedure with selective penalization is analyzed. It is shown that if the target parameter vector can be decomposed as the sum of a sparse parameter vector with large coefficients and another, less sparse vector with relatively small coefficients, then the two-stage procedure can lead to improved performance.
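To make the two-stage idea concrete, the following is a minimal numerical sketch in Python. It assumes synthetic Gaussian data and scikit-learn's Lasso, which solves min_b ||y - Xb||_2^2 / (2n) + alpha * ||b||_1. The coefficient threshold and the column-reweighting trick used to emulate per-coordinate penalties are illustrative assumptions, not the paper's exact procedure.

    # Sketch of a two-stage L1 procedure with selective penalization.
    # Stage 1 runs an ordinary Lasso; stage 2 re-solves the problem with a
    # weaker penalty on coordinates whose first-stage estimates were large.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 200, 500
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 3.0     # sparse part: a few large coefficients
    beta[5:25] = 0.1   # less sparse part: many small coefficients
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # Stage 1: plain Lasso with a single penalty level.
    stage1 = Lasso(alpha=0.1).fit(X, y)
    large = np.abs(stage1.coef_) > 0.5   # hypothetical threshold for "large"

    # Stage 2: selective penalization. scikit-learn applies one alpha to all
    # coordinates, so per-coordinate penalties are emulated by rescaling
    # columns: multiplying column j by w_j divides its effective penalty by w_j.
    w = np.where(large, 10.0, 1.0)       # much weaker penalty on selected features
    stage2 = Lasso(alpha=0.1).fit(X * w, y)
    coef2 = stage2.coef_ * w             # map back to the original scale

    print("stage-1 estimation error:", np.linalg.norm(stage1.coef_ - beta))
    print("stage-2 estimation error:", np.linalg.norm(coef2 - beta))

The reweighting weakens, rather than removes, the penalty on selected coordinates so that a standard solver can be reused; the constants above (alpha, the threshold 0.5, the weight 10) are arbitrary choices for the sketch, not values from the paper's analysis.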

Article information

Source
Ann. Statist. Volume 37, Number 5A (2009), 2109–2144.

Dates
First available in Project Euclid: 15 July 2009

Permanent link to this document
http://projecteuclid.org/euclid.aos/1247663750

Digital Object Identifier
doi:10.1214/08-AOS659

Mathematical Reviews number (MathSciNet)
MR2543687

Subjects
Primary: 62G05: Estimation
Secondary: 62J05: Linear regression

Keywords
L_1 regularization; Lasso; regression; sparsity; variable selection; parameter estimation

Citation

Zhang, Tong. Some sharp performance bounds for least squares regression with L1 regularization. Ann. Statist. 37 (2009), no. 5A, 2109–2144. doi:10.1214/08-AOS659. http://projecteuclid.org/euclid.aos/1247663750.



References

  • [1] Bickel, P., Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [2] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
  • [3] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
  • [4] Candès, E. J. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
  • [5] Candès, E. J. and Plan, Y. (2007). Near-ideal model selection by ℓ1 minimization. Technical report, Caltech.
  • [6] Candès, E. J., Romberg, J. and Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 1207–1223.
  • [7] Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
  • [8] Candès, E. J. and Tao, T. (2007). Rejoinder: The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2392–2404.
  • [9] Donoho, D. L., Elad, M. and Temlyakov, V. N. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18.
  • [10] Efron, B., Hastie, T. and Tibshirani, R. (2007). Discussion of “The Dantzig selector.” Ann. Statist. 35 2358–2364.
  • [11] van de Geer, S. A. (2006). High-dimensional generalized linear models and the Lasso. Technical Report 133, ETH.
  • [12] James, G. M. and Radchenko, P. (2008). Generalized Dantzig selector with shrinkage tuning. Biometrika. To appear.
  • [13] Koltchinskii, V. (2006). The Dantzig selector and sparsity oracle inequalities. Manuscript.
  • [14] Koltchinskii, V. (2008). Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré. To appear.
  • [15] Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374–393.
  • [16] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
  • [17] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • [18] Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Technical report, Dept. Statistics, UC Berkeley.
  • [19] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • [20] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2567.