Annals of Statistics

A unified approach to model selection and sparse recovery using regularized least squares

Jinchi Lv and Yingying Fan


Abstract

Model selection and sparse recovery are two important problems for which many regularization methods have been proposed. We study the properties of regularization methods in both problems under the unified framework of regularized least squares with concave penalties. For model selection, we establish conditions under which a regularized least squares estimator enjoys a nonasymptotic property, called the weak oracle property, where the dimensionality can grow exponentially with sample size. For sparse recovery, we present a sufficient condition that ensures the recoverability of the sparsest solution. In particular, we approach both problems by considering a family of penalties that give a smooth homotopy between L0 and L1 penalties. We also propose the sequentially and iteratively reweighted squares (SIRS) algorithm for sparse recovery. Numerical studies support our theoretical results and demonstrate the advantage of our new methods for model selection and sparse recovery.
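For context, the abstract's "smooth homotopy between L0 and L1 penalties" can be made concrete. The objective below is the standard regularized least squares form, and the penalty family shown is a hedged reconstruction consistent with the abstract; the paper's exact parametrization should be checked against the full text:

```latex
% Regularized least squares with a concave penalty bridging L0 and L1:
\min_{\beta \in \mathbb{R}^p} \;
  \frac{1}{2}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{j=1}^{p} \rho_a\!\left(|\beta_j|\right),
\qquad
\rho_a(t) \;=\; \frac{(a+1)\,t}{a+t}, \quad t \ge 0,\ a \in (0,\infty).
```

As a → 0+, ρ_a(t) → 1{t ≠ 0} pointwise (the L0 penalty on each coordinate), while as a → ∞, ρ_a(t) → t (the L1 penalty), so the family interpolates smoothly between the two extremes.

The SIRS algorithm itself is specified in the full text. As a hedged illustration of the reweighting idea behind such schemes, the Python sketch below runs a generic iteratively reweighted ridge loop whose weights come from a local quadratic approximation of the concave penalty above; the function names, the parameter lam, and the fixed iteration count are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def rho_prime(t, a=0.1):
    """Derivative of the illustrative concave penalty
    rho_a(t) = (a + 1) * t / (a + t):  rho_a'(t) = a*(a+1)/(a+t)^2.
    It is large near t = 0 and small for large t, so small
    coefficients are penalized much harder than large ones."""
    return a * (a + 1.0) / (a + t) ** 2

def irls_sparse_recovery(X, y, a=0.1, lam=1e-2, n_iter=50, eps=1e-8):
    """Illustrative iteratively reweighted least squares (NOT the
    paper's exact SIRS algorithm): each step solves a weighted ridge
    problem whose weights majorize the concave penalty at the
    current iterate (a local quadratic approximation)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # minimum-norm start
    for _ in range(n_iter):
        t = np.abs(beta)
        w = rho_prime(t, a) / (t + eps)           # LQA weights
        # Solve  min_b  0.5*||y - X b||^2 + 0.5*lam * sum_j w_j * b_j^2
        A = X.T @ X + lam * np.diag(w)
        beta = np.linalg.solve(A, X.T @ y)
    return beta

# Toy usage: recover a 3-sparse vector from noiseless measurements.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = X @ beta_true
beta_hat = irls_sparse_recovery(X, y)
print(np.flatnonzero(np.abs(beta_hat) > 0.1))    # ideally [3 17 42]
```

On this toy problem the loop typically drives off-support coefficients toward zero because their weights grow as they shrink; whether a given reweighting scheme provably recovers the sparsest solution is exactly the kind of question the paper's recoverability condition addresses.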

Article information

Source
Ann. Statist., Volume 37, Number 6A (2009), 3498–3528.

Dates
First available in Project Euclid: 17 August 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1250515394

Digital Object Identifier
doi:10.1214/09-AOS683

Mathematical Reviews number (MathSciNet)
MR2549567

Zentralblatt MATH identifier
1369.62156

Subjects
Primary: 62J99: None of the above, but in this section
Secondary: 62F99: None of the above, but in this section

Keywords
Model selection; sparse recovery; high dimensionality; concave penalty; regularized least squares; weak oracle property

Citation

Lv, Jinchi; Fan, Yingying. A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist. 37 (2009), no. 6A, 3498–3528. doi:10.1214/09-AOS683. https://projecteuclid.org/euclid.aos/1250515394


