Annals of Statistics
Ann. Statist., Volume 37, Number 6A (2009), 3498–3528.
A unified approach to model selection and sparse recovery using regularized least squares
Full-text: Open access
Abstract
Model selection and sparse recovery are two important problems for which many regularization methods have been proposed. We study the properties of regularization methods in both problems under the unified framework of regularized least squares with concave penalties. For model selection, we establish conditions under which a regularized least squares estimator enjoys a nonasymptotic property, called the weak oracle property, where the dimensionality can grow exponentially with sample size. For sparse recovery, we present a sufficient condition that ensures the recoverability of the sparsest solution. In particular, we approach both problems by considering a family of penalties that give a smooth homotopy between L0 and L1 penalties. We also propose the sequentially and iteratively reweighted squares (SIRS) algorithm for sparse recovery. Numerical studies support our theoretical results and demonstrate the advantage of our new methods for model selection and sparse recovery.
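The abstract mentions a family of concave penalties giving a smooth homotopy between the L0 and L1 penalties, but the page does not write the family out. As an illustration only (an assumed example; the exact form used in the paper may differ), the transformed-L1 penalty ρ_a(t) = (a+1)|t|/(a+|t|) with a > 0 is one concave family with this interpolation property:

```python
def rho(t: float, a: float) -> float:
    """Illustrative concave penalty rho_a(t) = (a+1)|t| / (a + |t|).

    Assumed example of a family giving a smooth homotopy between
    L0 and L1; not taken verbatim from the paper. As a -> infinity,
    rho_a(t) -> |t| (the L1 penalty); as a -> 0+, rho_a(t) -> 1{t != 0}
    (the L0 indicator), while staying continuous in t for every a > 0.
    """
    return (a + 1.0) * abs(t) / (a + abs(t)) if t != 0.0 else 0.0

# Large a: behaves like the L1 penalty |t|.
print(round(rho(0.5, a=1e9), 6))   # close to 0.5
# Small a: behaves like the L0 indicator 1{t != 0}.
print(round(rho(0.5, a=1e-9), 6))  # close to 1.0
print(rho(0.0, a=1e-9))            # exactly 0.0
```

Tuning a thus trades off the convexity of L1 against the unbiasedness of L0-type penalization, which is the regime the paper's theory addresses.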
Article information
Source
Ann. Statist., Volume 37, Number 6A (2009), 3498-3528.
Dates
First available in Project Euclid: 17 August 2009
Permanent link to this document
https://projecteuclid.org/euclid.aos/1250515394
Digital Object Identifier
doi:10.1214/09-AOS683
Mathematical Reviews number (MathSciNet)
MR2549567
Zentralblatt MATH identifier
1369.62156
Subjects
Primary: 62J99: None of the above, but in this section
Secondary: 62F99: None of the above, but in this section
Keywords
Model selection; sparse recovery; high dimensionality; concave penalty; regularized least squares; weak oracle property
Citation
Lv, Jinchi; Fan, Yingying. A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist. 37 (2009), no. 6A, 3498–3528. doi:10.1214/09-AOS683. https://projecteuclid.org/euclid.aos/1250515394
References
- Antoniadis, A. and Fan, J. (2001). Regularization of wavelets approximations (with discussion). J. Amer. Statist. Assoc. 96 939–967. doi:10.1198/016214501753208942
- Bickel, P. J. and Li, B. (2006). Regularization in statistics (with discussion). Test 15 271–344. doi:10.1007/BF02607055
- Bickel, P. J., Ritov, Y. and Tsybakov, A. (2008). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. To appear. doi:10.1214/08-AOS620
- Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373–384. doi:10.2307/1269730
- Candes, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215. doi:10.1109/TIT.2005.858979
- Candes, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406–5425. doi:10.1109/TIT.2006.885507
- Candes, E. J. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2404. doi:10.1214/009053606000001523
- Candès, E. J., Wakin, M. B. and Boyd, S. P. (2008). Enhancing sparsity by reweighted ℓ1 minimization. J. Fourier Anal. Appl. 14 877–905. doi:10.1007/s00041-008-9045-x
- Chen, S., Donoho, D. and Saunders, M. (1999). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61. doi:10.1137/S1064827596304010
- Donoho, D. L. (2004). Neighborly polytopes and sparse solution of underdetermined linear equations. Technical report, Dept. Statistics, Stanford Univ.
- Donoho, D. L. and Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Natl. Acad. Sci. USA 100 2197–2202. doi:10.1073/pnas.0437847100
- Donoho, D., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18. doi:10.1109/TIT.2005.860430
- Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425–455. doi:10.1093/biomet/81.3.425
- Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–451. doi:10.1214/009053604000000067
- Fan, J. (1997). Comment on “Wavelets in statistics: A review” by A. Antoniadis. J. Italian Statist. Assoc. 6 131–138.
- Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules. Ann. Statist. 36 2605–2637. doi:10.1214/07-AOS504
- Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360. doi:10.1198/016214501753382273
- Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Proceedings of the International Congress of Mathematicians (M. Sanz-Sole, J. Soria, J. L. Varona and J. Verdera, eds.) 3 595–622. European Math. Soc. Publishing House, Zürich.
- Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). J. Roy. Statist. Soc. Ser. B 70 849–911.
- Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with diverging number of parameters. Ann. Statist. 32 928–961. doi:10.1214/009053604000000256
- Fang, K.-T. and Zhang, Y.-T. (1990). Generalized Multivariate Analysis. Springer, Berlin.
- Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109–148.
- Fuchs, J.-J. (2004). Recovery of exact sparse representations in the presence of noise. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing 533–536. Montreal, QC.
- Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988. doi:10.3150/bj/1106314846
- Hunter, D. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617–1642. doi:10.1214/009053605000000200
- James, G., Radchenko, P. and Lv, J. (2009). DASSO: Connections between the Dantzig selector and Lasso. J. Roy. Statist. Soc. Ser. B 71 127–142.
- Li, R. and Liang, H. (2008). Variable selection in semiparametric regression modeling. Ann. Statist. 36 261–286. doi:10.1214/009053607000000604
- Liu, Y. and Wu, Y. (2007). Variable selection via a combination of the L0 and L1 penalties. J. Comput. Graph. Statist. 16 782–798. doi:10.1198/106186007X255676
- Meinshausen, N., Rocha, G. and Yu, B. (2007). Discussion: A tale of three cousins: Lasso, L2Boosting and Dantzig. Ann. Statist. 35 2373–2384. doi:10.1214/009053607000000460
- Nikolova, M. (2000). Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math. 61 633–658. doi:10.1137/S0036139997327794
- Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
- Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030–1051. doi:10.1109/TIT.2005.864420
- Vrahatis, M. N. (1989). A short proof and a generalization of Miranda’s existence theorem. Proc. Amer. Math. Soc. 107 701–703. doi:10.2307/2048168
- Wainwright, M. J. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Technical report, Dept. Statistics, Univ. California, Berkeley.
- Wang, H., Li, R. and Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94 553–568. doi:10.1093/biomet/asm053
- Zhang, C.-H. (2007). Penalized linear unbiased selection. Technical report, Dept. Statistics, Rutgers Univ.
- Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
- Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429. doi:10.1198/016214506000000735
- Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Ann. Statist. 36 1509–1566. doi:10.1214/009053607000000802

More like this
- Thresholding-based iterative selection procedures for model selection and shrinkage. She, Yiyuan, Electronic Journal of Statistics, 2009
- Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Wang, Zhaoran, Liu, Han, and Zhang, Tong, Annals of Statistics, 2014
- High-dimensional generalizations of asymmetric least squares regression and their applications. Gu, Yuwen and Zou, Hui, Annals of Statistics, 2016
- Smoothing proximal gradient method for general structured sparse regression. Chen, Xi, Lin, Qihang, Kim, Seyoung, Carbonell, Jaime G., and Xing, Eric P., Annals of Applied Statistics, 2012
- Accuracy guaranties for ℓ1 recovery of block-sparse signals. Juditsky, Anatoli, Kılınç Karzan, Fatma, Nemirovski, Arkadi, and Polyak, Boris, Annals of Statistics, 2012
- Model selection and structure specification in ultra-high dimensional generalised semi-varying coefficient models. Li, Degui, Ke, Yuan, and Zhang, Wenyang, Annals of Statistics, 2015
- Statistical analysis of sparse approximate factor models. Poignard, Benjamin and Terada, Yoshikazu, Electronic Journal of Statistics, 2020
- Variable selection with Hamming loss. Butucea, Cristina, Ndaoud, Mohamed, Stepanova, Natalia A., and Tsybakov, Alexandre B., Annals of Statistics, 2018
- The composite absolute penalties family for grouped and hierarchical variable selection. Zhao, Peng, Rocha, Guilherme, and Yu, Bin, Annals of Statistics, 2009
- MAP model selection in Gaussian regression. Abramovich, Felix and Grinshtein, Vadim, Electronic Journal of Statistics, 2010
