Electronic Journal of Statistics

On model selection consistency of regularized M-estimators

Jason D. Lee, Yuekai Sun, and Jonathan E. Taylor

Full-text: Open access


Regularized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Usually, the low-dimensional structure is encoded by the presence of the (unknown) parameters in some low-dimensional model subspace. In such settings, it is desirable for estimates of the model parameters to be model selection consistent: the estimates should also fall in the model subspace. We develop a general framework for establishing consistency and model selection consistency of regularized M-estimators and show how it applies to some special cases of interest in statistical learning. Our analysis identifies two key properties of regularized M-estimators, referred to as geometric decomposability and irrepresentability, that ensure the estimators are consistent and model selection consistent.
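The lasso is the canonical special case of a regularized M-estimator, and model selection consistency is easiest to observe numerically in that setting. The sketch below is illustrative only (it is not taken from the paper): with an orthonormal design, the lasso solution reduces to soft-thresholding of the back-projected response, so one can check directly whether the estimated support falls in the true model subspace. All variable names, the design construction, and the choice of the tuning parameter `lam` are assumptions made for the illustration.

```python
import numpy as np

# Illustrative sketch (not from the paper): the lasso
#   minimize (1/2) * ||y - X beta||^2 + lam * ||beta||_1
# is a regularized M-estimator. When X has orthonormal columns
# (X^T X = I), the solution is soft-thresholding of X^T y, so
# model selection consistency can be checked directly.

rng = np.random.default_rng(0)
n, p = 100, 10

# Orthonormal design via a QR decomposition of a Gaussian matrix.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))

# True parameter lies in a 3-dimensional model subspace
# (only the first three coordinates are nonzero).
beta = np.zeros(p)
beta[:3] = [3.0, -3.0, 2.0]

y = X @ beta + 0.1 * rng.standard_normal(n)

def soft_threshold(z, lam):
    """Closed-form lasso solution for an orthonormal design."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam = 0.5  # assumed tuning parameter, well above the noise level
beta_hat = soft_threshold(X.T @ y, lam)

support_true = set(np.flatnonzero(beta))
support_hat = set(np.flatnonzero(beta_hat))
print(support_hat, support_true)
```

With the signal well separated from the noise, the thresholded estimate recovers exactly the nonzero coordinates of the true parameter, i.e. the estimate falls in the model subspace — the property the paper's framework establishes in much greater generality.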

Article information

Electron. J. Statist., Volume 9, Number 1 (2015), 608-642.

First available in Project Euclid: 2 April 2015

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1427990067

Digital Object Identifier: doi:10.1214/15-EJS1013

Primary: 62F10: Point estimation

Keywords: regularized M-estimator; geometrically decomposable penalties; lasso; generalized lasso; group lasso; nuclear norm minimization


Lee, Jason D.; Sun, Yuekai; Taylor, Jonathan E. On model selection consistency of regularized M-estimators. Electron. J. Statist. 9 (2015), no. 1, 608--642. doi:10.1214/15-EJS1013. https://projecteuclid.org/euclid.ejs/1427990067

