## Annals of Statistics

### Regularization and the small-ball method I: Sparse recovery

#### Abstract

We obtain bounds on estimation error rates for regularization procedures of the form \begin{equation*}\hat{f}\in\mathop{\operatorname{argmin}}_{f\in F}\Big(\frac{1}{N}\sum_{i=1}^{N}(Y_{i}-f(X_{i}))^{2}+\lambda \Psi(f)\Big)\end{equation*} when $\Psi$ is a norm and $F$ is convex.

Our approach gives a common framework that may be used in the analysis of learning problems and regularization problems alike. In particular, it sheds some light on the role various notions of sparsity have in regularization and on their connection with the size of subdifferentials of $\Psi$ in a neighborhood of the true minimizer.

As a “proof of concept,” we extend the known estimates for the LASSO, SLOPE and trace-norm regularization.
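To make the estimator in the abstract concrete, the following is a minimal illustrative sketch (not from the paper) of the regularization procedure with $\Psi$ taken to be the $\ell_1$ norm, i.e., the LASSO, solved by proximal gradient descent (ISTA) over $F=\mathbb{R}^d$ with linear functionals $f(X_i)=\langle f, X_i\rangle$. The function name and defaults are hypothetical choices for the example.

```python
import numpy as np

def regularized_least_squares(X, Y, lam, n_iter=500, step=None):
    """Sketch of the estimator
        argmin_f (1/N) * sum_i (Y_i - <f, X_i>)^2 + lam * ||f||_1
    via proximal gradient descent (ISTA).  Here Psi = l1 norm (LASSO);
    the soft-thresholding step is the proximal operator of lam*||.||_1.
    """
    N, d = X.shape
    if step is None:
        # step size = 1 / Lipschitz constant of the gradient of the
        # smooth part, which is (2/N) * ||X||_op^2
        step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 / N)
    f = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / N) * X.T @ (X @ f - Y)   # gradient of the squared loss
        z = f - step * grad
        # soft-thresholding: prox of the l1 penalty
        f = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return f
```

On a random design with a sparse target, the soft-thresholding step zeroes out most coordinates, which is the sparsity phenomenon the paper's framework quantifies via the subdifferential of $\Psi$ near the true minimizer. SLOPE and trace-norm regularization fit the same template with a different proximal operator.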

#### Article information

Source
Ann. Statist., Volume 46, Number 2 (2018), 611–641.

Dates
Revised: January 2017
First available in Project Euclid: 3 April 2018

Permanent link
https://projecteuclid.org/euclid.aos/1522742431

Digital Object Identifier
doi:10.1214/17-AOS1562

Mathematical Reviews number (MathSciNet)
MR3782379

Zentralblatt MATH identifier
06870274

#### Citation

Lecué, Guillaume; Mendelson, Shahar. Regularization and the small-ball method I: Sparse recovery. Ann. Statist. 46 (2018), no. 2, 611--641. doi:10.1214/17-AOS1562. https://projecteuclid.org/euclid.aos/1522742431

#### References

• [1] Artstein-Avidan, S., Giannopoulos, A. and Milman, V. D. (2015). Asymptotic Geometric Analysis. Part I. Mathematical Surveys and Monographs 202. Amer. Math. Soc., Providence, RI.
• [2] Bach, F. R. (2010). Structured sparsity-inducing norms through submodular functions. In Advances in Neural Information Processing Systems 23 (NIPS 2010) 118–126. Vancouver, British Columbia, Canada.
• [3] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [4] Bogdan, M., van den Berg, E., Sabatti, C., Su, W. and Candès, E. J. (2015). SLOPE—Adaptive variable selection via convex optimization. Ann. Appl. Stat. 9 1103–1140.
• [5] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• [6] Candès, E. J. and Plan, Y. (2011). Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory 57 2342–2359.
• [7] Gross, D. (2011). Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory 57 1548–1566.
• [8] Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Math. 2033. Springer, Heidelberg.
• [9] Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302–2329.
• [10] Koltchinskii, V. and Mendelson, S. (2015). Bounding the smallest singular value of a random matrix without concentration. Int. Math. Res. Not. IMRN 23 12991–13008.
• [11] Lecué, G. and Mendelson, S. (2013). Learning subgaussian classes: Upper and minimax bounds. Technical report, CNRS, Ecole polytechnique and Technion.
• [12] Lecué, G. and Mendelson, S. (2015). Regularization and the small-ball method II: Complexity-based bounds. Technical report, CNRS, ENSAE and Technion, I.I.T.
• [13] Lecué, G. and Mendelson, S. (2017). Sparse recovery under weak moment assumptions. J. Eur. Math. Soc. (JEMS) 19 881–904.
• [14] Lecué, G. and Mendelson, S. (2018). Supplement to “Regularization and the small-ball method I: sparse recovery.” DOI:10.1214/17-AOS1562SUPP.
• [15] Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. Amer. Math. Soc., Providence, RI.
• [16] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90–102.
• [17] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [18] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
• [19] Mendelson, S. (2013). Learning without concentration for general loss functions. Technical report, Technion, I.I.T. Available at arXiv:1410.3192.
• [20] Mendelson, S. (2014). Learning without concentration. In Proceedings of the 27th Annual Conference on Learning Theory COLT14 25–39.
• [21] Mendelson, S. (2014). A remark on the diameter of random sections of convex bodies. In Geometric Aspects of Functional Analysis. Lecture Notes in Math. 2116 395–404. Springer, Cham.
• [22] Mendelson, S. (2015). Learning without concentration. J. ACM 62 Art. 21, 25.
• [23] Mendelson, S. (2015). “Local vs. global parameters”, breaking the Gaussian complexity barrier. Technical report, Technion, I.I.T.
• [24] Mendelson, S. (2015). On multiplier processes under weak moment assumptions. Technical report, Technion, I.I.T.
• [25] Mendelson, S. (2016). Upper bounds on product and multiplier empirical processes. Stochastic Process. Appl. 126 3652–3680.
• [26] Negahban, S. and Wainwright, M. J. (2012). Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. J. Mach. Learn. Res. 13 1665–1697.
• [27] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
• [28] Nickl, R. and van de Geer, S. (2013). Confidence sets in sparse regression. Ann. Statist. 41 2852–2876.
• [29] Recht, B., Fazel, M. and Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471–501.
• [30] Rohde, A. and Tsybakov, A. B. (2011). Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 887–930.
• [31] Rudelson, M. and Vershynin, R. (2015). Small ball probabilities for linear images of high-dimensional distributions. Int. Math. Res. Not. IMRN 19 9594–9617.
• [32] Su, W. and Candès, E. (2016). SLOPE is adaptive to unknown sparsity and asymptotically minimax. Ann. Statist. 44 1038–1068.
• [33] Talagrand, M. (2014). Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Ergebnisse der Mathematik und Ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics 60. Springer, Heidelberg.
• [34] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
• [35] van de Geer, S. (2014). Weakly decomposable regularization penalties and structured sparsity. Scand. J. Stat. 41 72–86.
• [36] van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166–1202.
• [37] van de Geer, S. A. (2007). The deterministic lasso. Technical report, ETH Zürich. Available at http://www.stat.math.ethz.ch/~geer/lasso.pdf.
• [38] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
• [39] Watson, G. A. (1992). Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170 33–45.
• [40] Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217–242.
• [41] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.

#### Supplemental materials

• Supplementary material to “Regularization and the small-ball method I: sparse recovery”. In the supplementary material we study a general $X$ without assuming it is isotropic.