The Annals of Statistics

General nonexact oracle inequalities for classes with a subexponential envelope

Guillaume Lecué and Shahar Mendelson

Abstract

We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to a setup free of boundedness assumptions, is that these inequalities can yield fast rates even in situations in which exact oracle inequalities hold only with slower rates.
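
For orientation, the two kinds of inequality can be contrasted schematically. This is a sketch in generic notation that is assumed here and not taken from the abstract: $F$ is the class, $R$ the risk, $\hat{f}$ the output of the procedure, $\epsilon>0$ a fixed constant, and $r_n$ the residual term. Such bounds are typically stated with high probability or in expectation.

```latex
% Exact oracle inequality: the leading constant in front of the oracle risk is 1.
R(\hat{f}) \;\le\; \inf_{f \in F} R(f) + r_n(F)

% Nonexact oracle inequality: the leading constant is 1 + \epsilon > 1,
% and the residual term may depend on \epsilon.
R(\hat{f}) \;\le\; (1+\epsilon) \inf_{f \in F} R(f) + r_n(F,\epsilon)
```

In this notation, a fast rate refers to how quickly the residual term $r_n$ decreases with the sample size $n$.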

We apply these results to show that procedures based on $\ell_{1}$ and nuclear norm regularization functions satisfy oracle inequalities with a residual term that decreases like $1/n$ for every $L_{q}$-loss function ($q\geq2$), while only assuming that the tail behavior of the input and output variables is well behaved. In particular, no RIP type of assumption or “incoherence condition” is needed to obtain fast residual terms in those setups. We also apply these results to the problems of convex aggregation and model selection.
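
As a point of reference, a regularized empirical risk minimization procedure with an $L_{q}$-loss can be written in the following textbook form. This is only a sketch under assumed notation: the sample $(X_i, Y_i)_{i=1}^{n}$, the linear predictors $\langle \beta, X_i \rangle$ and $\langle A, X_i \rangle$, and the regularization parameter $\lambda>0$ are illustrative choices, and the exact form of the regularization function used in the paper may differ.

```latex
% \ell_1-regularized ERM over vectors \beta \in \mathbb{R}^d (assumed linear model):
\hat{\beta} \in \operatorname*{argmin}_{\beta \in \mathbb{R}^d}
  \left( \frac{1}{n} \sum_{i=1}^{n} \bigl| Y_i - \langle \beta, X_i \rangle \bigr|^{q}
         + \lambda \, \|\beta\|_{1} \right)

% Nuclear-norm-regularized ERM over matrices A \in \mathbb{R}^{m \times T},
% where \|A\|_{S_1} is the sum of the singular values of A:
\hat{A} \in \operatorname*{argmin}_{A \in \mathbb{R}^{m \times T}}
  \left( \frac{1}{n} \sum_{i=1}^{n} \bigl| Y_i - \langle A, X_i \rangle \bigr|^{q}
         + \lambda \, \|A\|_{S_1} \right)
```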

Article information

Source
Ann. Statist. Volume 40, Number 2 (2012), 832-860.

Dates
First available in Project Euclid: 1 June 2012

Permanent link to this document
https://projecteuclid.org/euclid.aos/1338515139

Digital Object Identifier
doi:10.1214/11-AOS965

Mathematical Reviews number (MathSciNet)
MR2933668

Zentralblatt MATH identifier
1274.62247

Subjects
Primary: 62G05: Estimation
Secondary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}

Keywords
Statistical learning; fast rates of convergence; oracle inequalities; regularization; classification; aggregation; model selection; high-dimensional data

Citation

Lecué, Guillaume; Mendelson, Shahar. General nonexact oracle inequalities for classes with a subexponential envelope. Ann. Statist. 40 (2012), no. 2, 832–860. doi:10.1214/11-AOS965. https://projecteuclid.org/euclid.aos/1338515139


Supplemental materials

  • Supplementary material: Applications to matrix completion, convex aggregation and model selection. In the supplementary file, we apply our main results to the problems of matrix completion, convex aggregation and model selection. The aim is to expose the fundamental differences between exact and nonexact oracle inequalities on classical problems.