The Annals of Statistics
Ann. Statist., Volume 42, Number 1 (2014), 211–224.
Optimal learning with Q-aggregation
Guillaume Lecué and Philippe Rigollet
Full-text: Open access
Abstract
We consider a general supervised learning problem with strongly convex and Lipschitz loss and study the problem of model selection aggregation. In particular, given a finite dictionary of functions (learners) together with a prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann. Statist. 40 (2012) 1878–1905] for Gaussian regression with squared loss and fixed design to this learning setup. Specifically, we prove that the $Q$-aggregation procedure outputs an estimator that satisfies optimal oracle inequalities both in expectation and with high probability. Our proof techniques depart somewhat from traditional proofs by carrying out most of the standard arguments on the Laplace transform of the empirical process to be controlled.
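To make the aggregation setup concrete, the sketch below spells out one common form of a $Q$-aggregation criterion: a convex combination of the empirical risk of the mixture $f_\theta$ and the average empirical risk of the dictionary elements, plus a prior-weighted penalty over the simplex. It is a minimal numerical illustration only, assuming squared loss, a Kullback-Leibler penalty toward the prior, and an exponentiated-gradient solver; the names q_criterion and q_aggregate, the defaults nu = 0.5 and beta = 1.0, and the solver are illustrative assumptions, not the paper's exact constants or algorithm (Dai, Rigollet and Zhang, for instance, analyze a greedy solver).

    # Illustrative sketch of a Q-aggregation-type criterion; squared loss, KL-to-prior
    # penalty, and the exponentiated-gradient solver are assumptions, not the paper's method.
    import numpy as np

    def q_criterion(theta, preds, y, prior, nu=0.5, beta=1.0):
        """Q(theta) = (1 - nu) * R_n(f_theta) + nu * sum_j theta_j R_n(f_j)
        + (beta / n) * KL(theta || prior), with R_n the empirical squared-loss risk."""
        n = y.shape[0]
        mix = preds @ theta                                   # f_theta(X_i) = sum_j theta_j f_j(X_i)
        r_mix = np.mean((y - mix) ** 2)                       # risk of the mixture
        r_each = np.mean((y[:, None] - preds) ** 2, axis=0)   # risk of each dictionary element
        kl = np.sum(theta * np.log(np.maximum(theta, 1e-300) / prior))
        return (1 - nu) * r_mix + nu * theta @ r_each + (beta / n) * kl

    def q_aggregate(preds, y, prior=None, nu=0.5, beta=1.0, steps=2000, lr=0.1):
        """Minimize the criterion over the probability simplex by exponentiated-gradient steps."""
        n, M = preds.shape
        prior = np.full(M, 1.0 / M) if prior is None else prior
        theta = prior.copy()
        r_each = np.mean((y[:, None] - preds) ** 2, axis=0)
        for _ in range(steps):
            mix = preds @ theta
            grad_mix = -2.0 * preds.T @ (y - mix) / n          # gradient of R_n(f_theta)
            grad = ((1 - nu) * grad_mix + nu * r_each
                    + (beta / n) * (np.log(np.maximum(theta, 1e-300) / prior) + 1.0))
            theta = theta * np.exp(-lr * grad)                 # mirror (KL) descent step
            theta /= theta.sum()
        return theta

For example, with preds the n-by-M matrix of dictionary predictions on the sample and y the observed responses, q_aggregate(preds, y) returns simplex weights, and the corresponding mixture preds @ theta plays the role of the aggregate estimator in this toy setting.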
Article information
Source
Ann. Statist., Volume 42, Number 1 (2014), 211-224.
Dates
First available in Project Euclid: 18 February 2014
Permanent link to this document
https://projecteuclid.org/euclid.aos/1392733186
Digital Object Identifier
doi:10.1214/13-AOS1190
Mathematical Reviews number (MathSciNet)
MR3178462
Zentralblatt MATH identifier
1286.68255
Subjects
Primary: 68Q32: Computational learning theory [See also 68T05]
Secondary: 62G08: Nonparametric regression; 62G05: Estimation
Keywords
Learning theory; empirical risk minimization; aggregation; empirical processes theory
Citation
Lecué, Guillaume; Rigollet, Philippe. Optimal learning with Q-aggregation. Ann. Statist. 42 (2014), no. 1, 211–224. doi:10.1214/13-AOS1190. https://projecteuclid.org/euclid.aos/1392733186
References
- [1] Alquier, P. and Lounici, K. (2011). PAC-Bayesian bounds for sparse regression estimation with exponential weights. Electron. J. Stat. 5 127–145. MR2786484. doi:10.1214/11-EJS601.
- [2] Audibert, J.-Y. (2007). Progressive mixture rules are deviation suboptimal. In Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA.
- [3] Audibert, J.-Y. (2009). Fast learning rates in statistical inference through aggregation. Ann. Statist. 37 1591–1646. MR2533466. doi:10.1214/08-AOS623.
- [4] Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. J. Amer. Statist. Assoc. 101 138–156. MR2268032. doi:10.1198/016214505000000907.
- [5] Boucheron, S., Lugosi, G. and Massart, P. (2012). Concentration Inequalities with Applications. Clarendon Press, Oxford.
- [6] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697. MR2351101. doi:10.1214/009053606000001587.
- [7] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Lecture Notes in Math. 1851. Springer, Berlin. Lecture notes from the 31st Summer School on Probability Theory held in Saint-Flour, July 8–25, 2001. MR2163920.
- [8] Catoni, O. (2007). PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning. IMS Lecture Notes–Monograph Series 56. IMS, Beachwood, OH. MR2483528.
- [9] Dai, D., Rigollet, P. and Zhang, T. (2012). Deviation optimal learning using greedy $Q$-aggregation. Ann. Statist. 40 1878–1905. MR3015047. doi:10.1214/12-AOS1025.
- [10] Dalalyan, A. S., Ingster, Y. and Tsybakov, A. (2014). Statistical inference in compound functional models. Probab. Theory Related Fields. To appear. MR3176357. doi:10.1007/s00440-013-0487-y.
- [11] Dalalyan, A. S. and Salmon, J. (2012). Sharp oracle inequalities for aggregation of affine estimators. Ann. Statist. 40 2327–2355. MR3059085. doi:10.1214/12-AOS1038.
- [12] Dalalyan, A. S. and Tsybakov, A. B. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory. Lecture Notes in Computer Science 4539 97–111. Springer, Berlin. MR2397581. doi:10.1007/978-3-540-72927-3_9.
- [13] Dalalyan, A. S. and Tsybakov, A. B. (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. J. Mach. Learn. Res. 72 39–61.
- [14] Dalalyan, A. S. and Tsybakov, A. B. (2010). Mirror averaging with sparsity priors. Bernoulli 18 914–944. MR2948907. doi:10.3150/11-BEJ361.
- [15] Dalalyan, A. S. and Tsybakov, A. B. (2012). Sparse regression learning by aggregation and Langevin Monte-Carlo. J. Comput. System Sci. 78 1423–1443. MR2926142. doi:10.1016/j.jcss.2011.12.023.
- [16] Emery, M., Nemirovski, A. and Voiculescu, D. (2000). Lectures on Probability Theory and Statistics. Lecture Notes in Math. 1738. Springer, Berlin. MR1775638.
- [17] Hiriart-Urruty, J.-B. and Lemaréchal, C. (2001). Fundamentals of Convex Analysis. Grundlehren Text Editions. Springer, Berlin. Abridged version of Convex Analysis and Minimization Algorithms I and II [Springer, Berlin, 1993]. MR1865628.
- [18] Juditsky, A. and Nemirovski, A. (2000). Functional aggregation for nonparametric regression. Ann. Statist. 28 681–712. MR1792783. doi:10.1214/aos/1015951994.
- [19] Juditsky, A., Rigollet, P. and Tsybakov, A. B. (2008). Learning by mirror averaging. Ann. Statist. 36 2183–2206. MR2458184. doi:10.1214/07-AOS546.
- [20] Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Math. 2033. Springer, Heidelberg. MR2829871.
- [21] Lecué, G. (2007). Optimal rates of aggregation in classification under low noise assumption. Bernoulli 13 1000–1022. MR2364224. doi:10.3150/07-BEJ6044.
- [22] Lecué, G. (2007). Suboptimality of penalized empirical risk minimization in classification. In Learning Theory. Lecture Notes in Computer Science 4539 142–156. Springer, Berlin. MR2397584. doi:10.1007/978-3-540-72927-3_12.
- [23] Lecué, G. and Mendelson, S. (2009). Aggregation via empirical risk minimization. Probab. Theory Related Fields 145 591–613. MR2529440. doi:10.1007/s00440-008-0180-8.
- [24] Lecué, G. and Mendelson, S. (2010). Sharper lower bounds on the performance of the empirical risk minimization algorithm. Bernoulli 16 605–613. MR2730641. doi:10.3150/09-BEJ225.
- [25] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete 23. Springer, Berlin. MR1102015.
- [26] Lee, W. S., Bartlett, P. L. and Williamson, R. C. (1996). The importance of convexity in learning with squared loss. In Proceedings of the Ninth Annual Conference on Computational Learning Theory 140–146. ACM Press, New York.
- [27] Rigollet, P. (2012). Kullback–Leibler aggregation and misspecified generalized linear models. Ann. Statist. 40 639–665. MR2933661. doi:10.1214/11-AOS961.
- [28] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771. MR2816337. doi:10.1214/10-AOS854.
- [29] Rigollet, P. and Tsybakov, A. B. (2012). Sparse estimation by exponential weighting. Statist. Sci. 27 558–575. MR3025134. doi:10.1214/12-STS393.
- [30] Tsybakov, A. B. (2003). Optimal rates of aggregation. In Computational Learning Theory and Kernel Machines (COLT-2003). Lecture Notes in Artificial Intelligence 2777 303–313. Springer, Heidelberg.
- [31] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166. MR2051002. doi:10.1214/aos/1079120131.
- [32] Yang, Y. (2000). Combining different procedures for adaptive regression. J. Multivariate Anal. 74 135–161.
- [33] Yang, Y. (2000). Mixing strategies for density estimation. Ann. Statist. 28 75–87. MR1762904. doi:10.1214/aos/1016120365.

More like this
- General nonexact oracle inequalities for classes with a subexponential envelope. Lecué, Guillaume and Mendelson, Shahar, The Annals of Statistics, 2012
- Mirror averaging with sparsity priors. Dalalyan, Arnak S. and Tsybakov, Alexandre B., Bernoulli, 2012
- Optimal exponential bounds for aggregation of density estimators. Bellec, Pierre C., Bernoulli, 2017
- Kullback–Leibler aggregation and misspecified generalized linear models. Rigollet, Philippe, The Annals of Statistics, 2012
- Solution of linear ill-posed problems by model selection and aggregation. Abramovich, Felix, De Canditiis, Daniela, and Pensky, Marianna, Electronic Journal of Statistics, 2018
- Deviation optimal learning using greedy $Q$-aggregation. Dai, Dong, Rigollet, Philippe, and Zhang, Tong, The Annals of Statistics, 2012
- Boosting with early stopping: Convergence and consistency. Zhang, Tong and Yu, Bin, The Annals of Statistics, 2005
- Online Learning Discriminative Dictionary with Label Information for Robust Object Tracking. Fan, Baojie, Du, Yingkui, and Cong, Yang, Abstract and Applied Analysis, 2014
- Structured Sparsity through Convex Optimization. Bach, Francis, Jenatton, Rodolphe, Mairal, Julien, and Obozinski, Guillaume, Statistical Science, 2012
- Aggregation of affine estimators. Dai, Dong, Rigollet, Philippe, Xia, Lucy, and Zhang, Tong, Electronic Journal of Statistics, 2014