Bernoulli
Volume 18, Number 3 (2012), 914–944.
Mirror averaging with sparsity priors
Arnak S. Dalalyan and Alexandre B. Tsybakov
Abstract
We consider the problem of aggregating the elements of a possibly infinite dictionary to build a decision procedure that aims at minimizing a given criterion. Along with the dictionary, an independent and identically distributed training sample is available, on which the performance of a given procedure can be tested. In a fairly general setup, we establish an oracle inequality for the Mirror Averaging aggregate with any prior distribution. By choosing an appropriate prior, we apply this oracle inequality in the context of prediction under the sparsity assumption to the problems of regression with random design, density estimation and binary classification.
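To fix ideas, the Mirror Averaging aggregate described in the abstract can be sketched as follows: at each step t the dictionary elements receive exponential weights proportional to prior[k] * exp(-cum_loss[k] / beta), and the final aggregation weights are the average of these over all steps. This is a minimal illustration only: the function name is ours, squared loss on a finite dictionary is used for concreteness, and the paper itself treats general criteria and possibly infinite dictionaries.

```python
import numpy as np

def mirror_averaging_weights(preds, y, prior, beta):
    """Aggregation weights of a Mirror Averaging procedure (sketch).

    preds : (M, n) array, preds[k, t] = prediction of dictionary element k
            on the t-th training point.
    y     : (n,) array of observed responses.
    prior : (M,) prior distribution on the dictionary.
    beta  : temperature parameter > 0.

    At step t the exponential weights are proportional to
    prior[k] * exp(-cum_loss[k] / beta), where cum_loss holds the losses
    accumulated on the first t points (so the step-0 weights equal the
    prior); the returned weights average these over t = 0, ..., n-1.
    """
    M, n = preds.shape
    log_prior = np.log(prior)
    cum_loss = np.zeros(M)
    theta_avg = np.zeros(M)
    for t in range(n):
        logits = log_prior - cum_loss / beta
        logits -= logits.max()                  # numerical stability
        w = np.exp(logits)
        w /= w.sum()                            # normalize to a distribution
        theta_avg += w / n                      # running average over steps
        cum_loss += (preds[:, t] - y[t]) ** 2   # squared loss of point t
    return theta_avg
```

The aggregate predictor at a new point x is then the mixture sum_k theta_avg[k] * f_k(x); a sparsity-favoring choice of `prior`, as in the paper, concentrates the weights on few dictionary elements.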
Article information
Source
Bernoulli Volume 18, Number 3 (2012), 914-944.
Dates
First available in Project Euclid: 28 June 2012
Permanent link to this document
http://projecteuclid.org/euclid.bj/1340887008
Digital Object Identifier
doi:10.3150/11-BEJ361
Mathematical Reviews number (MathSciNet)
MR2948907
Zentralblatt MATH identifier
1243.62008
Keywords
aggregation of estimators; mirror averaging; oracle inequalities; sparsity
Citation
Dalalyan, Arnak S.; Tsybakov, Alexandre B. Mirror averaging with sparsity priors. Bernoulli 18 (2012), no. 3, 914--944. doi:10.3150/11-BEJ361. http://projecteuclid.org/euclid.bj/1340887008.

