## Bernoulli

Volume 23, Number 1 (2017), 219–248.

### Optimal exponential bounds for aggregation of density estimators

Pierre C. Bellec

#### Abstract

We consider the problem of model selection type aggregation in the context of density estimation. We first show that empirical risk minimization is sub-optimal for this problem, and that it shares this property with the exponential weights aggregate, with empirical risk minimization over the convex hull of the dictionary functions, and with all selectors. Using a penalty inspired by recent works on the $Q$-aggregation procedure, we derive a sharp oracle inequality in deviation under a simple boundedness assumption, and we show that the rate is optimal in a minimax sense. Unlike the procedures based on exponential weights, this estimator is fully adaptive under the uniform prior. In particular, its construction does not rely on the sup-norm of the unknown density. By providing lower bounds with exponential tails, we show that the deviation term appearing in the sharp oracle inequalities cannot be improved.
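The following is a minimal sketch, not the paper's construction, of the kinds of procedures the abstract compares, in the standard $L_2$ density-estimation setting. It assumes a hypothetical dictionary of four densities on $[0,1]$; the empirical risk $R_n(f) = \|f\|^2 - \frac{2}{n}\sum_{i=1}^n f(X_i)$, whose expectation is $\|f - p\|^2 - \|p\|^2$ for the unknown density $p$; integrals approximated by a midpoint rule on a grid; and a $Q$-aggregation-style criterion $H(\theta) = R_n(f_\theta) + \nu \sum_j \theta_j \|f_j - f_\theta\|^2$ minimized over the simplex, with $\nu = 1/2$ and no prior term (under a uniform prior that term is a constant). The helper names, the temperature `beta`, and the Frank-Wolfe solver are illustrative choices, not taken from the paper.

```python
import math
import random

# Hypothetical dictionary of candidate densities on [0, 1] (illustration only).
DICTIONARY = [
    lambda x: 1.0,                  # uniform
    lambda x: 2.0 * x,              # Beta(2, 1)
    lambda x: 2.0 * (1.0 - x),      # Beta(1, 2)
    lambda x: 6.0 * x * (1.0 - x),  # Beta(2, 2)
]

def gram_matrix(funcs, grid_size=2000):
    """Approximate G[j][k] = <f_j, f_k> in L2([0, 1]) with the midpoint rule."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    values = [[f(x) for x in grid] for f in funcs]
    return [[sum(a * c for a, c in zip(vj, vk)) / grid_size for vk in values]
            for vj in values]

def linear_terms(funcs, sample):
    """b[j] = (1/n) * sum_i f_j(X_i), the data-dependent part of R_n(f_j)."""
    return [sum(f(x) for x in sample) / len(sample) for f in funcs]

def empirical_risks(G, b):
    """R_n(f_j) = ||f_j||^2 - 2 b[j] for every dictionary element."""
    return [G[j][j] - 2.0 * b[j] for j in range(len(b))]

def erm_selector(G, b):
    """Index of the dictionary element with the smallest empirical risk."""
    risks = empirical_risks(G, b)
    return min(range(len(risks)), key=risks.__getitem__)

def exponential_weights(G, b, n, beta):
    """Weights proportional to exp(-n R_n(f_j) / beta); beta is a hypothetical temperature."""
    scores = [-n * r / beta for r in empirical_risks(G, b)]
    top = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    return [wi / total for wi in w]

def q_objective(theta, G, b, nu):
    """H(theta) = R_n(f_theta) + nu * sum_j theta_j ||f_j - f_theta||^2 (assumed form)."""
    m = len(b)
    quad = sum(theta[j] * G[j][k] * theta[k] for j in range(m) for k in range(m))
    lin = sum(theta[j] * b[j] for j in range(m))
    diag = sum(theta[j] * G[j][j] for j in range(m))
    # On the simplex, sum_j theta_j ||f_j - f_theta||^2 = diag - quad.
    return (1.0 - nu) * quad - 2.0 * lin + nu * diag

def q_aggregate(G, b, nu=0.5, iters=500):
    """Minimize q_objective over the simplex by Frank-Wolfe with exact line search."""
    m = len(b)
    theta = [0.0] * m
    theta[erm_selector(G, b)] = 1.0  # start at the ERM vertex
    for _ in range(iters):
        g_theta = [sum(G[j][k] * theta[k] for k in range(m)) for j in range(m)]
        grad = [2.0 * (1.0 - nu) * g_theta[j] - 2.0 * b[j] + nu * G[j][j]
                for j in range(m)]
        v = min(range(m), key=grad.__getitem__)  # best simplex vertex
        d = [(1.0 if j == v else 0.0) - theta[j] for j in range(m)]
        slope = sum(gj * dj for gj, dj in zip(grad, d))
        if slope >= -1e-12:
            break  # Frank-Wolfe gap is numerically zero
        curv = (1.0 - nu) * sum(d[j] * G[j][k] * d[k]
                                for j in range(m) for k in range(m))
        step = 1.0 if curv <= 0.0 else min(1.0, -slope / (2.0 * curv))
        theta = [t + step * dj for t, dj in zip(theta, d)]
    return theta

# Usage: n draws from p(x) = 2x on [0, 1] via the inverse CDF X = sqrt(U).
rng = random.Random(0)
n = 200
sample = [math.sqrt(rng.random()) for _ in range(n)]
G = gram_matrix(DICTIONARY)
b = linear_terms(DICTIONARY, sample)
theta = q_aggregate(G, b)
print("ERM selects index", erm_selector(G, b))
print("exponential weights", exponential_weights(G, b, n, beta=4.0))
print("Q-aggregation-style weights", theta)
```

Because $H(e_j) = R_n(f_j)$ at every vertex $e_j$ of the simplex, starting Frank-Wolfe at the ERM vertex with exact line search keeps the criterion non-increasing, so the sketch's $H(\hat\theta)$ never exceeds the empirical risk of the selected element. The value `beta=4.0` is arbitrary; as the abstract notes, exponential-weights procedures rely on the sup-norm of the unknown density, which a principled choice of temperature would have to reflect.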

#### Article information

Source
Bernoulli, Volume 23, Number 1 (2017), 219–248.

Dates
Revised: April 2015
First available in Project Euclid: 27 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.bj/1475001354

Digital Object Identifier
doi:10.3150/15-BEJ742

Mathematical Reviews number (MathSciNet)
MR3556772

Zentralblatt MATH identifier
1368.62085

#### Citation

Bellec, Pierre C. Optimal exponential bounds for aggregation of density estimators. Bernoulli 23 (2017), no. 1, 219–248. doi:10.3150/15-BEJ742. https://projecteuclid.org/euclid.bj/1475001354

#### References

• [1] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Lecture Notes in Math. 1851. Berlin: Springer.
• [2] Dai, D., Rigollet, P., Xia, L. and Zhang, T. (2014). Aggregation of affine estimators. Electron. J. Stat. 8 302–327.
• [3] Dai, D., Rigollet, P. and Zhang, T. (2012). Deviation optimal learning using greedy $Q$-aggregation. Ann. Statist. 40 1878–1905.
• [4] Dalalyan, A.S. and Salmon, J. (2012). Sharp oracle inequalities for aggregation of affine estimators. Ann. Statist. 40 2327–2355.
• [5] Dalalyan, A.S. and Tsybakov, A.B. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory. Lecture Notes in Computer Science 4539 97–111. Berlin: Springer.
• [6] Dalalyan, A.S. and Tsybakov, A.B. (2012). Mirror averaging with sparsity priors. Bernoulli 18 914–944.
• [7] Heil, C. (2011). A Basis Theory Primer, Expanded ed. Applied and Numerical Harmonic Analysis. New York: Birkhäuser/Springer.
• [8] Juditsky, A., Rigollet, P. and Tsybakov, A.B. (2008). Learning by mirror averaging. Ann. Statist. 36 2183–2206.
• [9] Kerkyacharian, G., Tsybakov, A.B., Temlyakov, V., Picard, D. and Koltchinskii, V. (2014). Optimal exponential bounds on the accuracy of classification. Constr. Approx. 39 421–444.
• [10] Lecué, G. (2006). Lower bounds and aggregation in density estimation. J. Mach. Learn. Res. 7 971–981.
• [11] Lecué, G. (2013). Empirical risk minimization is optimal for the convex aggregation problem. Bernoulli 19 2153–2166.
• [12] Lecué, G. and Mendelson, S. (2009). Aggregation via empirical risk minimization. Probab. Theory Related Fields 145 591–613.
• [13] Lecué, G. and Mendelson, S. (2010). Sharper lower bounds on the performance of the empirical risk minimization algorithm. Bernoulli 16 605–613.
• [14] Lecué, G. and Mendelson, S. (2013). On the optimality of the aggregate with exponential weights for low temperatures. Bernoulli 19 646–675.
• [15] Lecué, G. and Rigollet, P. (2014). Optimal learning with $Q$-aggregation. Ann. Statist. 42 211–224.
• [16] Leung, G. and Barron, A.R. (2006). Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 3396–3410.
• [17] Lounici, K. (2007). Generalized mirror averaging and $D$-convex aggregation. Math. Methods Statist. 16 246–259.
• [18] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Berlin: Springer.
• [19] Matoušek, J. and Vondrák, J. (2008). The probabilistic method, lecture notes. Available at http://kam.mff.cuni.cz/~matousek/.
• [20] Rigollet, P. (2012). Kullback–Leibler aggregation and misspecified generalized linear models. Ann. Statist. 40 639–665.
• [21] Rigollet, P. and Tsybakov, A.B. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771.
• [22] Rigollet, P. and Tsybakov, A.B. (2007). Linear and convex aggregation of density estimators. Math. Methods Statist. 16 260–280.
• [23] Rigollet, P. and Tsybakov, A.B. (2012). Sparse estimation by exponential weighting. Statist. Sci. 27 558–575.
• [24] Tsybakov, A.B. (2003). Optimal rates of aggregation. In Learning Theory and Kernel Machines 303–313. Berlin: Springer.
• [25] Tsybakov, A.B. (2009). Introduction to Nonparametric Estimation. Springer Series in Statistics. Berlin: Springer.
• [26] Wegkamp, M.H. (1999). Quasi-universal bandwidth selection for kernel density estimators. Canad. J. Statist. 27 409–420.
• [27] Yang, Y. (2000). Mixing strategies for density estimation. Ann. Statist. 28 75–87.