Bernoulli, Volume 23, Number 1 (2017), 219–248.

Optimal exponential bounds for aggregation of density estimators

Pierre C. Bellec

We consider the problem of model selection type aggregation in the context of density estimation. We first show that empirical risk minimization is sub-optimal for this problem, and that it shares this sub-optimality with the exponential weights aggregate, with empirical risk minimization over the convex hull of the dictionary functions, and with all selectors. Using a penalty inspired by recent work on the $Q$-aggregation procedure, we derive a sharp oracle inequality in deviation under a simple boundedness assumption, and we show that the resulting rate is optimal in the minimax sense. Unlike procedures based on exponential weights, this estimator is fully adaptive under the uniform prior. In particular, its construction does not rely on the sup-norm of the unknown density. By providing lower bounds with exponential tails, we show that the deviation term appearing in the sharp oracle inequalities cannot be improved.
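The abstract compares several procedures built from a finite dictionary of candidate densities. The Python sketch below is not the paper's code; it only makes the objects concrete for the usual L2 density-estimation risk, assuming (hypothetically) that every density lives on [0, 1] and is piecewise constant on `K` equal bins. `erm_select` is the empirical risk minimizing selector, `exp_weights` is an exponential weights aggregate with a user-chosen temperature, and `q_aggregate` minimizes a Q-aggregation-style criterion R_n(f_θ) + ν Σ_j θ_j ||f_θ - f_j||² over the simplex. The choice ν = 1/2, the absence of any prior term, the toy dictionary, and the Frank-Wolfe solver are illustrative assumptions, not the exact estimator or constants analysed in the paper.

```python
import math
import random

K = 20        # hypothetical number of equal-width bins on [0, 1]
H = 1.0 / K   # bin width


def evaluate(f, x):
    """Value at x in [0, 1] of a piecewise-constant function given by K bin heights."""
    return f[min(int(x * K), K - 1)]


def inner(f, g):
    """L2([0, 1]) inner product of two piecewise-constant functions."""
    return H * sum(a * b for a, b in zip(f, g))


def empirical_risk(f, sample):
    """R_n(f) = ||f||^2 - (2/n) sum_i f(X_i), unbiased for ||f - p||^2 - ||p||^2."""
    return inner(f, f) - 2.0 * sum(evaluate(f, x) for x in sample) / len(sample)


def mix(theta, dictionary):
    """Convex combination f_theta = sum_j theta_j f_j."""
    return [sum(t * f[k] for t, f in zip(theta, dictionary)) for k in range(K)]


def erm_select(dictionary, sample):
    """Selector: index of the dictionary element with the smallest empirical risk."""
    risks = [empirical_risk(f, sample) for f in dictionary]
    return min(range(len(dictionary)), key=risks.__getitem__)


def exp_weights(dictionary, sample, temperature=1.0):
    """Exponential weights proportional to exp(-n R_n(f_j) / temperature)."""
    n = len(sample)
    scores = [-n * empirical_risk(f, sample) / temperature for f in dictionary]
    top = max(scores)  # shift by the max for numerical stability
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    return [x / total for x in w]


def q_criterion(theta, dictionary, sample, nu=0.5):
    """Assumed Q-style criterion: R_n(f_theta) + nu * sum_j theta_j ||f_theta - f_j||^2."""
    f_theta = mix(theta, dictionary)
    penalty = 0.0
    for t, f in zip(theta, dictionary):
        diff = [a - b for a, b in zip(f_theta, f)]
        penalty += t * inner(diff, diff)
    return empirical_risk(f_theta, sample) + nu * penalty


def q_aggregate(dictionary, sample, nu=0.5, iters=100):
    """Minimize q_criterion over the simplex by Frank-Wolfe with exact line search.

    Uses the identity Q(theta) = (1 - nu) R_n(f_theta) + nu sum_j theta_j R_n(f_j),
    a convex quadratic in theta for 0 <= nu <= 1.
    """
    M, n = len(dictionary), len(sample)
    gram = [[inner(f, g) for g in dictionary] for f in dictionary]
    means = [sum(evaluate(f, x) for x in sample) / n for f in dictionary]
    risks = [gram[j][j] - 2.0 * means[j] for j in range(M)]
    theta = [0.0] * M
    theta[erm_select(dictionary, sample)] = 1.0  # start at the ERM vertex
    for _ in range(iters):
        # dQ/dtheta_j = (1 - nu) * (2 <f_theta, f_j> - 2 m_j) + nu * R_n(f_j)
        grad = [(1 - nu) * 2.0 * (sum(theta[k] * gram[k][j] for k in range(M)) - means[j])
                + nu * risks[j] for j in range(M)]
        s = min(range(M), key=grad.__getitem__)  # vertex minimizing the linearization
        d = [(1.0 if j == s else 0.0) - theta[j] for j in range(M)]
        slope = sum(g * dj for g, dj in zip(grad, d))  # <= 0 by the choice of s
        if slope >= 0:
            break  # zero Frank-Wolfe gap: theta is optimal
        curv = 2.0 * (1 - nu) * sum(d[j] * gram[j][k] * d[k] for j in range(M) for k in range(M))
        gamma = 1.0 if curv <= 0 else min(1.0, -slope / curv)
        theta = [t + gamma * dj for t, dj in zip(theta, d)]
    return theta


# Hypothetical toy setting: i.i.d. draws from p(x) = 2x via the inverse CDF sqrt(U).
rng = random.Random(0)
sample = [math.sqrt(rng.random()) for _ in range(500)]
dictionary = [
    [1.0] * K,                                  # uniform density
    [2 * (k + 0.5) / K for k in range(K)],      # binned version of 2x
    [2 - 2 * (k + 0.5) / K for k in range(K)],  # binned version of 2 - 2x
]
print("ERM index:", erm_select(dictionary, sample))
print("exponential weights:", exp_weights(dictionary, sample))
print("Q-style weights:", q_aggregate(dictionary, sample))
```

At a vertex e_j the penalty vanishes and the criterion reduces to R_n(f_j), so every selector is a feasible point of the same minimization; starting at the ERM vertex with exact line search keeps the criterion non-increasing along the iterations.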

Article information

Received: January 2015
Revised: April 2015
First available in Project Euclid: 27 September 2016

Digital Object Identifier: doi:10.3150/15-BEJ742

Keywords: aggregation; concentration inequality; density estimation; minimax lower bounds; minimax optimality; model selection; sharp oracle inequality


Bellec, Pierre C. Optimal exponential bounds for aggregation of density estimators. Bernoulli 23 (2017), no. 1, 219–248. doi:10.3150/15-BEJ742.
