The Annals of Statistics

A universal procedure for aggregating estimators

Alexander Goldenshluger

Full-text: Open access


In this paper we study the aggregation problem, which can be formulated as follows. Assume that we have a family of estimators $\mathcal{F}$ built on the basis of available observations. The goal is to construct a new estimator whose risk is as close as possible to that of the best estimator in the family. We propose a general aggregation scheme that is universal in the following sense: it applies to families of arbitrary estimators and to a wide variety of models and global risk measures. The procedure is based on comparison of empirical estimates of certain linear functionals with estimates induced by the family $\mathcal{F}$. We derive oracle inequalities and show that they are unimprovable in some sense. Numerical results demonstrate good practical behavior of the procedure.
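One way to picture the comparison of empirical and family-induced functional values is the following minimal sketch. It is an illustration under stated assumptions, not the paper's exact construction: it assumes the normal means model y = theta + noise, a finite family of fixed candidate vectors, squared-error loss, and one linear functional per pair of candidates, taken along the normalized difference of the pair. The function name, the choice of functionals, and the demo data are all hypothetical.

```python
import numpy as np


def select_by_pairwise_functionals(y, candidates):
    """Pick the candidate whose induced functional values best match the data.

    For each ordered pair (i, j), the linear functional is the inner product
    with psi = (c_i - c_j) / ||c_i - c_j||. Candidate i is scored by the
    largest gap between the empirical value <psi, y> and the value <psi, c_i>
    that candidate i itself induces. (Illustrative choice of functionals.)
    """
    cands = np.asarray(candidates, dtype=float)
    m = cands.shape[0]
    scores = np.zeros(m)
    for i in range(m):
        for j in range(m):
            diff = cands[i] - cands[j]
            norm = np.linalg.norm(diff)
            if i == j or norm == 0.0:
                continue  # no direction to compare along
            psi = diff / norm
            gap = abs(psi @ (y - cands[i]))
            scores[i] = max(scores[i], gap)
    return int(np.argmin(scores)), scores


# Demo data (assumed for illustration): a sparse mean vector observed in noise.
rng = np.random.default_rng(0)
theta = np.zeros(200)
theta[:5] = 3.0
y = theta + 0.1 * rng.standard_normal(200)

# Three hypothetical candidates: the true vector, the zero vector, a constant.
family = [theta, np.zeros(200), 2.0 * np.ones(200)]
best, scores = select_by_pairwise_functionals(y, family)
print("selected index:", best)
```

A candidate close to the true mean has small gaps along every pairwise direction, while a poor candidate is exposed by the direction pointing toward a better one, which is why the minimum of the worst-case gap serves as the selection rule in this sketch.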

Article information

Ann. Statist., Volume 37, Number 1 (2009), 542-568.

First available in Project Euclid: 16 January 2009

Primary: 62G08: Nonparametric regression
Secondary: 62G05: Estimation; 62G20: Asymptotic properties

Keywords: Aggregation; lower bound; normal means model; oracle inequalities; sparse vectors; white noise model


Goldenshluger, Alexander. A universal procedure for aggregating estimators. Ann. Statist. 37 (2009), no. 1, 542--568. doi:10.1214/00-AOS576.
