Electronic Journal of Statistics

Aggregation of affine estimators

Dong Dai, Philippe Rigollet, Lucy Xia, and Tong Zhang

Full-text: Open access


We consider the problem of aggregating a general collection of affine estimators for fixed design regression. Relevant examples include commonly used statistical estimators such as least squares, ridge and robust least squares estimators. Dalalyan and Salmon [DS12] have established that, for this problem, exponentially weighted (EW) model selection aggregation leads to sharp oracle inequalities in expectation, but similar bounds in deviation were not previously known. While results in [DRZ12] indicate that the same aggregation scheme may not satisfy sharp oracle inequalities with high probability, we prove that EW satisfies a weaker notion of oracle inequality that holds with high probability. Moreover, using a generalization of the newly introduced $Q$-aggregation scheme, we also prove sharp oracle inequalities that hold with high probability. Finally, we apply our results to universal aggregation and show that our proposed estimator leads simultaneously to all the best known bounds for aggregation, including $\ell_{q}$-aggregation, $q\in(0,1)$, with high probability.
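The two aggregation schemes named in the abstract can be sketched in a simplified setting. This is not the authors' code. The sketch assumes a Gaussian sequence model $Y = \eta + \sigma\xi$ with known $\sigma$, a uniform prior, and a family of diagonal affine estimators $\hat\eta_j = A_j Y$ with $A_j = \mathrm{diag}(a_j)$ and $b_j = 0$. Here the family consists of projection estimators that keep the first $k$ coordinates. The EW weights are $\theta_j \propto \exp(-\hat r_j/\beta)$, where $\hat r_j = \|Y - A_jY\|^2 + 2\sigma^2\,\mathrm{Tr}(A_j) - n\sigma^2$ is the unbiased risk estimate. The $Q$-aggregation part minimizes an assumed, simplified criterion over the simplex:

$Q(\theta) = (1-\nu)\|Y - \hat\eta_\theta\|^2 + \nu\sum_j \theta_j\|Y - \hat\eta_j\|^2 + 2\sigma^2\sum_j\theta_j\,\mathrm{Tr}(A_j)$.

The solver is a Frank-Wolfe loop chosen for this sketch. The paper's exact criterion, prior term, and constants ($\nu$, $\beta$) may differ. All function names, the signal, and the parameter values below are hypothetical.

```python
import math
import random

def sure(y, a, sigma):
    """Unbiased estimate of ||A y - eta||^2 for A = diag(a):
    ||y - A y||^2 + 2 sigma^2 tr(A) - n sigma^2."""
    resid = sum(((1.0 - ai) * yi) ** 2 for ai, yi in zip(a, y))
    return resid + 2.0 * sigma ** 2 * sum(a) - len(y) * sigma ** 2

def ew_weights(y, family, sigma, beta):
    """Exponential weights theta_j proportional to exp(-r_j / beta), uniform prior."""
    r = [sure(y, a, sigma) for a in family]
    r_min = min(r)  # shifting by r_min avoids overflow; it cancels on normalizing
    w = [math.exp(-(rj - r_min) / beta) for rj in r]
    total = sum(w)
    return [wj / total for wj in w]

def aggregate(y, family, theta):
    """Convex combination eta_theta = sum_j theta_j * A_j y."""
    return [sum(t * a[i] for t, a in zip(theta, family)) * y[i] for i in range(len(y))]

def q_weights(y, family, sigma, nu=0.5, iters=200):
    """Frank-Wolfe on the simplex for the ASSUMED simplified criterion
    Q(theta) = (1-nu)||y - eta_theta||^2 + nu sum_j theta_j ||y - eta_j||^2
               + 2 sigma^2 sum_j theta_j tr(A_j)."""
    m = len(family)
    estimates = [[ai * yi for ai, yi in zip(a, y)] for a in family]
    # Coefficients of the part of Q that is linear in theta.
    lin = [nu * sum((yi - ei) ** 2 for yi, ei in zip(y, e)) + 2.0 * sigma ** 2 * sum(a)
           for e, a in zip(estimates, family)]
    theta = [1.0 / m] * m
    for k in range(iters):
        fit = aggregate(y, family, theta)
        resid = [yi - fi for yi, fi in zip(y, fit)]
        # Gradient of (1-nu)||y - sum_j theta_j e_j||^2 is -2(1-nu) <e_j, resid>.
        grad = [-2.0 * (1.0 - nu) * sum(ei * ri for ei, ri in zip(e, resid)) + l
                for e, l in zip(estimates, lin)]
        j = min(range(m), key=grad.__getitem__)  # best simplex vertex
        gamma = 2.0 / (k + 2)                    # standard step size
        theta = [(1.0 - gamma) * t for t in theta]
        theta[j] += gamma
    return theta

# Hypothetical demo data.
random.seed(0)
n, sigma = 200, 1.0
eta = [5.0 / (i + 1) for i in range(n)]  # hypothetical decaying signal
y = [e + sigma * random.gauss(0.0, 1.0) for e in eta]
# Projection estimators: keep the first k coordinates, zero out the rest.
ks = [1, 5, 10, 20, 50, 100, 200]
family = [[1.0 if i < k else 0.0 for i in range(n)] for k in ks]
theta_ew = ew_weights(y, family, sigma, beta=8.0 * sigma ** 2)  # illustrative beta
theta_q = q_weights(y, family, sigma, nu=0.5)
eta_ew = aggregate(y, family, theta_ew)
eta_q = aggregate(y, family, theta_q)
```

Frank-Wolfe is convenient here because each step moves toward a single vertex of the simplex. The first step (with $\gamma = 1$) jumps to one vertex, so after $k$ steps the weights are supported on at most $k$ estimators.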

Article information

Electron. J. Statist., Volume 8, Number 1 (2014), 302-327.

First available in Project Euclid: 10 April 2014

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1397134174

Digital Object Identifier: doi:10.1214/14-EJS886

Primary: 62G08: Nonparametric regression
Secondary: 62C20: Minimax procedures; 62G05: Estimation; 62G20: Asymptotic properties

Keywords: Aggregation; affine estimators; Gaussian mean; oracle inequalities; Maurey’s argument


Dai, Dong; Rigollet, Philippe; Xia, Lucy; Zhang, Tong. Aggregation of affine estimators. Electron. J. Statist. 8 (2014), no. 1, 302--327. doi:10.1214/14-EJS886. https://projecteuclid.org/euclid.ejs/1397134174



  • [AL11] Pierre Alquier and Karim Lounici, PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electron. J. Stat. 5 (2011), 127–145.
  • [Aud08] Jean-Yves Audibert, Progressive mixture rules are deviation suboptimal, Advances in Neural Information Processing Systems 20 (J.C. Platt, D. Koller, Y. Singer, and S. Roweis, eds.), MIT Press, Cambridge, MA, 2008, pp. 41–48.
  • [BRT09] Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, Ann. Statist. 37 (2009), no. 4, 1705–1732.
  • [BTW07] Florentina Bunea, Alexandre B. Tsybakov, and Marten H. Wegkamp, Sparsity oracle inequalities for the Lasso, Electron. J. Stat. 1 (2007), 169–194.
  • [Cat99] O. Catoni, Universal aggregation rules with exact bias bounds, Tech. report, Preprint 510, Laboratoire de Probabilités et Modèles Aléatoires, 1999.
  • [Cat04] Olivier Catoni, Statistical learning theory and stochastic optimization, Lecture Notes in Mathematics, vol. 1851, Springer-Verlag, Berlin, 2004, Lecture notes from the 31st Summer School on Probability Theory held in Saint-Flour, July 8–25, 2001.
  • [Coh66] Arthur Cohen, All admissible linear estimates of the mean vector, Ann. Math. Statist. 37 (1966), 458–463.
  • [CT01] L. Cavalier and A. B. Tsybakov, Penalized blockwise Stein’s method, monotone oracles and sharp adaptive estimation, Math. Methods Statist. 10 (2001), no. 3, 247–282, Meeting on Mathematical Statistics (Marseille, 2000).
  • [DRZ12] Dong Dai, Philippe Rigollet, and Tong Zhang, Deviation optimal learning using greedy $Q$-aggregation, Ann. Statist. 40 (2012), no. 3, 1878–1905.
  • [DS12] Arnak S. Dalalyan and Joseph Salmon, Sharp oracle inequalities for aggregation of affine estimators, Ann. Statist. 40 (2012), no. 4, 2327–2355.
  • [DT07] Arnak S. Dalalyan and Alexandre B. Tsybakov, Aggregation by exponential weighting and sharp oracle inequalities, Learning theory, Lecture Notes in Comput. Sci., vol. 4539, Springer, Berlin, 2007, pp. 97–111.
  • [DT08] A. Dalalyan and A. B. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Machine Learning 72 (2008), no. 1, 39–61.
  • [FPRU10] Simon Foucart, Alain Pajor, Holger Rauhut, and Tino Ullrich, The Gelfand widths of $\ell_{p}$-balls for $0<p\leq 1$, J. Complexity 26 (2010), no. 6, 629–640.
  • [Gir08] Christophe Giraud, Mixing least-squares estimators when the variance is unknown, Bernoulli 14 (2008), no. 4, 1089–1107.
  • [GN92] G. K. Golubev and M. Nussbaum, Adaptive spline estimates in a nonparametric regression model, Teor. Veroyatnost. i Primenen. 37 (1992), no. 3, 554–561.
  • [Gru98] Marvin H. J. Gruber, Improving efficiency by shrinkage, Statistics: Textbooks and Monographs, vol. 156, Marcel Dekker Inc., New York, 1998, The James-Stein and ridge regression estimators.
  • [JN00] Anatoli Juditsky and Arkadii Nemirovski, Functional aggregation for nonparametric regression, Ann. Statist. 28 (2000), no. 3, 681–712.
  • [Joh11] Iain M. Johnstone, Gaussian estimation: Sequence and wavelet models, Unpublished manuscript, December 2011.
  • [LB06] Gilbert Leung and A. R. Barron, Information theory and mixing least-squares regressions, IEEE Trans. Inform. Theory 52 (2006), no. 8, 3396–3410.
  • [Lec07] Guillaume Lecué, Optimal rates of aggregation in classification under low noise assumption, Bernoulli 13 (2007), no. 4, 1000–1022.
  • [LM00] B. Laurent and P. Massart, Adaptive estimation of a quadratic functional by model selection, Ann. Statist. 28 (2000), no. 5, 1302–1338.
  • [LM09] Guillaume Lecué and Shahar Mendelson, Aggregation via empirical risk minimization, Probab. Theory Related Fields 145 (2009), no. 3-4, 591–613.
  • [LM12] Guillaume Lecué and Shahar Mendelson, General nonexact oracle inequalities for classes with a subexponential envelope, Ann. Statist. 40 (2012), no. 2, 832–860.
  • [Lou07] Karim Lounici, Generalized mirror averaging and $D$-convex aggregation, Math. Methods Statist. 16 (2007), no. 3, 246–259.
  • [LR14] Guillaume Lecué and Philippe Rigollet, Optimal learning with $q$-aggregation, Ann. Statist. 42 (2014), no. 1, 211–224.
  • [Nem00] Arkadi Nemirovski, Topics in non-parametric statistics, Lectures on probability theory and statistics (Saint-Flour, 1998), Lecture Notes in Math., vol. 1738, Springer, Berlin, 2000, pp. 85–277.
  • [Pin80] M. S. Pinsker, Optimal filtration of square-integrable signals in Gaussian noise, Probl. Inf. Transm. (Russian) 16 (1980), no. 2, 52–68.
  • [Pis81] G. Pisier, Remarques sur un résultat non publié de B. Maurey, Seminar on Functional Analysis, 1980–1981, École Polytech., Palaiseau, 1981, pp. Exp. No. V, 13.
  • [Rig12] Philippe Rigollet, Kullback-Leibler aggregation and misspecified generalized linear models, Ann. Statist. 40 (2012), no. 2, 639–665.
  • [RT07] Ph. Rigollet and A. B. Tsybakov, Linear and convex aggregation of density estimators, Math. Methods Statist. 16 (2007), no. 3, 260–280.
  • [RT11] P. Rigollet and A. Tsybakov, Exponential Screening and optimal rates of sparse estimation, Ann. Statist. 39 (2011), no. 2, 731–771.
  • [RT12] P. Rigollet and A. Tsybakov, Sparse estimation by exponential weighting, Statistical Science 27 (2012), no. 4, 558–575.
  • [RWY11] Garvesh Raskutti, Martin J. Wainwright, and Bin Yu, Minimax rates of estimation for high-dimensional linear regression over $\ell_{q}$-balls, IEEE Trans. Inform. Theory 57 (2011), no. 10, 6976–6994.
  • [Ste56] Charles Stein, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, vol. I (Berkeley and Los Angeles), University of California Press, 1956, pp. 197–206.
  • [Tsy03] A. B. Tsybakov, Optimal rates of aggregation, COLT, 2003, pp. 303–313.
  • [Tsy09] Alexandre B. Tsybakov, Introduction to nonparametric estimation, Springer Series in Statistics, Springer, New York, 2009, Revised and extended from the 2004 French original, Translated by Vladimir Zaiats.
  • [WPGY11] Zhan Wang, Sandra Paterlini, Frank Gao, and Yuhong Yang, Adaptive minimax estimation over sparse $\ell_{q}$-hulls, arXiv:1108.1961 (2011).
  • [Yan99] Y. Yang, Model selection for nonparametric regression, Statistica Sinica 9 (1999), 475–500.
  • [Yan04] Yuhong Yang, Aggregating regression procedures to improve performance, Bernoulli 10 (2004), no. 1, 25–47.