The Annals of Statistics

Robust linear least squares regression

Jean-Yves Audibert and Olivier Catoni

Full-text: Open access


We consider the problem of robustly predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. For the ridge estimator and the ordinary least squares estimator, and their variants, we provide new risk bounds of order d/n, where n is the size of the training data, without the logarithmic factor that appears in some standard results. We also provide a new estimator with better deviations in the presence of heavy-tailed noise. It is based on truncating differences of losses in a min–max framework and satisfies a d/n risk bound both in expectation and in deviations. The surprising feature common to these results is that they achieve exponential deviations without any exponential moment condition on the output distribution. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Experimental results strongly support our truncated min–max estimator.
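As an illustration of the setting only, the following minimal sketch fits the two classical estimators named in the abstract (ordinary least squares and ridge) on synthetic data with heavy-tailed noise, and compares the excess risk of the fit with d/n. It is not the authors' truncated min–max estimator, whose construction is not given in the abstract. The function names, the Student-t noise model, the identity design covariance, and the value of the penalty `lam` are all assumptions made for this example.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares: minimize ||X @ theta - y||^2."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def ridge(X, y, lam):
    """Ridge: minimize (1/n) * ||X @ theta - y||^2 + lam * ||theta||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def excess_risk(theta, theta_star, cov):
    """Excess quadratic risk R(theta) - R(theta_star).

    Valid when theta_star is the best linear predictor and cov is the
    second-moment matrix of the inputs.
    """
    diff = theta - theta_star
    return float(diff @ cov @ diff)


# Synthetic data (assumed for illustration): Gaussian design with identity
# covariance and Student-t noise with 3 degrees of freedom, which has finite
# variance but no exponential moments.
rng = np.random.default_rng(0)
n, d = 1000, 5
theta_star = np.ones(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + rng.standard_t(df=3, size=n)

theta_ols = ols(X, y)
theta_ridge = ridge(X, y, lam=1e-3)

print("d/n               :", d / n)
print("OLS excess risk   :", excess_risk(theta_ols, theta_star, np.eye(d)))
print("ridge excess risk :", excess_risk(theta_ridge, theta_star, np.eye(d)))
```

A single run like this shows only one draw of the excess risk. The bounds discussed in the abstract concern its expectation and its deviations, which would need many repetitions to examine empirically.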

Article information

Ann. Statist., Volume 39, Number 5 (2011), 2766-2794.

First available in Project Euclid: 22 December 2011

Primary: 62J05 (Linear regression); 62J07 (Ridge regression; shrinkage estimators)

Keywords: Linear regression; generalization error; shrinkage; PAC-Bayesian theorems; risk bounds; robust statistics; resistant estimators; Gibbs posterior distributions; randomized estimators; statistical learning theory


Audibert, Jean-Yves; Catoni, Olivier. Robust linear least squares regression. Ann. Statist. 39 (2011), no. 5, 2766-2794. doi:10.1214/11-AOS918.
