Annals of Statistics

On-line predictive linear regression

Vladimir Vovk, Ilia Nouretdinov, and Alex Gammerman

Full-text: Open access


We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. The standard treatment of prediction in linear regression analysis has two drawbacks: (1) the classical prediction intervals guarantee that the probability of error is equal to the nominal significance level ε, but this property per se does not imply that the long-run frequency of error is close to ε; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.
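The on-line protocol described above can be sketched in a few lines of code: at each step we fit least squares to the observations seen so far, issue a prediction interval at significance level ε for the next response, then observe the truth and record whether the interval erred. The sketch below is illustrative, not the paper's procedure verbatim: it uses simple (one-covariate) regression under a Gauss linear model with parameters chosen arbitrarily for the simulation, and it substitutes the standard normal quantile for the Student t quantile, a reasonable approximation once the sample is moderately large.

```python
# A minimal sketch of the on-line predictive protocol, assuming a
# simple Gauss linear model y = 1 + 2x + noise (illustrative choice).
# The classical prediction interval is computed from the studentized
# residual formula; the normal quantile stands in for the t quantile.
import random
from statistics import NormalDist

random.seed(0)
eps = 0.1                                  # nominal significance level
z = NormalDist().inv_cdf(1 - eps / 2)      # normal approximation to t quantile

xs, ys = [], []
errors, trials = 0, 0
for step in range(2000):
    x = random.uniform(-1, 1)
    y_true = 1.0 + 2.0 * x + random.gauss(0, 0.5)
    n = len(xs)
    if n >= 10:  # wait until observations comfortably exceed parameters
        xbar = sum(xs) / n
        ybar = sum(ys) / n
        sxx = sum((u - xbar) ** 2 for u in xs)
        b = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys)) / sxx
        a = ybar - b * xbar
        # residual variance with n - 2 degrees of freedom (two parameters)
        s2 = sum((v - a - b * u) ** 2 for u, v in zip(xs, ys)) / (n - 2)
        half = z * (s2 * (1 + 1 / n + (x - xbar) ** 2 / sxx)) ** 0.5
        pred = a + b * x
        trials += 1
        if not (pred - half <= y_true <= pred + half):
            errors += 1
    xs.append(x)
    ys.append(y_true)

print(f"empirical error frequency: {errors / trials:.3f} (nominal {eps})")
```

Running the sketch, the empirical error frequency settles near the nominal level ε, which is the frequency property the paper establishes for the on-line protocol.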

Article information

Ann. Statist., Volume 37, Number 3 (2009), 1566-1590.

First available in Project Euclid: 10 April 2009


Primary: 62J05 (linear regression); 62G08 (nonparametric regression)
Secondary: 60G25 (prediction theory) [see also 62M20]; 68Q32 (computational learning theory) [see also 68T05]

Keywords: Gauss linear model; independent identically distributed observations; multivariate analysis; on-line protocol; prequential statistics


Vovk, Vladimir; Nouretdinov, Ilia; Gammerman, Alex. On-line predictive linear regression. Ann. Statist. 37 (2009), no. 3, 1566--1590. doi:10.1214/08-AOS622.

