We consider the on-line predictive version of the standard problem of linear regression: the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. The standard treatment of prediction in linear regression analysis has two drawbacks. (1) The classical prediction intervals guarantee that the probability of error equals the nominal significance level ε, but this property per se does not imply that the long-run frequency of error is close to ε. (2) It is not suitable for prediction of complex systems, because it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which assumes only that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.
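The claim that, in the on-line protocol, the classical intervals err with long-run frequency close to ε can be checked with a small simulation. The sketch below makes several assumptions that are ours, not the paper's: a simple Gaussian model y = 1 + 2x + N(0, 1) with x uniform on [0, 1], ε = 0.05, and 500 on-line steps. At step n it builds the classical interval ŷ ± t·s·√(1 + 1/n + (x − x̄)²/S_xx) from the previous observations only, then reveals y and records whether it fell outside. To stay standard-library only, the two-sided Student t quantile is computed from the regularized incomplete beta function, using the identity P(|T| > t) = I_{ν/(ν+t²)}(ν/2, 1/2) and the usual continued-fraction evaluation. All function names are hypothetical.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, tol=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_t_tail(t, df):
    """P(|T| > t) for T ~ Student t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def t_quantile(eps, df):
    """Return q with P(|T| > q) = eps, found by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while two_sided_t_tail(hi, df) > eps:  # the tail decreases in t
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if two_sided_t_tail(mid, df) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def online_error_frequency(n_steps=500, eps=0.05, seed=0):
    """Fraction of on-line steps where y falls outside the classical interval."""
    rng = random.Random(seed)
    n = 0
    sx = sy = sxx = sxy = syy = 0.0
    errors = trials = 0
    for _ in range(n_steps):
        x = rng.uniform(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # assumed Gaussian linear model
        if n >= 3:  # need n - 2 >= 1 residual degrees of freedom
            xbar, ybar = sx / n, sy / n
            sxx_c = sxx - n * xbar * xbar
            sxy_c = sxy - n * xbar * ybar
            syy_c = syy - n * ybar * ybar
            slope = sxy_c / sxx_c
            intercept = ybar - slope * xbar
            df = n - 2
            s = math.sqrt((syy_c - slope * sxy_c) / df)  # residual std. error
            # Classical interval: yhat +/- q * s * sqrt(1 + 1/n + (x - xbar)^2 / Sxx).
            half_width = t_quantile(eps, df) * s * math.sqrt(
                1.0 + 1.0 / n + (x - xbar) ** 2 / sxx_c)
            trials += 1
            if abs(y - (intercept + slope * x)) > half_width:
                errors += 1
        # Only after predicting is (x, y) added to the running sums.
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
        syy += y * y
    return errors / trials


if __name__ == "__main__":
    print(online_error_frequency())
```

With 497 predictions at ε = 0.05, binomial fluctuation alone gives a standard deviation of about 0.01, so the abstract's claim suggests a printed value within a few hundredths of 0.05. Keeping running sums instead of refitting from scratch makes each on-line update O(1) in the number of past observations. This matters only for the toy two-parameter model used here.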