The Annals of Statistics

On prediction of individual sequences

Nicolò Cesa-Bianchi and Gábor Lugosi

Source: Ann. Statist. Volume 27, Number 6 (1999), 1865-1895.

Abstract

Sequential randomized prediction of an arbitrary binary sequence is investigated. No assumption is made on the mechanism generating the bit sequence. The goal of the predictor is to minimize its relative loss (or regret), that is, to make almost as few mistakes as the best “expert” in a fixed, possibly infinite, set of experts. We point out a surprising connection between this prediction problem and empirical process theory. First, in the special case of static (memoryless) experts, we completely characterize the minimax regret in terms of the maximum of an associated Rademacher process. Then we show general upper and lower bounds on the minimax regret in terms of the geometry of the class of experts. As main examples, we determine the exact order of magnitude of the minimax regret for the class of autoregressive linear predictors and for the class of Markov experts.
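
The setting in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: it uses the standard exponentially weighted average forecaster over a small finite set of static experts under absolute loss. The function name `weighted_forecaster`, the three example experts, and the learning rate `eta` are assumptions chosen for illustration only.

```python
import math
import random


def weighted_forecaster(bits, experts, eta):
    """Run an exponentially weighted average forecaster on a bit sequence.

    bits:    list of 0/1 outcomes, revealed one at a time.
    experts: list of functions t -> prediction in [0, 1]; these are static
             experts, meaning they depend on the round index only.
    eta:     positive learning rate (an assumed tuning parameter).

    Returns (forecaster_loss, best_expert_loss), both under absolute loss.
    """
    cum_loss = [0.0] * len(experts)
    forecaster_loss = 0.0
    for t, y in enumerate(bits):
        advice = [f(t) for f in experts]
        # Weight each expert by exp(-eta * cumulative loss); subtracting the
        # minimum loss leaves the normalized weights unchanged and avoids underflow.
        base = min(cum_loss)
        weights = [math.exp(-eta * (loss - base)) for loss in cum_loss]
        total = sum(weights)
        # p is the probability of predicting 1, so |p - y| is the
        # probability of a mistake in this round.
        p = sum(w * a for w, a in zip(weights, advice)) / total
        forecaster_loss += abs(p - y)
        for i, a in enumerate(advice):
            cum_loss[i] += abs(a - y)
    return forecaster_loss, min(cum_loss)


if __name__ == "__main__":
    random.seed(0)
    n = 1000
    bits = [random.randint(0, 1) for _ in range(n)]
    # Hypothetical static experts: always 0, always 1, alternating bits.
    experts = [lambda t: 0.0, lambda t: 1.0, lambda t: float(t % 2)]
    eta = math.sqrt(8 * math.log(len(experts)) / n)
    loss, best = weighted_forecaster(bits, experts, eta)
    print("regret:", loss - best)
    print("bound :", math.sqrt(n * math.log(len(experts)) / 2))
```

With this choice of `eta`, the well-known guarantee for this forecaster is that the regret is at most sqrt(n ln N / 2) for N experts on any sequence of length n. The paper studies the minimax regret more finely, for possibly infinite expert classes.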

Primary Subjects: 62C20
Secondary Subjects: 60G20
Keywords: Universal prediction; prediction with experts; absolute loss; empirical processes; covering numbers; finite-state machines

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1017939242
Mathematical Reviews number (MathSciNet): MR1765620
Digital Object Identifier: doi:10.1214/aos/1017939242
Zentralblatt MATH identifier: 0961.62081


2010 © Institute of Mathematical Statistics