Sequential randomized prediction of an arbitrary binary sequence is
investigated. No assumption is made on the mechanism of generating the bit
sequence. The goal of the predictor is to minimize its relative loss (or
regret), that is, to make almost as few mistakes as the best
“expert” in a fixed, possibly infinite, set of experts. We point
out a surprising connection between this prediction problem and empirical
process theory. First, in the special case of static (memoryless) experts, we
completely characterize the minimax regret in terms of the maximum of an
associated Rademacher process. Then we show general upper and lower bounds on
the minimax regret in terms of the geometry of the class of experts. As main
examples, we determine the exact order of magnitude of the minimax regret for
the class of autoregressive linear predictors and for the class of Markov
experts.
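
To make the quantity in the static-expert result concrete, the following minimal sketch estimates by Monte Carlo the expected maximum of a Rademacher process indexed by a finite class of static experts. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function name `rademacher_max_estimate`, the representation of each expert as a fixed list of predictions in [0, 1], the centering `f_t - 0.5`, and the restriction to a finite class.

```python
import random


def rademacher_max_estimate(experts, trials=2000, seed=0):
    """Monte Carlo estimate of E[max_f sum_t sigma_t * (f_t - 1/2)].

    experts: list of equal-length lists; experts[i][t] is the (fixed)
             prediction in [0, 1] of static expert i at time t.
    The centering by 1/2 is an illustrative assumption, not taken
    from the abstract.
    """
    rng = random.Random(seed)
    n = len(experts[0])
    total = 0.0
    for _ in range(trials):
        # Draw independent random signs (a Rademacher sequence).
        signs = [rng.choice((-1, 1)) for _ in range(n)]
        # Maximum over the class of the signed, centered sums.
        total += max(
            sum(s * (f_t - 0.5) for s, f_t in zip(signs, f))
            for f in experts
        )
    return total / trials


if __name__ == "__main__":
    # Two constant experts over n = 100 rounds: "always 0" and "always 1".
    n = 100
    print(rademacher_max_estimate([[0.0] * n, [1.0] * n]))
```

For the two constant experts in the usage example, the maximum equals half the absolute value of the sum of the signs, so the estimate grows on the order of the square root of the sequence length.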