The Annals of Probability

String matching bounds via coding

Paul C. Shields

Source: Ann. Probab. Volume 25, Number 1 (1997), 329-336.

Abstract

It is known that the length $L(x_1^n)$ of the longest block appearing at least twice in a randomly chosen sample path of length $n$ drawn from an i.i.d. process is asymptotically almost surely equal to $C \log n$, where the constant $C$ depends on the process. A simple coding argument will be used to show that for a class of processes called the finite energy processes, $L(x_1^n)$ is almost surely upper bounded by $C \log n$, where $C$ is a constant. While the coding technique does not yield the exact constant $C$, it does show clearly what is needed to obtain $\log n$ bounds.
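
As an illustration of the quantity $L(x_1^n)$ only (this is not the coding argument of the paper), the following minimal Python sketch computes the length of the longest block that occurs at least twice in a finite string and compares it with $\log_2 n$ on simulated fair-coin sample paths. The function names, the seed, the sample sizes, and the convention that overlapping occurrences count are assumptions made for this example.

```python
import math
import random


def has_repeat(x, k):
    """Return True if some block of length k starts at two different positions of x."""
    seen = set()
    for i in range(len(x) - k + 1):
        block = x[i:i + k]
        if block in seen:
            return True
        seen.add(block)
    return False


def longest_repeated_block(x):
    """Length of the longest block occurring at least twice in x.

    Assumption: the two occurrences may overlap. Binary search is valid
    because a repeated block of length k contains a repeated block of
    length k - 1.
    """
    lo, hi = 0, len(x) - 1  # a repeated block has length at most len(x) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(x, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
    for n in (1000, 2000, 4000):
        # i.i.d. fair-coin sample path of length n (an assumed toy process)
        path = "".join(rng.choices("01", k=n))
        length = longest_repeated_block(path)
        print(n, length, round(length / math.log2(n), 2))
```

If the $\log n$ behavior described above holds, the printed ratio of $L(x_1^n)$ to $\log_2 n$ should stay roughly bounded as $n$ grows, although a handful of simulated paths is only a sanity check and not evidence of the almost-sure statement.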

Primary Subjects: 60G17
Secondary Subjects: 94A24
Keywords: String matching; prefix codes

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aop/1024404290
Mathematical Reviews number (MathSciNet): MR1428511
Digital Object Identifier: doi:10.1214/aop/1024404290
Zentralblatt MATH identifier: 0873.60029

Author address: Toledo, Ohio 43606. E-mail: pshield2@uoft02.utoledo.edu

2009 © Institute of Mathematical Statistics