The Annals of Applied Probability

The asymptotics of waiting times between stationary processes, allowing distortion

Amir Dembo and Ioannis Kontoyiannis

Source: Ann. Appl. Probab. Volume 9, Number 2 (1999), 413-429.

Abstract

Given two independent realizations of the stationary processes $\mathbf{X} = \{X_n; n \geq 1\}$ and $\mathbf{Y} = \{Y_n; n \geq 1\}$, our main quantity of interest is the waiting time $W_n(D)$ until a $D$-close version of the initial string $(X_1, X_2, \dots, X_n)$ first appears as a contiguous substring in $(Y_1, Y_2, Y_3, \dots)$, where closeness is measured with respect to some "average distortion" criterion.
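
The definition of $W_n(D)$ can be made concrete with a brute-force search. The sketch below assumes that the single-letter distortion is the Hamming distortion, which is only one possible choice of "average distortion" criterion. It also assumes that $W_n(D)$ is the first index $k \geq 1$ at which the window $(Y_k, \dots, Y_{k+n-1})$ has average distortion at most $D$ from $(X_1, \dots, X_n)$. The names `hamming` and `waiting_time` are hypothetical and do not come from the paper.

```python
from typing import Callable, Optional, Sequence


def hamming(a, b) -> float:
    """Assumed single-letter distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def waiting_time(x: Sequence, y: Sequence, D: float,
                 rho: Callable = hamming) -> Optional[int]:
    """Hypothetical helper: smallest k >= 1 such that the window
    y_k, ..., y_{k+n-1} (1-indexed) is within average distortion D of
    x_1, ..., x_n. Returns None if no window of the finite sequence y
    qualifies."""
    n = len(x)
    # Windows start at 0-based offsets 0, 1, ..., len(y) - n.
    for k in range(1, len(y) - n + 2):
        window = y[k - 1 : k - 1 + n]
        avg = sum(rho(a, b) for a, b in zip(x, window)) / n
        if avg <= D:
            return k
    return None
```

For example, `waiting_time("0110", "1111011000", 0.0)` returns 5 because the first exact match starts at position 5. Relaxing the tolerance to `D = 0.25` allows one mismatch in four symbols, so the result drops to 2. For random sequences, result (i) below suggests that $W_n(D)$ grows roughly exponentially in $n$ whenever $R > 0$. A linear scan like this one is therefore only practical for small $n$.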

We study the asymptotics of $W_n(D)$ for large $n$ under various mixing conditions on $\mathbf{X}$ and $\mathbf{Y}$. We first prove a strong approximation theorem between $\log W_n(D)$ and the logarithm of the probability of a $D$-ball around $(X_1, X_2, \dots, X_n)$. Using large deviations techniques, we show that this probability can, in turn, be strongly approximated by an associated random walk, and we conclude that: (i) $n^{-1} \log W_n(D)$ converges almost surely to a constant $R$ determined by an explicit variational problem; (ii) $[\log W_n(D) - nR]$, properly normalized, satisfies a central limit theorem, a law of the iterated logarithm and, more generally, an almost sure invariance principle.
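
The first step relates $\log W_n(D)$ to the probability of a $D$-ball around $(X_1, \dots, X_n)$. Since a more probable ball tends to be hit sooner, $\log W_n(D)$ should be close to the negative logarithm of that probability. The sketch below computes the ball probability exactly in a simple special case. It assumes that $\mathbf{Y}$ is i.i.d. Bernoulli($q$), that distortion is Hamming, and that $x$ is a fixed binary string; the paper itself treats general stationary mixing processes. Under these assumptions, positions with $x_i = 1$ mismatch with probability $1 - q$ and positions with $x_i = 0$ mismatch with probability $q$. The mismatch count is therefore the sum of two independent binomials. The names `binom_pmf` and `ball_probability` are hypothetical.

```python
import math
from typing import Sequence


def binom_pmf(m: int, p: float) -> list:
    """pmf of Binomial(m, p) as a list indexed by 0..m."""
    return [math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(m + 1)]


def ball_probability(x: Sequence[int], q: float, D: float) -> float:
    """Hypothetical helper: probability that n i.i.d. Bernoulli(q) symbols
    fall within average Hamming distortion D of the binary string x."""
    n = len(x)
    ones = sum(x)
    zeros = n - ones
    a = binom_pmf(ones, 1 - q)  # mismatch counts where x_i = 1 (needs Y_i = 0)
    b = binom_pmf(zeros, q)     # mismatch counts where x_i = 0 (needs Y_i = 1)
    # Largest allowed number of mismatches; the small tolerance guards
    # against floating-point round-off in n * D.
    limit = math.floor(n * D + 1e-12)
    total = 0.0
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j <= limit:
                total += pa * pb
    return total
```

With `x = [0, 1, 1, 0]`, `q = 0.5` and `D = 0.25`, at most one mismatch is allowed, so the ball probability is $5/16 = 0.3125$. The heuristic prediction $-\log(0.3125) \approx 1.163$ (natural log) is then the quantity that the strong approximation compares with $\log W_n(D)$. The precise error terms are given in the paper, not by this sketch.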

Primary Subjects: 60F15
Secondary Subjects: 60F10, 94A17
Keywords: Waiting times; string matching; large deviations; relative entropy; strong approximation; almost sure invariance principle

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoap/1029962749
Mathematical Reviews number (MathSciNet): MR1687410
Digital Object Identifier: doi:10.1214/aoap/1029962749
Zentralblatt MATH identifier: 0940.60033

2010 © Institute of Mathematical Statistics