The Annals of Applied Probability

The asymptotics of waiting times between stationary processes, allowing distortion

Amir Dembo and Ioannis Kontoyiannis

Full-text: Open access

Abstract

Given two independent realizations of the stationary processes $\mathbf{X} = \{X_n; n \geq 1\}$ and $\mathbf{Y} = \{Y_n; n \geq 1\}$, our main quantity of interest is the waiting time $W_n(D)$ until a $D$-close version of the initial string $(X_1, X_2, \dots, X_n)$ first appears as a contiguous substring in $(Y_1, Y_2, Y_3, \dots)$, where closeness is measured with respect to some "average distortion" criterion.
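The definition of $W_n(D)$ can be illustrated with a minimal sketch. It assumes a particular distortion measure (average Hamming distortion, i.e. the fraction of mismatched positions) and a finite, truncated realization of $\mathbf{Y}$. Neither choice comes from the abstract, which only speaks of some "average distortion" criterion, and the function name `waiting_time` is hypothetical.

```python
def waiting_time(x, y, D):
    """Return the smallest 1-based index i such that the window
    y[i-1 : i-1+n] is within average Hamming distortion D of x,
    where n = len(x). Return None if no such window exists in y.

    Assumptions (not from the source): Hamming distortion, and a
    finite list y standing in for the infinite sequence (Y_1, Y_2, ...).
    """
    n = len(x)
    for i in range(len(y) - n + 1):
        # Count the positions where the window disagrees with x.
        mismatches = sum(a != b for a, b in zip(x, y[i:i + n]))
        # Average distortion is mismatches / n; compare without dividing.
        if mismatches <= D * n:
            return i + 1
    return None


if __name__ == "__main__":
    x = [0, 1, 1, 0]
    y = [1, 1, 1, 1, 0, 1, 1, 0]
    for D in (0.0, 0.25, 0.5):
        print(D, waiting_time(x, y, D))
```

With $D = 0$ this reduces to exact string matching. Allowing a larger $D$ can only shorten the waiting time, since every window that matches at a given distortion level also matches at any higher level.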

We study the asymptotics of $W_n(D)$ for large $n$ under various mixing conditions on $\mathbf{X}$ and $\mathbf{Y}$. We first prove a strong approximation theorem between $\log W_n(D)$ and the logarithm of the probability of a $D$-ball around $(X_1, X_2, \dots, X_n)$. Using large deviations techniques, we show that this probability can, in turn, be strongly approximated by an associated random walk, and we conclude that: (i) $n^{-1} \log W_n(D)$ converges almost surely to a constant $R$ determined by an explicit variational problem; (ii) $[\log W_n(D) - nR]$, properly normalized, satisfies a central limit theorem, a law of the iterated logarithm and, more generally, an almost sure invariance principle.
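For orientation, conclusions (i) and (ii) can be written out in the form such results usually take. This is a sketch under assumptions, not a quotation of the paper's theorems. The symbol $\sigma^2$ (a limiting variance, assumed positive) and the exact normalizations are not given in the abstract. The centering at $nR$ is the one consistent with (i).

```latex
% (i) Strong law: exponential growth rate of the waiting time
\frac{1}{n} \log W_n(D) \longrightarrow R \quad \text{almost surely.}

% (ii) Central limit theorem, with an assumed limiting variance \sigma^2 > 0
\frac{\log W_n(D) - nR}{\sigma \sqrt{n}} \xrightarrow{\;d\;} N(0, 1).

% (ii) Law of the iterated logarithm, same assumed \sigma
\limsup_{n \to \infty} \frac{\log W_n(D) - nR}{\sqrt{2 n \log\log n}} = \sigma
\quad \text{almost surely.}
```

An almost sure invariance principle is the stronger statement that the centered sequence can be coupled with a Brownian motion closely enough for both displays in (ii) to follow from the corresponding facts about Brownian motion.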

Article information

Source
Ann. Appl. Probab., Volume 9, Number 2 (1999), 413-429.

Dates
First available in Project Euclid: 21 August 2002

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1029962749

Digital Object Identifier
doi:10.1214/aoap/1029962749

Mathematical Reviews number (MathSciNet)
MR1687410

Zentralblatt MATH identifier
0940.60033

Subjects
Primary: 60F15: Strong theorems
Secondary: 60F10: Large deviations; 94A17: Measures of information, entropy

Keywords
Waiting times; string matching; large deviations; relative entropy; strong approximation; almost sure invariance principle

Citation

Dembo, Amir; Kontoyiannis, Ioannis. The asymptotics of waiting times between stationary processes, allowing distortion. Ann. Appl. Probab. 9 (1999), no. 2, 413--429. doi:10.1214/aoap/1029962749. https://projecteuclid.org/euclid.aoap/1029962749


