Open Access
Translator Disclaimer
November, 1985 Critical Phenomena in Sequence Matching
Richard Arratia, Michael S. Waterman
Ann. Probab. 13(4): 1236-1249 (November, 1985). DOI: 10.1214/aop/1176992808


We give a generalization of the result of Erdos and Renyi on the length $R_n$ of the longest head run in the first $n$ tosses of a coin. Consider two independent sequences, $X_1 X_2\cdots X_m$ and $Y_1Y_2\cdots Y_n$. Suppose that $X_1, X_2,\cdots$ are i.i.d. $\mu$, and $Y_1, Y_2,\cdots$ are i.i.d. $\nu$, where $\mu$ and $\nu$ are possibly different distributions on a common finite alphabet $S$. Let $p \equiv P(X_1 = Y_1) \in (0, 1)$. The length of the longest matching consecutive subsequence is $M_{m,n} \equiv \max \{k: X_{i+r} = Y_{j+r}$ for $r = 1$ to $k$, for some $0 \leq i \leq m - k, 0 \leq j \leq n - k\}$. For $m$ and $n \rightarrow \infty$ with $\log(m)/\log(mn) \rightarrow \lambda \in (0,1)$, our result is that there is a constant $K \equiv K(\mu, \nu, \lambda) \in (0, 1\rbrack$ such that $P(\lim M_{m,n}/\log_{1/p}(mn) = K) = 1$. The proof uses large deviation methods. The constant $K$ is determined from a variational formula involving the Kullback-Liebler distance or relative entropy. A simple necessary and sufficient condition for $K = 1$ is given. For the case $m = n (\lambda = 1/2)$ and $\mu = \nu, K = 1$. The set of $(\mu, \nu, \lambda)$ for which $K = 1$ has nonempty interior. The boundary of this set is the location of a phase transition. The results generalize to more than two sequences and to Markov chains. A strong law of large numbers is given for the proportion of letters within the longest matching word; the limiting proportion exhibits critical behavior, similar to that of $K$.


Download Citation

Richard Arratia. Michael S. Waterman. "Critical Phenomena in Sequence Matching." Ann. Probab. 13 (4) 1236 - 1249, November, 1985.


Published: November, 1985
First available in Project Euclid: 19 April 2007

zbMATH: 0576.60058
MathSciNet: MR806221
Digital Object Identifier: 10.1214/aop/1176992808

Primary: 60J10
Secondary: 68G10 , 94A17

Keywords: Entropy , Kullback-Liebler distance , large deviations , Sequence matching

Rights: Copyright © 1985 Institute of Mathematical Statistics


Vol.13 • No. 4 • November, 1985
Back to Top