The Annals of Probability

Critical Phenomena for Sequence Matching with Scoring

Abstract

Consider two independent sequences $X_1,\ldots, X_n$ and $Y_1,\ldots, Y_n$. Suppose that $X_1,\ldots, X_n$ are i.i.d. $\mu_X$ and $Y_1,\ldots, Y_n$ are i.i.d. $\mu_Y$, where $\mu_X$ and $\mu_Y$ are distributions on finite alphabets $\Sigma_X$ and $\Sigma_Y$, respectively. A score $F: \Sigma_X \times \Sigma_Y \rightarrow \mathbb{R}$ is assigned to each pair $(X_i, Y_j)$ and the maximal nonaligned segment score is $M_n = \max_{0\leq i, j \leq n - \Delta, \Delta \geq 0}\{\sum^\Delta_{l=1}F(X_{i+l}, Y_{j+l})\}$. Our result is that $M_n/\log n \rightarrow \gamma^\ast(\mu_X, \mu_Y)$ a.s. with $\gamma^\ast$ determined by a tractable variational formula. Moreover, the pair empirical measure of $(X_{i+l}, Y_{j+l})$ during the segment where $M_n$ is achieved converges to a probability measure $\nu^\ast$, which is accessible by the same formula. These results generalize to $X_i, Y_j$ taking values in any Polish space, to intrasequence scores under shifts, to long quality segments and to more than two sequences.
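
As an illustration of the quantity $M_n$ defined in the abstract, the following is a minimal sketch of how it could be computed for two concrete finite sequences. It is not taken from the paper: the function names `max_segment_score` and `match_score` and the +1/-1 scoring rule are assumptions chosen for the example. The sketch uses a standard diagonal recursion (a running segment sum that is reset to zero whenever it turns negative), which yields the maximum over all offsets $i, j$ and lengths $\Delta \geq 0$ because the empty segment contributes a score of 0.

```python
def max_segment_score(x, y, score):
    """Return M_n = max over i, j, Delta >= 0 of sum_{l=1..Delta} score(x[i+l], y[j+l]).

    Illustrative sketch only; `score` plays the role of F and is supplied by the caller.
    """
    best = 0.0  # Delta = 0 gives the empty sum, so M_n is never negative
    prev = [0.0] * (len(y) + 1)  # best segment sums ending at the previous x position
    for xi in x:
        curr = [0.0] * (len(y) + 1)
        for j, yj in enumerate(y, start=1):
            # Extend the segment along the same diagonal, or restart from the empty segment.
            curr[j] = max(0.0, prev[j - 1] + score(xi, yj))
            best = max(best, curr[j])
        prev = curr
    return best


def match_score(a, b):
    """Hypothetical score F: +1 for a match, -1 for a mismatch (an assumption)."""
    return 1.0 if a == b else -1.0


if __name__ == "__main__":
    print(max_segment_score("ACGTA", "TACGA", match_score))
```

With this assumed scoring, the best segment pairs `ACG` in the first sequence with `ACG` in the second, at different offsets. The recursion takes time proportional to the product of the two sequence lengths, so the growth of $M_n / \log n$ can be examined empirically for moderate $n$.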

Article information

Source
Ann. Probab., Volume 22, Number 4 (1994), 1993-2021.

Dates
First available in Project Euclid: 19 April 2007

https://projecteuclid.org/euclid.aop/1176988492

Digital Object Identifier
doi:10.1214/aop/1176988492

Mathematical Reviews number (MathSciNet)
MR1331213

Zentralblatt MATH identifier
0834.60031
