The Annals of Probability

Critical Phenomena for Sequence Matching with Scoring

Amir Dembo, Samuel Karlin, and Ofer Zeitouni

Full-text: Open access

Abstract

Consider two independent sequences $X_1,\ldots, X_n$ and $Y_1,\ldots, Y_n$. Suppose that $X_1,\ldots, X_n$ are i.i.d. $\mu_X$ and $Y_1,\ldots, Y_n$ are i.i.d. $\mu_Y$, where $\mu_X$ and $\mu_Y$ are distributions on finite alphabets $\Sigma_X$ and $\Sigma_Y$, respectively. A score $F: \Sigma_X \times \Sigma_Y \rightarrow \mathbb{R}$ is assigned to each pair $(X_i, Y_j)$ and the maximal nonaligned segment score is $M_n = \max_{0\leq i, j \leq n - \Delta,\, \Delta \geq 0}\{\sum_{l=1}^{\Delta}F(X_{i+l}, Y_{j+l})\}$. Our result is that $M_n/\log n \rightarrow \gamma^\ast(\mu_X, \mu_Y)$ a.s. with $\gamma^\ast$ determined by a tractable variational formula. Moreover, the pair empirical measure of $(X_{i+l}, Y_{j+l})$ during the segment where $M_n$ is achieved converges to a probability measure $\nu^\ast$, which is accessible by the same formula. These results generalize to $X_i, Y_j$ taking values in any Polish space, to intrasequence scores under shifts, to long quality segments and to more than two sequences.
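To make the definition of $M_n$ concrete, the following is a minimal sketch of how the maximal segment score could be computed for two given sequences. It is not taken from the paper: the function names `max_segment_score` and `match_score`, and the +1/-1 scoring used in the example, are illustrative assumptions. The sketch uses the standard observation that a segment is a run along one diagonal of the $(i, j)$ grid, so a running maximum that resets to zero (the empty segment, $\Delta = 0$) finds the best segment in $O(n^2)$ time.

```python
def max_segment_score(x, y, score):
    """Return M_n = max over ungapped segments of sum score(x[i+l], y[j+l]).

    `score` plays the role of F; the name and signature are assumptions
    made for this sketch, not notation from the paper.
    """
    best = 0.0  # Delta = 0 (empty segment) always contributes score 0
    # prev[j] = best score of a segment ending at (previous x symbol, y[j-1])
    prev = [0.0] * (len(y) + 1)
    for a in x:
        cur = [0.0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            # Extend the segment along the diagonal, or restart from empty.
            cur[j] = max(0.0, prev[j - 1] + score(a, b))
            if cur[j] > best:
                best = cur[j]
        prev = cur
    return best


def match_score(a, b):
    """Hypothetical scoring F: +1 for a match, -1 for a mismatch."""
    return 1.0 if a == b else -1.0


if __name__ == "__main__":
    # The best segment here is the shared run "ACGT".
    print(max_segment_score("ACGTAC", "TTACGT", match_score))
```

The abstract's limit theorem concerns the growth of this quantity relative to $\log n$ for random sequences; the sketch only evaluates $M_n$ for fixed inputs and makes no claim about the value of $\gamma^\ast$.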

Article information

Source
Ann. Probab., Volume 22, Number 4 (1994), 1993-2021.

Dates
First available in Project Euclid: 19 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aop/1176988492

Digital Object Identifier
doi:10.1214/aop/1176988492

Mathematical Reviews number (MathSciNet)
MR1331213

Zentralblatt MATH identifier
0834.60031

Subjects
Primary: 60F10: Large deviations
Secondary: 60F15: Strong theorems

Keywords
Large deviations; strong laws; sequence matching; large segmental sums

Citation

Dembo, Amir; Karlin, Samuel; Zeitouni, Ofer. Critical Phenomena for Sequence Matching with Scoring. Ann. Probab. 22 (1994), no. 4, 1993--2021. doi:10.1214/aop/1176988492. https://projecteuclid.org/euclid.aop/1176988492

