Abstract
Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed integer-valued random variables. Let $Y_{t - m + 1,t}$ for $t = m, m + 1,\ldots$ denote a moving sum of $m$ consecutive $X_i$'s. Let $N_{m,T} = \max_{m \leq t \leq T} \{Y_{t - m + 1,t}\}$ and let $\tau_{k,m}$ be the waiting time until the moving sum of $X_i$'s in a scanning window of $m$ trials is at least $k$. We derive tight bounds for the equivalent probabilities $P(\tau_{k,m} > T) = P(N_{m,T} < k)$. We apply the bounds to two problems in molecular biology: the distribution of the length of the longest almost-matching subsequence in aligned amino acid sequences and the distribution of the largest net charge within any $m$ consecutive positions in a charged alphabet string.
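The definitions above can be illustrated with a minimal sketch that computes the moving sums, the scan statistic $N_{m,T}$, and the waiting time $\tau_{k,m}$ for one observed sequence. The function names (`moving_sums`, `scan_statistic`, `waiting_time`) and the sample data are hypothetical, chosen only for illustration. This is not the paper's bounding method; it only shows why the events $\{\tau_{k,m} > T\}$ and $\{N_{m,T} < k\}$ coincide.

```python
from itertools import accumulate


def moving_sums(xs, m):
    """Return [Y_{t-m+1,t} for t = m, ..., T], where T = len(xs)."""
    # prefix[t] holds X_1 + ... + X_t, so each window sum is a difference.
    prefix = [0] + list(accumulate(xs))
    return [prefix[t] - prefix[t - m] for t in range(m, len(xs) + 1)]


def scan_statistic(xs, m):
    """N_{m,T}: the largest moving sum over all windows of length m."""
    return max(moving_sums(xs, m))


def waiting_time(xs, m, k):
    """tau_{k,m}: first t >= m whose window sum is at least k.

    Returns None when the threshold is not reached within the observed
    T = len(xs) trials, which corresponds to the event tau_{k,m} > T.
    """
    for t, y in enumerate(moving_sums(xs, m), start=m):
        if y >= k:
            return t
    return None


# Hypothetical 0/1 data with T = 8 trials and window length m = 3.
xs = [0, 1, 0, 1, 1, 0, 0, 1]
m = 3

# For every threshold k, "not reached by time T" matches "N_{m,T} < k".
for k in range(0, 5):
    not_reached = waiting_time(xs, m, k) is None
    assert not_reached == (scan_statistic(xs, m) < k)
```

In this sample the moving sums are 1, 2, 2, 2, 1, 1, so $N_{3,8} = 2$; a threshold of $k = 2$ is first reached at $t = 4$, while $k = 3$ is never reached within the 8 trials.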
Citation
Joseph Glaz, Joseph I. Naus. "Tight Bounds and Approximations for Scan Statistic Probabilities for Discrete Data." Ann. Appl. Probab. 1 (2) 306-318, May 1991. https://doi.org/10.1214/aoap/1177005940
Information