Source: Ann. Statist.
Volume 28, Number 3
Assume that two sequences from a finite alphabet are optimally
aligned under a scoring system that rewards similarities according to a
general scoring scheme and penalizes gaps (insertions and deletions). Under the
assumption that the letters in each sequence are independent and identically
distributed and the two sequences are also independent, approximate $p$-values
are obtained for the optimal local alignment when either (i) there are at most
a fixed number of gaps, or (ii) the gap initiation cost is sufficiently large.
In the latter case the approximation can be written in the same form as the
well-known case of ungapped alignments.
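
To make the objects in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes a standard affine-gap dynamic program for the optimal local alignment score, where a gap of length k costs `gap_open + k * gap_extend`, and it assumes the commonly cited extreme-value form for ungapped alignments, P(S >= x) ≈ 1 - exp(-K m n e^{-λx}). The names `local_alignment_score`, `approx_pvalue`, `match_score`, `lam` and `K` are hypothetical. The constants λ and K depend on the scoring system and letter distribution and are not derived here.

```python
import math

def local_alignment_score(a, b, score, gap_open, gap_extend):
    """Best local alignment score of a and b with affine gap costs.

    A gap of length k costs gap_open + k * gap_extend (an assumed convention).
    """
    neg = float("-inf")
    n = len(b)
    h_prev = [0.0] * (n + 1)  # best score of an alignment ending at (i-1, j)
    f_prev = [neg] * (n + 1)  # same, but ending in a gap that skips letters of a
    best = 0.0
    for i in range(1, len(a) + 1):
        h = [0.0] * (n + 1)
        f = [neg] * (n + 1)
        e = neg  # best score ending at (i, j) in a gap that skips letters of b
        for j in range(1, n + 1):
            e = max(e - gap_extend, h[j - 1] - gap_open - gap_extend)
            f[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            diag = h_prev[j - 1] + score(a[i - 1], b[j - 1])
            h[j] = max(0.0, diag, e, f[j])  # 0.0 restarts the local alignment
            best = max(best, h[j])
        h_prev, f_prev = h, f
    return best

def approx_pvalue(x, m, n, lam, K):
    """Extreme-value approximation 1 - exp(-K * m * n * exp(-lam * x))."""
    expected_hits = K * m * n * math.exp(-lam * x)
    return -math.expm1(-expected_hits)  # numerically stable 1 - exp(-t)

def match_score(x, y):
    # Hypothetical scoring scheme: +1 for a match, -1 for a mismatch.
    return 1.0 if x == y else -1.0

cheap_gaps = local_alignment_score("ACGTACGT", "ACGTTACGT", match_score, 1.0, 1.0)
costly_gaps = local_alignment_score("ACGTACGT", "ACGTTACGT", match_score, 10.0, 1.0)
```

With a small gap initiation cost the optimal alignment bridges the extra letter with a single gap. With a large one the optimum falls back to the best ungapped segment, which is the regime of case (ii).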