The Annals of Statistics

Approximate $p$-values for local sequence alignments

David Siegmund and Benjamin Yakir
Source: Ann. Statist. Volume 28, Number 3 (2000), 657-680.

Abstract

Assume that two sequences from a finite alphabet are optimally aligned under a general scoring scheme that rewards similarities and penalizes gaps (insertions and deletions). Under the assumption that the letters in each sequence are independent and identically distributed and that the two sequences are independent of each other, approximate $p$-values are obtained for the optimal local alignment when either (i) there are at most a fixed number of gaps, or (ii) the gap initiation cost is sufficiently large. In the latter case the approximation can be written in the same form as in the well-known case of ungapped alignments.
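To make the objects in the abstract concrete, the following is a minimal sketch, not the authors' method. It computes an optimal local alignment score with a gap initiation cost and a per-letter gap cost (the standard Smith-Waterman recursion with affine gaps), and evaluates a $p$-value of the well-known ungapped form $1 - \exp(-K m n e^{-\lambda x})$ for sequences of lengths $m$ and $n$. All function names, the match/mismatch scores, and the gap costs are illustrative assumptions. The constant $K$ is taken as an input rather than derived, and `ungapped_lambda` solves $E[e^{\lambda s(X,Y)}] = 1$ for the ungapped case only; the parameters appropriate to gapped alignments are those of the paper and are not reproduced here.

```python
import math

NEG_INF = float("-inf")


def local_alignment_score(a, b, match=1.0, mismatch=-1.0,
                          gap_open=3.0, gap_extend=1.0):
    """Best local alignment score; a gap of length k costs gap_open + k * gap_extend.

    The scores and gap costs are illustrative defaults, not values from the paper.
    """
    n = len(b)
    prev_h = [0.0] * (n + 1)      # best score of an alignment ending at (i-1, j)
    prev_e = [NEG_INF] * (n + 1)  # same, ending with a letter of `a` opposite a gap
    best = 0.0
    for i in range(1, len(a) + 1):
        cur_h = [0.0] * (n + 1)
        cur_e = [NEG_INF] * (n + 1)
        f = NEG_INF               # ending at (i, j) with a letter of `b` opposite a gap
        for j in range(1, n + 1):
            # Either open a new gap from the best state or extend an existing gap.
            cur_e[j] = max(prev_h[j] - gap_open, prev_e[j]) - gap_extend
            f = max(cur_h[j - 1] - gap_open, f) - gap_extend
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: the score is never allowed to drop below zero.
            cur_h[j] = max(0.0, prev_h[j - 1] + s, cur_e[j], f)
            best = max(best, cur_h[j])
        prev_h, prev_e = cur_h, cur_e
    return best


def ungapped_lambda(freqs, match=1.0, mismatch=-1.0, iters=200):
    """Positive root of E[exp(lam * s(X, Y))] = 1 for i.i.d. letters (ungapped case)."""
    p_match = sum(p * p for p in freqs.values())
    if p_match * match + (1.0 - p_match) * mismatch >= 0.0:
        raise ValueError("expected score per aligned pair must be negative")

    def g(lam):
        return (p_match * math.exp(lam * match)
                + (1.0 - p_match) * math.exp(lam * mismatch) - 1.0)

    hi = 1.0
    while g(hi) <= 0.0:           # g is convex with g(0) = 0, so it eventually turns positive
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):        # bisection: g <= 0 on [0, root], g > 0 beyond it
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def approx_p_value(x, m, n, lam, k):
    """P(max score >= x) approximated by 1 - exp(-k * m * n * exp(-lam * x))."""
    return -math.expm1(-k * m * n * math.exp(-lam * x))
```

For a uniform four-letter alphabet with scores +1 and -1, `ungapped_lambda` returns $\log 3$, since the defining equation reduces to $u^2 - 4u + 3 = 0$ with $u = e^{\lambda}$.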

Primary Subjects: 62M40
Secondary Subjects: 92D10
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1015951993
Mathematical Reviews number (MathSciNet): MR1792782
Digital Object Identifier: doi:10.1214/aos/1015951993
Zentralblatt MATH identifier: 01828957


2013 © Institute of Mathematical Statistics