Journal of Applied Probability

Moderate deviations for word counts in biological sequences

Sarah Behrens and Matthias Löwe
Source: J. Appl. Probab. Volume 46, Number 4 (2009), 1020-1037.

Abstract

We derive a moderate deviation principle for word counts (which is extended to counts of multiple patterns) in biological sequences under different models: independent and identically distributed letters, homogeneous Markov chains of order 1 and $m$, and, in view of the codon structure of DNA sequences, Markov chains with three different transition matrices. This enables us to approximate P-values for the number of word occurrences in DNA and protein sequences in a new manner.
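
As a rough illustration of the quantity the abstract refers to, the sketch below counts overlapping occurrences of a word in a sequence and compares the result with its expected count under the simplest model mentioned above, independent and identically distributed letters. It is not the authors' method. The function names, the toy sequence, and the use of empirical letter frequencies as model parameters are assumptions made for this example. The moderate deviation approximation of the P-value itself depends on results from the full text and is not reproduced here.

```python
from collections import Counter
from math import prod


def count_overlapping(sequence: str, word: str) -> int:
    """Count occurrences of `word` in `sequence`, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == word)


def expected_count_iid(n: int, word: str, letter_probs: dict) -> float:
    """Expected number of (overlapping) occurrences of `word` among n i.i.d. letters.

    There are n - k + 1 possible start positions, and under independence
    each one matches with probability equal to the product of the letter
    probabilities of the word.
    """
    k = len(word)
    return max(n - k + 1, 0) * prod(letter_probs[c] for c in word)


# Toy data (hypothetical, chosen only for illustration).
sequence = "ACGTACGTTACG"
word = "ACG"

# Assumption: letter probabilities are estimated from the sequence itself.
freqs = Counter(sequence)
letter_probs = {c: freqs[c] / len(sequence) for c in freqs}

observed = count_overlapping(sequence, word)
expected = expected_count_iid(len(sequence), word, letter_probs)
print(f"observed={observed}, expected under i.i.d. model={expected:.5f}")
```

A P-value for the word asks how likely a count at least as extreme as the observed one is under the chosen model; the paper's contribution is to approximate that probability through a moderate deviation principle.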

Primary Subjects: 60F99, 92D20
Secondary Subjects: 60J10, 60J20, 60G50
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.jap/1261670686
Digital Object Identifier: doi:10.1239/jap/1261670686
Zentralblatt MATH identifier: 05665453
Mathematical Reviews number (MathSciNet): MR2582704


2012 © Applied Probability Trust
