Source: J. Appl. Probab. Volume 46, Number 4 (2009), 1020-1037.
We derive a moderate deviation principle for word counts (which is
extended to counts of multiple patterns) in biological sequences under
different models: independent and identically distributed letters,
homogeneous Markov chains of order 1 and $m$, and, in view of the codon
structure of DNA sequences, Markov chains with three different transition
matrices. This enables us to approximate P-values for the number of
word occurrences in DNA and protein sequences in a new manner.
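As a rough illustration of the quantity being studied, the sketch below counts overlapping occurrences of a word in a sequence and compares that count with its expectation under the simplest model named above, independent and identically distributed letters. It is not the moderate deviation approximation from the paper. The function names are hypothetical, and estimating letter probabilities from the sequence itself is an assumption made only for this example.

```python
from collections import Counter
from math import prod


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count_iid(seq: str, word: str) -> float:
    """Expected number of occurrences of `word` in a sequence of the same
    length as `seq`, assuming i.i.d. letters whose probabilities are
    estimated by the letter frequencies of `seq` (an assumption)."""
    n, k = len(seq), len(word)
    freq = Counter(seq)
    probs = {letter: c / n for letter, c in freq.items()}
    # There are n - k + 1 start positions; under independence each one
    # matches with probability equal to the product of letter probabilities.
    return (n - k + 1) * prod(probs.get(letter, 0.0) for letter in word)


if __name__ == "__main__":
    dna = "ACGTACGTTTACG"
    print(count_overlapping(dna, "ACG"), expected_count_iid(dna, "ACG"))
```

A P-value asks how unlikely the observed count is given the model. Answering that needs the distribution of the count and not only its mean, which is where approximations such as the one derived in the paper come in.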