Advances in Applied Probability

Improved compound Poisson approximation for the number of occurrences of any rare word family in a stationary Markov chain

Etienne Roquain and Sophie Schbath
Source: Adv. in Appl. Probab. Volume 39, Number 1 (2007), 128-140.

Abstract

We derive a new compound Poisson distribution with explicit parameters to approximate the number of overlapping occurrences of any set of words in a Markovian sequence. Using the Chen-Stein method, we provide a bound for the approximation error. This error converges to 0 under the rare event condition, even for overlapping families, which improves previous results. As a consequence, we also propose Poisson approximations for the declumped count and the number of competing renewals.

Primary Subjects: 62E17
Secondary Subjects: 60C05
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aap/1175266472
Digital Object Identifier: doi:10.1239/aap/1175266472
Mathematical Reviews number (MathSciNet): MR2307874
Zentralblatt MATH identifier: 1109.62012

2013 © Applied Probability Trust
