Journal of Applied Probability

Exact distribution of word occurrences in a random sequence of letters

S. Robin and J. J. Daudin

Source: J. Appl. Probab. Volume 36, Number 1 (1999), 179-193.

Abstract

The distribution of the distance between words in a random sequence of letters is of interest because of its applications in genome sequence analysis. In this paper we give the exact probability distribution and cumulative distribution function of the distance between two successive occurrences of a given word, and of the distance between the nth and the (n+m)th occurrences, under three models for generating the letters: i.i.d. with the same probability for each letter, i.i.d. with different probabilities, and a Markov process. The generating function and the first two moments are also given. The point of studying the distances rather than the counting process is that we learn not only about the frequency of a word but also about how its occurrences are spread along the sequence.
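As an illustration only, the distance between two successive occurrences under the i.i.d. models can be computed numerically with a small pattern-matching automaton. The sketch below is not the authors' method or their closed-form result. It assumes that overlapping occurrences are counted and that the distance is measured between the end positions of successive occurrences; the paper's convention may differ. The names `build_automaton` and `distance_distribution` are hypothetical.

```python
from fractions import Fraction


def build_automaton(word, alphabet):
    """Transition table: delta[s][c] is the length of the longest prefix
    of `word` that is a suffix of word[:s] + c (state m means a match)."""
    m = len(word)
    delta = []
    for s in range(m + 1):
        row = {}
        for c in alphabet:
            t = word[:s] + c
            k = min(m, len(t))
            # Shrink k until the last k letters of t equal a prefix of word.
            while k > 0 and t[-k:] != word[:k]:
                k -= 1
            row[c] = k
        delta.append(row)
    return delta


def distance_distribution(word, probs, max_d):
    """P(D = d) for d = 1..max_d, where D is the number of letters read
    after one occurrence of `word` until the next occurrence ends.
    Assumption: letters are i.i.d. with probabilities `probs`
    (a dict letter -> probability) and overlaps are allowed."""
    alphabet = list(probs)
    m = len(word)
    delta = build_automaton(word, alphabet)
    # mass[s]: probability of being in state s with no new occurrence yet.
    mass = [Fraction(0)] * (m + 1)
    mass[m] = Fraction(1)  # start just after an occurrence
    dist = {}
    for d in range(1, max_d + 1):
        new = [Fraction(0)] * (m + 1)
        for s, p in enumerate(mass):
            if p == 0:
                continue
            for c in alphabet:
                new[delta[s][c]] += p * probs[c]
        dist[d] = new[m]       # mass that completes an occurrence at step d
        new[m] = Fraction(0)   # remove it so only first returns are counted
        mass = new
    return dist


if __name__ == "__main__":
    half = Fraction(1, 2)
    for d, p in distance_distribution("aa", {"a": half, "b": half}, 6).items():
        print(d, p)
```

Exact rational arithmetic via `Fraction` keeps the probabilities free of rounding error. For a self-overlapping word such as "aa" the distance 1 has positive probability, whereas for "ab" it does not, which is the kind of positional information that occurrence counts alone do not show.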

Primary Subjects: 60C05, 60E05
Secondary Subjects: 60G50, 60J20, 60J10
Keywords: Distance between occurrences; genome sequence analysis; Markov chain; patterns; waiting time

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.jap/1032374240
Digital Object Identifier: doi:10.1239/jap/1032374240
Mathematical Reviews number (MathSciNet): MR1699643
Zentralblatt MATH identifier: 0945.60008


2009 © Applied Probability Trust