The Annals of Applied Probability

Waiting for regulatory sequences to appear

Richard Durrett and Deena Schmidt

Full-text: Open access


One possible explanation for the substantial organismal differences between humans and chimpanzees is that there have been changes in gene regulation. Given what is known about transcription factor binding sites, this motivates the following probability question: given a 1000-nucleotide region in our genome, how long does it take for a specified six- to nine-letter word to appear in that region in some individual? Stone and Wray [Mol. Biol. Evol. 18 (2001) 1764–1770] computed 5,950 years as the answer for six-letter words. Here, we will show that for words of length 6, the average waiting time is 100,000 years, while for words of length 8, the waiting time has mean 375,000 years when there is a 7 out of 8 letter match in the population consensus sequence (an event of probability roughly 5/16) and has mean 650 million years when there is not. Fortunately, in biological reality, the match to the target word does not have to be perfect for binding to occur. If we model this by saying that a 7 out of 8 letter match is good enough, the mean reduces to about 60,000 years.
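
As a rough check on the figure of "roughly 5/16" quoted above, the following is a minimal back-of-the-envelope sketch, not the paper's own calculation. It assumes nucleotides are independent and uniform over four letters, counts windows with exactly one mismatch, and uses a Poisson approximation that ignores dependence between overlapping windows. The function name and its parameters are hypothetical, chosen only for illustration.

```python
import math


def prob_near_match(region_len=1000, word_len=8, mismatches=1, alphabet=4):
    """Poisson approximation to the chance that at least one window in the
    region differs from the target word in exactly `mismatches` positions.

    Assumes i.i.d. uniform letters (an assumption of this sketch).
    """
    # Number of starting positions for a word of this length.
    windows = region_len - word_len + 1
    # Chance that one given window has exactly `mismatches` wrong letters:
    # choose the mismatched positions, then a wrong letter at each of them.
    p_window = (
        math.comb(word_len, mismatches)
        * (alphabet - 1) ** mismatches
        / alphabet ** word_len
    )
    # Expected number of near-matching windows in the region.
    expected = windows * p_window
    # Poisson approximation: P(at least one) = 1 - exp(-expected).
    return 1.0 - math.exp(-expected)


p = prob_near_match()
print(f"approximate probability: {p:.3f}  (5/16 = {5 / 16:.4f})")
```

Under these assumptions the result comes out a little above 0.30, which is in line with the abstract's "roughly 5/16".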

Article information

Ann. Appl. Probab. Volume 17, Number 1 (2007), 1-32.

First available in Project Euclid: 13 February 2007

Primary: 92D10: Genetics {For genetic algebras, see 17D92}
Secondary: 60F05: Central limit and other weak theorems

Keywords: Regulatory sequence; population genetics; Moran model; Poisson approximation; clumping heuristic


Durrett, Richard; Schmidt, Deena. Waiting for regulatory sequences to appear. Ann. Appl. Probab. 17 (2007), no. 1, 1--32. doi:10.1214/105051606000000619.


  • Aldous, D. J. (1989). The Poisson Clumping Heuristic. Springer, New York.
  • Aldous, D. J. and Fill, J. (2002). Reversible Markov Chains and Random Walks on Graphs. Chapter 3. Available at
  • Arratia, R., Goldstein, L. and Gordon, L. (1989). Two moments suffice for Poisson approximation: The Chen--Stein method. Ann. Probab. 17 9--25.
  • Berg, J., Willmann, S. and Lässig, M. (2004). Adaptive evolution of transcription factor binding sites. BMC Evol. Biol. 4 paper 42.
  • Berg, O. and von Hippel, P. (1987). Selection of DNA binding sites by regulatory proteins. J. Mol. Biol. 193 723--750.
  • Carter, A. J. R. and Wagner, G. P. (2002). Evolution of functionally conserved enhancers can be accelerated in large populations: A population-genetic model. Proc. Roy. Soc. London Ser. B 269 953--960.
  • Dermitzakis, E. T. and Clark, A. G. (2002). Evolution of transcription factor binding sites in mammalian gene regulatory regions: Conservation and turnover. Mol. Biol. Evol. 19 1114--1121.
  • Durrett, R. (2002). Probability Models for DNA Sequence Evolution. Springer, New York.
  • Durrett, R. (2005). Probability: Theory and Examples, 3rd ed. Duxbury Press, Belmont, CA.
  • Eigen, M., McCaskill, J. and Schuster, P. (1989). The molecular quasispecies. Adv. Chem. Phys. 75 149--263.
  • Ewens, W. J. (2004). Mathematical Population Genetics, 2nd ed. Springer, New York.
  • Fields, D., He, Y., Al-Uzri, A. and Stormo, G. (1997). Quantitative specificity of mnt repression. J. Mol. Biol. 271 178--194.
  • Gerland, U. and Hwa, T. (2002). On the selection and evolution of regulatory DNA motifs. J. Mol. Evol. 55 386--400.
  • Hahn, M. W., Rockman, M. V., Sorzano, N., Goldstein, D. B. and Wray, G. A. (2004). Population genetic and phylogenetic evidence for positive selection on regulatory mutations at the Factor VII locus in humans. Genetics 167 867--877.
  • King, M. C. and Wilson, A. C. (1975). Evolution at two levels in humans and chimpanzees. Science 188 107--116.
  • Ludwig, M. Z., Palsson, A., Alekseeva, E., Bergman, C. E., Nathan, J. and Kreitman, M. (2005). Functional evolution of a cis-regulatory module. PLoS Biology 3 588--598.
  • MacArthur, S. and Brookfield, J. F. (2004). Expected rates and modes of evolution of enhancer sequences. Mol. Biol. Evol. 21 1064--1073.
  • Prudhomme, B., et al. (2006). Repeated morphological evolution through cis-regulatory changes in a pleiotropic gene. Nature 440 1050--1053.
  • Rockman, M. V., Hahn, M. W., Sorzano, N., Zimprich, F., Goldstein, D. B. and Wray, G. A. (2005). Ancient and recent positive selection transformed opioid cis-regulation in humans. PLoS Biology 3 article e387.
  • Stone, J. R. and Wray, G. A. (2001). Rapid evolution of cis-regulatory sequences via local point mutations. Mol. Biol. Evol. 18 1764--1770.
  • Tavaré, S. (2004). Ancestral inference in population genetics. Lectures on Probability Theory and Statistics. Lecture Notes in Math. 1837 1--188. Springer, Berlin.