The Annals of Probability

Power laws for family sizes in a duplication model

Rick Durrett and Jason Schweinsberg

Source: Ann. Probab. Volume 33, Number 6 (2005), 2094–2126.

Abstract

Qian, Luscombe and Gerstein [J. Molecular Biol. 313 (2001) 673–681] introduced a model of the diversification of protein folds in a genome that we may formulate as follows. Consider a multitype Yule process starting with one individual in which there are no deaths and each individual gives birth to a new individual at rate 1. When a new individual is born, it has the same type as its parent with probability $1-r$ and is a new type, different from all previously observed types, with probability $r$. We refer to individuals with the same type as families and provide an approximation to the joint distribution of family sizes when the population size reaches $N$. We also show that if $1 \ll S \ll N^{1-r}$, then the number of families of size at least $S$ is approximately $CNS^{-1/(1-r)}$, while if $N^{1-r} \ll S$, the distribution decays more rapidly than any power.
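The model described in the abstract can be simulated directly. What follows is a minimal Python sketch. It is not code from the paper, and the function names `simulate_family_sizes` and `families_at_least` are hypothetical choices made for this illustration. The sketch relies on one property of the Yule process: because every individual reproduces at rate 1, the next birth comes from a parent chosen uniformly at random. The family sizes at the moment the population reaches $N$ therefore depend only on this embedded jump chain and not on the birth times. Each of the $N-1$ births creates a new type with probability $r$, so the expected number of families is $1 + r(N-1)$.

```python
import random
from collections import Counter


def simulate_family_sizes(n, r, seed=None):
    """Family sizes (largest first) when a multitype Yule process reaches n individuals.

    Hypothetical helper: simulates only the embedded jump chain, in which
    each birth comes from a parent chosen uniformly at random.
    """
    if n < 1 or not 0.0 <= r <= 1.0:
        raise ValueError("need n >= 1 and 0 <= r <= 1")
    rng = random.Random(seed)
    types = [0]      # type label of each individual; start with one individual of type 0
    next_type = 1    # label to give the next brand-new type
    while len(types) < n:
        parent_type = types[rng.randrange(len(types))]  # uniformly chosen parent
        if rng.random() < r:
            types.append(next_type)    # with probability r: a new type
            next_type += 1
        else:
            types.append(parent_type)  # with probability 1 - r: the parent's type
    return sorted(Counter(types).values(), reverse=True)


def families_at_least(sizes, s):
    """Number of families with at least s members (hypothetical helper)."""
    return sum(1 for k in sizes if k >= s)
```

As an illustration, take $N = 100{,}000$ and $r = 0.5$. The abstract's result then suggests that for $1 \ll S \ll N^{1-r} \approx 316$, `families_at_least(sizes, S)` should fall off roughly like $S^{-2}$, so doubling $S$ should divide the count by about four. A single run is noisy, so averaging over several seeds is advisable before reading off an exponent.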

Primary Subjects: 60J80
Secondary Subjects: 60J85, 92D15, 92D20
Keywords: Power law; Yule processes; multitype branching processes; genome sequencing

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aop/1133965854
Digital Object Identifier: doi:10.1214/009117905000000369
Mathematical Reviews number (MathSciNet): MR2184092
Zentralblatt MATH identifier: 05020238

References

Aldous, D. J. (2001). Stochastic models and descriptive statistics for phylogenetic trees from Yule to today. Statist. Sci. 16 23–34.
Mathematical Reviews (MathSciNet): MR1838600
Digital Object Identifier: doi:10.1214/ss/998929474
Project Euclid: euclid.ss/998929474
Angerer, W. P. (2001). An explicit representation of the Luria–Delbrück distribution. J. Math. Biol. 42 145–174.
Mathematical Reviews (MathSciNet): MR1816347
Digital Object Identifier: doi:10.1007/s002850000053
Angerer, W. P. and Wakolbinger, A. (2005). In preparation.
Arratia, R., Barbour, A. D. and Tavaré, S. (2000). Limits of logarithmic combinatorial structures. Ann. Probab. 28 1620–1644.
Mathematical Reviews (MathSciNet): MR1813836
Digital Object Identifier: doi:10.1214/aop/1019160500
Project Euclid: euclid.aop/1019160500
Arratia, R. and Gordon, L. (1989). Tutorial on large deviations for the binomial distribution. Bull. Math. Biol. 51 125–131.
Mathematical Reviews (MathSciNet): MR978907
Digital Object Identifier: doi:10.1016/S0092-8240(89)80052-7
Athreya, K. B. and Karlin, S. (1968). Embedding of urn schemes into continuous time Markov branching processes. Ann. Math. Statist. 39 1801–1817.
Mathematical Reviews (MathSciNet): MR232455
Athreya, K. B. and Ney, P. E. (1972). Branching Processes. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR373040
Zentralblatt MATH: 0259.60002
Barabási, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512.
Mathematical Reviews (MathSciNet): MR2091634
Digital Object Identifier: doi:10.1126/science.286.5439.509
Berger, N., Borgs, C., Chayes, J. and Saberi, A. (2005). On the spread of viruses on the internet. In Proceedings of the 16th ACM–SIAM Symposium on Discrete Algorithms 301–310. SIAM, Philadelphia, PA.
Bollobás, B., Borgs, C., Chayes, J. and Riordan, O. (2003). Directed scale free graphs. In Proceedings of the 14th ACM–SIAM Symposium on Discrete Algorithms 132–139. SIAM, Philadelphia, PA.
Mathematical Reviews (MathSciNet): MR1974912
Bollobás, B., Riordan, O., Spencer, J. and Tusnády, G. (2001). The degree sequence of a scale-free random graph process. Random Structures Algorithms 18 279–290.
Mathematical Reviews (MathSciNet): MR1824277
Cooper, C. and Frieze, A. (2003). A general model for web graphs. Random Structures Algorithms 22 311–335.
Mathematical Reviews (MathSciNet): MR1966545
Durrett, R. (1996). Probability: Theory and Examples, 2nd ed. Duxbury, Belmont, CA.
Mathematical Reviews (MathSciNet): MR1609153
Durrett, R. (2002). Probability Models for DNA Sequence Evolution. Springer, New York.
Mathematical Reviews (MathSciNet): MR1903526
Durrett, R. and Schweinsberg, J. (2004). Approximating selective sweeps. Theoret. Population Biol. 66 129–138.
Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoret. Population Biol. 3 87–112.
Mathematical Reviews (MathSciNet): MR325177
Gu, Z., Cavalcanti, A., Chen, F.-C., Bouman, P. and Li, W.-H. (2002). Extent of gene duplication in the genomes of Drosophila, nematode, and yeast. Mol. Biol. Evol. 19 256–262.
Harrison, P. M. and Gerstein, M. (2002). Studying genomes through the aeons: Protein families, pseudogenes, and proteome evolution. J. Mol. Biol. 318 1155–1174.
Huynen, M. A. and van Nimwegen, E. (1998). The frequency distribution of gene family sizes in complete genomes. Mol. Biol. Evol. 15 583–589.
Janson, S. (2004). Functional limit theorems for multitype branching processes and generalized Pólya urns. Stochastic Process. Appl. 110 177–245.
Mathematical Reviews (MathSciNet): MR2040966
Digital Object Identifier: doi:10.1016/j.spa.2003.12.002
Janson, S. (2004). Limit theorems for triangular urn schemes. Preprint. Available at http://www.math.uu.se/~svante/papers/index.html.
Johnson, N. L., Kotz, S. and Kemp, A. W. (1992). Univariate Discrete Distributions. Wiley, New York.
Mathematical Reviews (MathSciNet): MR1224449
Zentralblatt MATH: 0773.62007
Karev, G. P., Wolf, Y. I., Rzhetsky, A. Y., Berezov, F. S. and Koonin, E. V. (2002). Birth and death of protein domains: A simple model explains power law behavior. BMC Evolutionary Biology 2 article 18.
Kingman, J. F. C. (1982). The coalescent. Stochastic Process. Appl. 13 235–248.
Mathematical Reviews (MathSciNet): MR671034
Digital Object Identifier: doi:10.1016/0304-4149(82)90011-4
Koonin, E. V., Wolf, Y. I. and Karev, G. P. (2002). The structure of the protein universe and genome evolution. Nature 420 218–223.
Krapivsky, P. L., Redner, S. and Leyvraz, F. (2000). Connectivity of growing random networks. Phys. Rev. Lett. 85 4629–4632.
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A. and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st IEEE Symposium on the Foundations of Computer Science 57–65.
Mathematical Reviews (MathSciNet): MR1931804
Li, W.-H., Gu, Z., Wang, H. and Nekrutenko, A. (2001). Evolutionary analyses of the human genome. Nature 409 847–849.
Zentralblatt MATH: 1019.81044
Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145–158.
Mathematical Reviews (MathSciNet): MR1337249
Digital Object Identifier: doi:10.1007/BF01213386
Pitman, J. (2002). Combinatorial stochastic processes. Lecture Notes for St. Flour Summer School. Available at http://stat-www.berkeley.edu/users/pitman/bibliog.html.
Pitman, J. and Yor, M. (1997). The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25 855–900.
Mathematical Reviews (MathSciNet): MR1434129
Digital Object Identifier: doi:10.1214/aop/1024404422
Project Euclid: euclid.aop/1024404422
Qian, J., Luscombe, N. M. and Gerstein, M. (2001). Protein family and fold occurrence in genomes: Power-law behavior and evolutionary model. J. Mol. Biol. 313 673–681.
Rzhetsky, A. and Gomez, S. M. (2001). Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17 988–996.
Schweinsberg, J. and Durrett, R. (2004). Random partitions approximating the coalescence of lineages during a selective sweep. Preprint. Available at http://front.math.ucdavis.edu/math.PR/0411069.
Mathematical Reviews (MathSciNet): MR2152239
Digital Object Identifier: doi:10.1214/105051605000000430
Project Euclid: euclid.aoap/1121433763
Simon, H. A. (1955). On a class of skew distribution functions. Biometrika 42 425–440.
Mathematical Reviews (MathSciNet): MR73085
Zentralblatt MATH: 0066.11201
Skorohod, A. V. (1961). Asymptotic formulas for stable distribution laws. Selected Translations in Mathematical Statistics and Probability 1 157–161.
Mathematical Reviews (MathSciNet): MR116373
Yule, G. U. (1925). A mathematical theory of evolution based on the conclusions of Dr. J. C. Willis. Philos. Trans. Roy. Soc. London Ser. B 213 21–87.

2010 © Institute of Mathematical Statistics