Electronic Journal of Probability

Large deviation principles for the Ewens-Pitman sampling model

Stefano Favaro and Shui Feng


Abstract

Let $M_{l,n}$ be the number of blocks with frequency $l$ in the exchangeable random partition induced by a sample of size $n$ from the Ewens-Pitman sampling model. In this paper we show that, as $n$ tends to infinity, $n^{-1}M_{l,n}$ satisfies a large deviation principle, and we characterize the corresponding rate function. A conditional counterpart of this large deviation principle is also presented. Specifically, given an initial observed sample of size $n$ from the Ewens-Pitman sampling model, we consider an additional unobserved sample of size $m$, thus giving rise to an enlarged sample of size $n+m$. Then, for any fixed $n$ and as $m$ tends to infinity, we establish a large deviation principle for the conditional number of blocks with frequency $l$ in the enlarged sample, given the initial sample. Interestingly, this conditional large deviation principle coincides with the large deviation principle for $M_{l,n}$; namely, the given initial sample has no long-lasting impact on the large deviations. Potential applications of our conditional large deviation principle are thoroughly discussed in the context of Bayesian nonparametric inference for species sampling problems.
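As a purely illustrative aid (not code from the paper), the sampling model in the abstract can be simulated with the standard sequential, or two-parameter Chinese restaurant, description of the Ewens-Pitman model with $0 \le \alpha < 1$ and $\theta > -\alpha$. The Python sketch below assumes that description; the function names, the seed and the parameter values are hypothetical choices for illustration. It computes $M_{l,n}$ for a sample of size $n$ and then mimics the conditional setting by extending an initial sample (of size `n0`, playing the role of $n$) by $m$ further observations. It only produces empirical proportions; it does not evaluate the rate functions derived in the paper.

```python
import random
from collections import Counter


def ewens_pitman_partition(n, alpha, theta, rng=None, block_sizes=None):
    """Add n observations to an Ewens-Pitman partition and return its block sizes.

    Sequential rule (two-parameter Chinese restaurant process): with N observations
    already grouped into K blocks of sizes n_1, ..., n_K, the next observation joins
    block j with probability (n_j - alpha) / (theta + N) and starts a new block with
    probability (theta + K * alpha) / (theta + N). Requires 0 <= alpha < 1, theta > -alpha.
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = rng or random.Random()
    sizes = list(block_sizes) if block_sizes else []  # copy; the input is not modified
    total = sum(sizes)
    for _ in range(n):
        if not sizes:
            sizes.append(1)  # the first observation always opens a block
            total += 1
            continue
        new_weight = theta + len(sizes) * alpha
        u = rng.random() * (theta + total)  # total weight theta + N is > 0 once N >= 1
        if u < new_weight:
            sizes.append(1)
        else:
            u -= new_weight
            for j, s in enumerate(sizes):
                u -= s - alpha
                if u < 0:
                    sizes[j] += 1
                    break
            else:
                sizes[-1] += 1  # guard against floating-point round-off
        total += 1
    return sizes


def block_frequency_counts(sizes):
    """Return {l: M_l}: the number of blocks containing exactly l observations."""
    return dict(Counter(sizes))


rng = random.Random(2015)  # hypothetical seed
alpha, theta = 0.5, 1.0    # hypothetical parameter values

# Unconditional: a sample of size n and the proportion n^{-1} M_{l,n} for l = 1.
n = 1000
sizes = ewens_pitman_partition(n, alpha, theta, rng=rng)
print("n^{-1} M_{1,n} =", block_frequency_counts(sizes).get(1, 0) / n)

# Conditional: fix an initial sample of size n0, add m unobserved observations,
# and count blocks of frequency l = 1 in the enlarged sample of size n0 + m
# (normalised here by m; for fixed n0 this matches dividing by n0 + m as m grows).
n0, m = 50, 1000
initial = ewens_pitman_partition(n0, alpha, theta, rng=rng)
enlarged = ewens_pitman_partition(m, alpha, theta, rng=rng, block_sizes=initial)
print("M_1 in enlarged sample / m =", block_frequency_counts(enlarged).get(1, 0) / m)
```

Since the sequential rule depends on the initial sample only through its block sizes, continuing from `initial` draws the additional $m$ observations from their conditional distribution given the initial sample.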

Article information

Source
Electron. J. Probab., Volume 20 (2015), paper no. 40, 26 pp.

Dates
Accepted: 8 April 2015
First available in Project Euclid: 4 June 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejp/1465067146

Digital Object Identifier
doi:10.1214/EJP.v20-3668

Mathematical Reviews number (MathSciNet)
MR3335831

Zentralblatt MATH identifier
1321.60047

Subjects
Primary: 60F10: Large deviations
Secondary: 92D10: Genetics {For genetic algebras, see 17D92}

Keywords
Bayesian nonparametrics; discovery probability; Ewens-Pitman sampling model; exchangeable random partition; large deviations; population genetics; species sampling problems

Rights
This work is licensed under a Creative Commons Attribution 3.0 License.

Citation

Favaro, Stefano; Feng, Shui. Large deviation principles for the Ewens-Pitman sampling model. Electron. J. Probab. 20 (2015), paper no. 40, 26 pp. doi:10.1214/EJP.v20-3668. https://projecteuclid.org/euclid.ejp/1465067146

