Statistical Science

Defining Predictive Probability Functions for Species Sampling Models

Jaeyong Lee, Fernando A. Quintana, Peter Müller, and Lorenzo Trippa

Full-text: Open access


We review the class of species sampling models (SSMs). In particular, we investigate the relation between the exchangeable partition probability function (EPPF) and the predictive probability function (PPF). It is straightforward to define a PPF from an EPPF, but the converse is not necessarily true. In this paper, we introduce the notion of putative PPFs and show novel conditions for a putative PPF to define an EPPF. We show that all possible PPFs in a certain class have to define (unnormalized) probabilities for cluster membership that are linear in cluster size. We give a new necessary and sufficient condition for arbitrary putative PPFs to define an EPPF. Finally, we show posterior inference for a large class of SSMs with a PPF that is not linear in cluster size and discuss a numerical method to derive its PPF.
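
As a minimal sketch of the "EPPF to PPF" direction mentioned in the abstract, the snippet below derives predictive probabilities as ratios of EPPF values. It uses the well-known EPPF of the Dirichlet process (Chinese restaurant process) as an assumed illustration; this choice is not taken from the abstract. The names `dp_eppf`, `ppf_from_eppf`, and `theta` are hypothetical. For this particular model the resulting weights are linear in cluster size.

```python
from math import factorial, isclose

def dp_eppf(sizes, theta=1.0):
    """EPPF of the Dirichlet process partition with total mass theta (illustrative choice).

    p(n_1, ..., n_k) = theta^k * prod_j (n_j - 1)! / (theta * (theta+1) * ... * (theta+n-1))
    """
    n = sum(sizes)
    rising = 1.0
    for i in range(n):
        rising *= theta + i          # rising factorial (theta)_n
    numerator = theta ** len(sizes)
    for s in sizes:
        numerator *= factorial(s - 1)
    return numerator / rising

def ppf_from_eppf(eppf, sizes):
    """Predictive probabilities obtained as EPPF ratios.

    Entry j < k: next observation joins cluster j.
    Last entry: next observation opens a new cluster.
    """
    base = eppf(sizes)
    probs = []
    for j in range(len(sizes)):
        grown = list(sizes)
        grown[j] += 1                # add one member to cluster j
        probs.append(eppf(grown) / base)
    probs.append(eppf(list(sizes) + [1]) / base)  # new singleton cluster
    return probs

theta = 1.5
sizes = [3, 1, 2]
probs = ppf_from_eppf(lambda s: dp_eppf(s, theta), sizes)
print(probs)
```

Under these assumptions the probabilities sum to one, and each existing cluster receives weight `n_j / (theta + n)`, which is the linear-in-cluster-size form the abstract refers to. The converse direction (checking whether an arbitrary putative PPF defines an EPPF) is the harder problem the paper addresses and is not attempted here.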

Article information

Statist. Sci., Volume 28, Number 2 (2013), 209-222.

First available in Project Euclid: 21 May 2013

Keywords: Species sampling prior; exchangeable partition probability functions; prediction probability functions


Lee, Jaeyong; Quintana, Fernando A.; Müller, Peter; Trippa, Lorenzo. Defining Predictive Probability Functions for Species Sampling Models. Statist. Sci. 28 (2013), no. 2, 209–222. doi:10.1214/12-STS407.
