Electronic Journal of Statistics

A note on nonparametric inference for species variety with Gibbs-type priors

Stefano Favaro and Lancelot F. James

Full-text: Open access

Abstract

A Bayesian nonparametric methodology has recently been introduced for estimating, given an initial observed sample, the species variety featured by an additional unobserved sample of size $m$. Although this methodology leads to explicit posterior distributions under the general framework of Gibbs-type priors, there are situations of practical interest where $m$ is required to be very large, and the computational burden of evaluating these posterior distributions makes their concrete implementation impossible. In this paper we present a solution to this problem for a large class of Gibbs-type priors which encompasses the two parameter Poisson-Dirichlet prior and, among others, the normalized generalized Gamma prior. Our solution relies on the study of the large $m$ asymptotic behaviour of the posterior distribution of the number of new species in the additional sample. In particular, we introduce a simple characterization of the limiting posterior distribution in terms of a scale mixture with respect to a suitable latent random variable; this characterization, combined with adaptive rejection sampling, yields a large $m$ approximation of any feature of interest of the exact posterior distribution. We show how to implement our results through a simulation study and the analysis of a dataset in linguistics.
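
As a rough illustration of the quantity studied in the abstract, the sketch below simulates the number of new species in an additional sample of size $m$ under the two parameter Poisson-Dirichlet prior, one member of the Gibbs-type class. It is not the method of the paper (no limiting scale mixture or adaptive rejection sampling is involved); it uses only the standard predictive rule of that prior, under which the next observation is a new species with probability $(\theta + \sigma k)/(\theta + N)$ when $k$ distinct species have been seen among $N$ observations. The function names, the symbols `sigma` and `theta`, and all numerical values are assumptions chosen for illustration. Each draw costs $O(m)$ operations, which hints at why very large $m$ becomes burdensome.

```python
import random


def simulate_new_species(n, j, m, sigma, theta, rng):
    """Draw the number of new species among m further observations.

    Assumes a two parameter Poisson-Dirichlet prior with 0 < sigma < 1 and
    theta > -sigma, and an initial sample of size n with j distinct species.
    """
    k = j  # distinct species seen so far
    for i in range(m):
        total = n + i  # observations seen so far
        # Predictive probability that the next observation is a new species.
        if rng.random() < (theta + sigma * k) / (theta + total):
            k += 1
    return k - j


def monte_carlo_mean(n, j, m, sigma, theta, draws=2000, seed=0):
    """Monte Carlo estimate of the posterior mean number of new species."""
    rng = random.Random(seed)
    total = sum(simulate_new_species(n, j, m, sigma, theta, rng)
                for _ in range(draws))
    return total / draws


def exact_mean(n, j, m, sigma, theta):
    """Posterior mean obtained by taking expectations of the predictive rule.

    The recursion E[k_{i+1}] + theta/sigma =
    (E[k_i] + theta/sigma) * (theta + n + i + sigma) / (theta + n + i)
    is unrolled over the m additional observations.
    """
    ratio = 1.0
    for i in range(m):
        ratio *= (theta + n + i + sigma) / (theta + n + i)
    return (j + theta / sigma) * (ratio - 1.0)


if __name__ == "__main__":
    # Hypothetical inputs: 100 observations with 30 distinct species.
    print(monte_carlo_mean(n=100, j=30, m=200, sigma=0.5, theta=1.0))
    print(exact_mean(n=100, j=30, m=200, sigma=0.5, theta=1.0))
```

The closed-form mean is included only as a check on the simulation; features of the posterior other than the mean are what call for sampling or, for very large $m$, an asymptotic approximation.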

Article information

Source
Electron. J. Statist., Volume 9, Number 2 (2015), 2884-2902.

Dates
Received: February 2015
First available in Project Euclid: 4 January 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1451916110

Digital Object Identifier
doi:10.1214/15-EJS1096

Mathematical Reviews number (MathSciNet)
MR3439188

Zentralblatt MATH identifier
1329.62162

Subjects
Primary: 62F15: Bayesian inference; 60G57: Random measures

Keywords
Adaptive rejection sampling; Bayesian nonparametric inference; empirical linguistics; Gibbs-type priors; normalized generalized Gamma prior; species sampling asymptotics; two parameter Poisson-Dirichlet prior

Citation

Favaro, Stefano; James, Lancelot F. A note on nonparametric inference for species variety with Gibbs-type priors. Electron. J. Statist. 9 (2015), no. 2, 2884-2902. doi:10.1214/15-EJS1096. https://projecteuclid.org/euclid.ejs/1451916110


