The Annals of Applied Probability

Bayesian nonparametric estimators derived from conditional Gibbs structures

Antonio Lijoi, Igor Prünster, and Stephen G. Walker


Abstract

We consider discrete nonparametric priors which induce Gibbs-type exchangeable random partitions and investigate their posterior behavior in detail. In particular, we deduce conditional distributions and the corresponding Bayesian nonparametric estimators, which can be readily exploited for predicting various features of additional samples. The results provide useful tools for genomic applications where prediction of future outcomes is required.
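To make the abstract's predictive estimators concrete, the Python sketch below (not the authors' code) works through the simplest Gibbs-type case listed in the keywords: the two-parameter Poisson–Dirichlet process PD(σ, θ), with 0 ≤ σ < 1 and θ > −σ. Under a Gibbs-type prior, the probability that observation n + 1 is a previously unseen species depends on the sample only through the sample size n and the number k of distinct species observed. For PD(σ, θ) this probability is (θ + kσ)/(θ + n). Given (n, k), the expected number of new species in m additional draws then has the closed form (k + θ/σ)[(θ + n + σ)_m/(θ + n)_m − 1], where (a)_m = a(a + 1)···(a + m − 1). It reduces to θ Σ_{i=0}^{m−1} 1/(θ + n + i) for the Dirichlet process (σ = 0). All function names are hypothetical. The closed form is cross-checked against an exact recursion and a Monte Carlo run of the predictive urn rather than taken on trust. The general Gibbs-type estimators derived in the paper are not reproduced here.

```python
import math
import random


def prob_new_species(n, k, sigma, theta):
    """P(observation n+1 is a new species | n observations, k distinct) under PD(sigma, theta).

    Assumes 0 <= sigma < 1 and theta > -sigma. Only n and k enter (the Gibbs-type
    property), so the individual cluster sizes are not needed.
    """
    return (theta + k * sigma) / (theta + n)


def expected_new_species(n, k, m, sigma, theta):
    """Closed-form E[number of new species in m further draws | n, k] under PD(sigma, theta)."""
    big_n = theta + n
    if sigma == 0.0:
        # Dirichlet process limit: theta * sum_{i=0}^{m-1} 1 / (theta + n + i)
        return sum(theta / (big_n + i) for i in range(m))
    # log of the rising-factorial ratio (theta+n+sigma)_m / (theta+n)_m,
    # computed with log-gamma so that large m does not overflow
    log_ratio = (math.lgamma(big_n + sigma + m) - math.lgamma(big_n + sigma)
                 - math.lgamma(big_n + m) + math.lgamma(big_n))
    return (k + theta / sigma) * math.expm1(log_ratio)


def expected_new_species_recursive(n, k, m, sigma, theta):
    """Exact cross-check: P(new) is linear in k, so E[K] obeys a linear recursion."""
    expected_k = float(k)
    for i in range(m):
        expected_k += prob_new_species(n + i, expected_k, sigma, theta)
    return expected_k - k


def simulate_new_species(n, k, m, sigma, theta, reps=20000, seed=0):
    """Monte Carlo estimate: run the predictive urn forward m steps, tracking only k."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        kk = k
        for i in range(m):
            if rng.random() < prob_new_species(n + i, kk, sigma, theta):
                kk += 1
        total += kk - k
    return total / reps


if __name__ == "__main__":
    # Illustrative (made-up) inputs: 100 observations with 40 distinct species,
    # predicting 50 further draws.
    n, k, m, sigma, theta = 100, 40, 50, 0.5, 10.0
    print(expected_new_species(n, k, m, sigma, theta))
    print(expected_new_species_recursive(n, k, m, sigma, theta))
    print(simulate_new_species(n, k, m, sigma, theta))
```

Because the new-species probability depends only on (n, k), the simulation tracks only the running count of distinct species and never the individual cluster sizes.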

Article information

Source
Ann. Appl. Probab. Volume 18, Number 4 (2008), 1519–1547.

Dates
First available: 21 July 2008

Permanent link to this document
http://projecteuclid.org/euclid.aoap/1216677130

Digital Object Identifier
doi:10.1214/07-AAP495

Zentralblatt MATH identifier
1142.62333

Mathematical Reviews number (MathSciNet)
MR2434179

Subjects
Primary: 62G05 (Estimation); 62F15 (Bayesian inference); 60G57 (Random measures)

Keywords
Bayesian nonparametrics; Dirichlet process; exchangeable random partitions; generalized factorial coefficients; generalized gamma process; population genetics; species sampling models; two parameter Poisson–Dirichlet process

Citation

Lijoi, Antonio; Prünster, Igor; Walker, Stephen G. Bayesian nonparametric estimators derived from conditional Gibbs structures. The Annals of Applied Probability 18 (2008), no. 4, 1519–1547. doi:10.1214/07-AAP495. http://projecteuclid.org/euclid.aoap/1216677130.


