Statistical Science

Cluster and Feature Modeling from Combinatorial Stochastic Processes

Tamara Broderick, Michael I. Jordan, and Jim Pitman

Full-text: Open access

Abstract

One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models is a set of interrelated stochastic processes, most notably the Dirichlet process and the Chinese restaurant process. In this paper we provide a formal development of an analogous problem, called feature modeling, for associating data points with arbitrary nonnegative integer numbers of groups, now called features or topics. We review the existing combinatorial stochastic process representations for the clustering problem and develop analogous representations for the feature modeling problem. These representations include the beta process and the Indian buffet process as well as new representations that provide insight into the connections between these processes. We thereby bring the same level of completeness to the treatment of Bayesian nonparametric feature modeling that has previously been achieved for Bayesian nonparametric clustering.
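The cluster-versus-feature distinction the abstract draws can be made concrete by simulating the two predictive schemes it names. The following is a minimal sketch, not code from the paper; the function names and the Knuth-style Poisson sampler are illustrative choices. In the Chinese restaurant process each customer joins exactly one table (a partition block), while in the Indian buffet process each customer takes an arbitrary nonnegative number of dishes (features).

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth-style Poisson sampler (adequate for the small rates used here)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def chinese_restaurant_process(n, alpha, rng):
    """Clustering: customer i sits at exactly one table (partition block)."""
    counts = []       # counts[k] = customers already at table k
    assignments = []  # assignments[i] = the single table of customer i
    for i in range(n):
        # Existing table k is chosen with probability counts[k] / (i + alpha);
        # a new table is opened with probability alpha / (i + alpha).
        r = rng.uniform(0.0, i + alpha)
        k = 0
        while k < len(counts) and r >= counts[k]:
            r -= counts[k]
            k += 1
        if k == len(counts):
            counts.append(0)  # open a new table
        counts[k] += 1
        assignments.append(k)
    return assignments

def indian_buffet_process(n, alpha, rng):
    """Feature modeling: customer i may take any number of dishes (features)."""
    counts = []       # counts[k] = customers who have taken dish k
    allocations = []  # allocations[i] = set of dishes taken by customer i
    for i in range(n):
        dishes = set()
        # Take each previously sampled dish k with probability counts[k] / (i + 1).
        for k in range(len(counts)):
            if rng.random() < counts[k] / (i + 1):
                counts[k] += 1
                dishes.add(k)
        # Then take a Poisson(alpha / (i + 1)) number of brand-new dishes.
        for _ in range(sample_poisson(alpha / (i + 1), rng)):
            dishes.add(len(counts))
            counts.append(1)
        allocations.append(dishes)
    return allocations
```

Note the structural contrast: the CRP output assigns each of the n points a single block label, whereas the IBP output assigns each point a set of feature indices, which may be empty or contain many elements.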

Article information

Source
Statist. Sci., Volume 28, Number 3 (2013), 289-312.

Dates
First available in Project Euclid: 28 August 2013

Permanent link to this document
https://projecteuclid.org/euclid.ss/1377696938

Digital Object Identifier
doi:10.1214/13-STS434

Mathematical Reviews number (MathSciNet)
MR3135534

Zentralblatt MATH identifier
1331.62124

Keywords
Cluster; feature; Dirichlet process; beta process; Chinese restaurant process; Indian buffet process; nonparametric Bayesian; combinatorial stochastic process

Citation

Broderick, Tamara; Jordan, Michael I.; Pitman, Jim. Cluster and Feature Modeling from Combinatorial Stochastic Processes. Statist. Sci. 28 (2013), no. 3, 289--312. doi:10.1214/13-STS434. https://projecteuclid.org/euclid.ss/1377696938


References

  • Adams, R. P., Ghahramani, Z. and Jordan, M. I. (2010). Tree-structured stick breaking for hierarchical data. Adv. Neural Inf. Process. Syst. 23 19–27.
  • Aldous, D. J. (1985). Exchangeability and related topics. In École D’été de Probabilités de Saint-Flour, XIII—1983. Lecture Notes in Math. 1117 1–198. Springer, Berlin.
  • Bertoin, J. (1996). Lévy Processes. Cambridge Tracts in Mathematics 121. Cambridge Univ. Press, Cambridge.
  • Bertoin, J. (1999). Subordinators: Examples and Applications. In Lectures on Probability Theory and Statistics (Saint-Flour, 1997). Lecture Notes in Math. 1717 1–91. Springer, Berlin.
  • Bertoin, J. (2000). Subordinators, Lévy processes with no negative jumps, and branching processes. Unpublished manuscript.
  • Blackwell, D. and MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. Ann. Statist. 1 353–355.
  • Blei, D. M. and Frazier, P. I. (2011). Distance dependent Chinese restaurant processes. J. Mach. Learn. Res. 12 2461–2488.
  • Blei, D. M., Griffiths, T. L. and Jordan, M. I. (2010). The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57 Art. 7, 30.
  • Blei, D. M. and Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures. Bayesian Anal. 1 121–143 (electronic).
  • Bochner, S. (1955). Harmonic Analysis and the Theory of Probability. Univ. California Press, Berkeley and Los Angeles.
  • Broderick, T., Jordan, M. I. and Pitman, J. (2012). Beta processes, stick-breaking and power laws. Bayesian Anal. 7 439–475.
  • Broderick, T., Pitman, J. and Jordan, M. I. (2013). Feature allocations, probability functions, and paintboxes. Bayesian Anal. To appear.
  • Broderick, T., Mackey, L., Paisley, J. and Jordan, M. I. (2011). Combinatorial clustering and the beta negative binomial process. Available at arXiv:1111.1802.
  • De Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio [Characteristic function of a random phenomenon]. Atti della R. Accademia Nazionale dei Lincei, Serie 6. 4 251–299.
  • Dunson, D. B. and Park, J.-H. (2008). Kernel stick-breaking processes. Biometrika 95 307–323.
  • Escobar, M. D. (1994). Estimating normal means with a Dirichlet process prior. J. Amer. Statist. Assoc. 89 268–277.
  • Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577–588.
  • Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209–230.
  • Freedman, D. A. (1965). Bernard Friedman’s urn. Ann. Math. Statist. 36 956–970.
  • Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6 721–741.
  • Gnedin, A. and Pitman, J. (2006). Exchangeable Gibbs partitions and Stirling triangles. J. Math. Sci. 138 5674–5685.
  • Griffiths, T. and Ghahramani, Z. (2006). Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems 18 (Y. Weiss, B. Schölkopf and J. Platt, eds.) 475–482. MIT Press, Cambridge, MA.
  • Griffiths, T. L. and Ghahramani, Z. (2011). The Indian buffet process: An introduction and review. J. Mach. Learn. Res. 12 1185–1224.
  • Hansen, B. and Pitman, J. (1998). Prediction Rules for Exchangeable Sequences Related to Species Sampling. Technical Report 520, Univ. California, Berkeley.
  • Hewitt, E. and Savage, L. J. (1955). Symmetric measures on Cartesian products. Trans. Amer. Math. Soc. 80 470–501.
  • Hjort, N. L. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist. 18 1259–1294.
  • Hoppe, F. M. (1984). Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol. 20 91–94.
  • Ishwaran, H. and James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96 161–173.
  • Ishwaran, H. and Zarepour, M. (2000). Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika 87 371–390.
  • Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. and Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine Learning 37 183–233.
  • Kim, Y. (1999). Nonparametric Bayesian estimators for counting processes. Ann. Statist. 27 562–588.
  • Kingman, J. F. C. (1967). Completely random measures. Pacific J. Math. 21 59–78.
  • Kingman, J. F. C. (1978). The representation of partition structures. J. London Math. Soc. (2) 18 374–380.
  • Kingman, J. F. C. (1993). Poisson Processes. Oxford Studies in Probability 3. Oxford Univ. Press, New York.
  • Lee, J., Quintana, F. A., Müller, P. and Trippa, L. (2008). Defining predictive probability functions for species sampling models. Technical report.
  • Li, W. and McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In Proceedings of the 23rd International Conference on Machine Learning 577–584. ACM, New York, NY.
  • MacEachern, S. N. (1994). Estimating normal means with a conjugate style Dirichlet process prior. Comm. Statist. Simulation Comput. 23 727–741.
  • McCloskey, J. W. (1965). A model for the distribution of individuals by species in an environment. Ph.D. thesis, Michigan State Univ.
  • McCullagh, P., Pitman, J. and Winkel, M. (2008). Gibbs fragmentation trees. Bernoulli 14 988–1002.
  • Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 9 249–265.
  • Paisley, J., Zaas, A., Woods, C. W., Ginsburg, G. S. and Carin, L. (2010). A stick-breaking construction of the beta process. In International Conference on Machine Learning. Haifa, Israel.
  • Papaspiliopoulos, O. and Roberts, G. O. (2008). Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95 169–186.
  • Patil, G. P. and Taillie, C. (1977). Diversity as a concept and its implications for random communities. In Proceedings of the 41st Session of the International Statistical Institute (New Delhi, 1977) 497–515. New Delhi.
  • Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145–158.
  • Pitman, J. (1996). Some developments of the Blackwell–MacQueen urn scheme. In Statistics, Probability and Game Theory. Institute of Mathematical Statistics Lecture Notes—Monograph Series 30 245–267. IMS, Hayward, CA.
  • Pitman, J. (2003). Poisson–Kingman partitions. In Statistics and Science: A Festschrift for Terry Speed. Institute of Mathematical Statistics Lecture Notes—Monograph Series 40 1–34. IMS, Beachwood, OH.
  • Pitman, J. (2006). Combinatorial Stochastic Processes. Lecture Notes in Math. 1875. Springer, Berlin.
  • Pitman, J. and Yor, M. (1997). The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25 855–900.
  • Pólya, G. (1930). Sur quelques points de la théorie des probabilités. Ann. Inst. H. Poincaré 1 117–161.
  • Rogers, L. C. G. and Williams, D. (2000). Diffusions, Markov Processes, and Martingales. Vol. 1: Foundations. Cambridge Univ. Press, Cambridge.
  • Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statist. Sinica 4 639–650.
  • Teh, Y. W., Görür, D. and Ghahramani, Z. (2007). Stick-breaking construction for the Indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics 11.
  • Thibaux, R. and Jordan, M. I. (2007). Hierarchical beta processes and the Indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics 11.
  • Walker, S. G. (2007). Sampling the Dirichlet mixture model with slices. Comm. Statist. Simulation Comput. 36 45–54.
  • Wolpert, R. L. and Ickstadt, K. (2004). Reflecting uncertainty in inverse problems: A Bayesian solution using Lévy processes. Inverse Problems 20 1759–1771.
  • Zhou, M., Hannah, L., Dunson, D. and Carin, L. (2012). Beta-negative binomial process and Poisson factor analysis. In International Conference on Artificial Intelligence and Statistics.