Bayesian Analysis

Inference of global clusters from locally distributed data

XuanLong Nguyen

Full-text: Open access


We consider the problem of analyzing the heterogeneity of clustering distributions for multiple groups of observed data, each of which is indexed by a covariate value, and inferring global clusters arising from observations aggregated over the covariate domain. We propose a novel Bayesian nonparametric method reposing on the formalism of spatial modeling and a nested hierarchy of Dirichlet processes. We provide an analysis of the model properties, relating and contrasting the notions of local and global clusters. We also provide an efficient inference algorithm, and demonstrate the utility of our method in several data examples, including the problem of object tracking and a global clustering analysis of functional data where the functional identity information is not available.

Article information

Bayesian Anal., Volume 5, Number 4 (2010), 817-845.

First available in Project Euclid: 19 June 2012

Keywords: global clustering; local clustering; nonparametric Bayes; hierarchical Dirichlet process; Gaussian process; graphical model; spatial dependence; Markov chain Monte Carlo; model identifiability


Nguyen, XuanLong. Inference of global clusters from locally distributed data. Bayesian Anal. 5 (2010), no. 4, 817–845. doi:10.1214/10-BA529.


