There is a rich literature proposing Bayesian approaches to clustering that start from a prior probability distribution on partitions. Most approaches assume exchangeability, leading to simple representations in terms of Exchangeable Partition Probability Functions (EPPFs). Gibbs-type priors encompass a broad class of such cases, including Dirichlet and Pitman-Yor processes. Although there have been some proposals to relax the exchangeability assumption, allowing covariate dependence and partial exchangeability, limited consideration has been given to how to include concrete prior knowledge on the partition. For example, we are motivated by an epidemiological application in which we wish to cluster birth defects into groups and have prior knowledge of an initial clustering provided by experts. As a general approach for including such prior knowledge, we propose a Centered Partition (CP) process that modifies the EPPF to favor partitions close to an initial one. We describe some properties of the CP prior, develop a general algorithm for posterior computation, and illustrate the methodology through simulation examples and an application to the motivating epidemiological study of birth defects.
Bayesian Anal. 16(1): 301-370 (March 2021). DOI: 10.1214/20-BA1197