Bayesian Analysis

On the Prior and Posterior Distributions Used in Graphical Modelling

Marco Scutari

Full-text: Open access

Abstract

Graphical model learning and inference are often performed using Bayesian techniques. In particular, learning is usually performed in two separate steps. First, the graph structure is learned from the data; then the parameters of the model are estimated conditional on that graph structure. While the probability distributions involved in this second step have been studied in depth, the ones used in the first step have not been explored in as much detail.

In this paper, we will study the prior and posterior distributions defined over the space of the graph structures for the purpose of learning the structure of a graphical model. In particular, we will provide a characterisation of the behaviour of those distributions as a function of the possible edges of the graph. We will then use the properties resulting from this characterisation to define measures of structural variability for both Bayesian and Markov networks, and we will point out some of their possible applications.
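The posterior distribution over graph structures described above can be illustrated on a toy problem. The sketch below is not taken from the paper: it enumerates the three possible DAGs over two binary variables (no edge, X→Y, Y→X), scores each with a BDeu marginal likelihood (a standard choice in Bayesian network structure learning, with hypothetical imaginary sample size 1), and combines the scores with a uniform structure prior to obtain a posterior probability for the presence of an edge. The variable names and the synthetic data are purely illustrative.

```python
from math import lgamma, exp
from itertools import product

def log_ml_node(data, child, parents, iss=1.0):
    """BDeu log marginal likelihood contribution of one binary node,
    given its parent set, under a Dirichlet prior with imaginary
    sample size `iss` spread uniformly over the parameter cells."""
    states = [0, 1]
    q = 2 ** len(parents)      # number of parent configurations
    r = 2                      # number of child states
    a_jk = iss / (q * r)       # per-cell Dirichlet hyperparameter
    a_j = iss / q              # per-configuration hyperparameter
    total = 0.0
    for cfg in product(states, repeat=len(parents)):
        rows = [row for row in data
                if all(row[p] == v for p, v in zip(parents, cfg))]
        n_j = len(rows)
        total += lgamma(a_j) - lgamma(a_j + n_j)
        for k in states:
            n_jk = sum(1 for row in rows if row[child] == k)
            total += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return total

# synthetic data over two strongly associated binary variables
# X (index 0) and Y (index 1)
data = [(0, 0)] * 30 + [(1, 1)] * 30 + [(0, 1)] * 5 + [(1, 0)] * 5

# the three DAGs on {X, Y}: empty graph, X -> Y, Y -> X;
# each graph's score decomposes into per-node contributions
log_scores = {
    "empty": log_ml_node(data, 0, ()) + log_ml_node(data, 1, ()),
    "X->Y":  log_ml_node(data, 0, ()) + log_ml_node(data, 1, (0,)),
    "Y->X":  log_ml_node(data, 1, ()) + log_ml_node(data, 0, (1,)),
}

# uniform prior over the three DAGs; normalise on the log scale
m = max(log_scores.values())
weights = {g: exp(s - m) for g, s in log_scores.items()}
z = sum(weights.values())
posterior = {g: w / z for g, w in weights.items()}

# posterior probability that the edge between X and Y is present,
# in either direction
p_edge = posterior["X->Y"] + posterior["Y->X"]
```

Note that X→Y and Y→X receive identical scores: they belong to the same Markov equivalence class, and BDeu is score-equivalent, which is one reason the paper characterises these distributions edge by edge rather than DAG by DAG.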

Article information

Source
Bayesian Anal. Volume 8, Number 3 (2013), 505–532.

Dates
First available in Project Euclid: 9 September 2013

Permanent link to this document
http://projecteuclid.org/euclid.ba/1378729914

Digital Object Identifier
doi:10.1214/13-BA819

Mathematical Reviews number (MathSciNet)
MR3102220

Zentralblatt MATH identifier
1329.62145

Keywords
Markov networks; Bayesian networks; random graphs; structure learning; multivariate discrete distributions

Citation

Scutari, Marco. On the Prior and Posterior Distributions Used in Graphical Modelling. Bayesian Anal. 8 (2013), no. 3, 505--532. doi:10.1214/13-BA819. http://projecteuclid.org/euclid.ba/1378729914.



References

  • Agresti, A. and Klingenberg, B. (2005). “Multivariate Tests Comparing Binomial Probabilities, with Application to Safety Studies for Drugs.” Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(4): 691–706.
  • Bang-Jensen, J. and Gutin, G. (2009). Digraphs: Theory, Algorithms and Applications. Springer-Verlag, 2nd edition.
  • Bilodeau, M. and Brenner, D. (1999). Theory of Multivariate Statistics. Springer-Verlag.
  • Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (2007). Discrete Multivariate Analysis: Theory and Practice. Springer.
  • Bollobás, B. (2001). Random Graphs. Cambridge University Press, 2nd edition.
  • Buntine, W. (1991). “Theory Refinement on Bayesian Networks.” In Proceedings of the 7th Annual Conference on Uncertainty in Artificial Intelligence (UAI-91), 52–60. Morgan Kaufmann.
  • Chickering, D. M. (1995). “A Transformational Characterization of Equivalent Bayesian Network Structures.” In Besnard, P. and Hanks, S. (eds.), Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, 87–98. Morgan Kaufmann.
  • — (2002). “Optimal Structure Identification with Greedy Search.” Journal of Machine Learning Research, 3: 507–554.
  • Cowell, R. G., Dawid, P., Lauritzen, S. L., and Spiegelhalter, D. J. (2007). Probabilistic Networks and Expert Systems. Springer.
  • Diestel, R. (2005). Graph Theory. Springer, 3rd edition.
  • Edwards, D. I. (2000). Introduction to Graphical Modelling. Springer, 2nd edition.
  • Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall.
  • Farrell, P. and Rogers-Stewart, K. (2008). “Methods for Generating Longitudinally Correlated Binary Data.” International Statistical Review, 76(1): 28–38.
  • Fisher, N. I. and Sen, P. K. (1994). The Collected Works of Wassily Hoeffding. Springer-Verlag.
  • Friedman, N., Goldszmidt, M., and Wyner, A. (1999a). “Data Analysis with Bayesian Networks: A Bootstrap Approach.” In Laskey, K. B. and Prade, H. (eds.), Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 196–205. Morgan Kaufmann.
  • Friedman, N. and Koller, D. (2003). “Being Bayesian about Bayesian Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks.” Machine Learning, 50(1–2): 95–126.
  • Friedman, N., Pe’er, D., and Nachman, I. (1999b). “Learning Bayesian Network Structure from Massive Datasets: The ‘Sparse Candidate’ Algorithm.” In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 206–215. Morgan Kaufmann.
  • Geiger, D. and Heckerman, D. (1994). “Learning Gaussian Networks.” Technical Report MSR-TR-94-10, Microsoft Research, Redmond, Washington.
  • George, E. I. and McCulloch, R. E. (1997). “Approaches for Bayesian Variable Selection.” Statistica Sinica, 7: 339–373.
  • Gillispie, S. B. and Perlman, M. D. (2002). “The Size Distribution for Markov Equivalence Classes of Acyclic Digraph Models.” Artificial Intelligence, 141(1–2): 137–155.
  • Harary, F. and Palmer, E. M. (1973). Graphical Enumeration. Academic Press.
  • Heckerman, D., Geiger, D., and Chickering, D. M. (1995). “Learning Bayesian Networks: The Combination of Knowledge and Statistical Data.” Machine Learning, 20(3): 197–243.
  • Hoeffding, W. (1940). “Masstabinvariante Korrelationstheorie.” Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin, 5(3): 179–223.
  • Imoto, S., Kim, S. Y., Shimodaira, H., Aburatani, S., Tashiro, K., Kuhara, S., and Miyano, S. (2002). “Bootstrap Analysis of Gene Networks Based on Bayesian Networks and Nonparametric Regression.” Genome Informatics, 13: 369–370.
  • Jensen, F. V. and Nielsen, T. D. (2007). Bayesian Networks and Decision Graphs. Springer, 2nd edition.
  • Johnson, N. L., Kotz, S., and Balakrishnan, N. (1997). Discrete Multivariate Distributions. Wiley.
  • Jungnickel, D. (2008). Graphs, Networks and Algorithms. Springer-Verlag, 3rd edition.
  • Kocherlakota, S. and Kocherlakota, K. (1992). Bivariate Discrete Distributions. CRC Press.
  • Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
  • Korb, K. and Nicholson, A. (2010). Bayesian Artificial Intelligence. Chapman & Hall, 2nd edition.
  • Krummenauer, F. (1998). “Limit Theorems for Multivariate Discrete Distributions.” Metrika, 47(1): 47–69.
  • Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.
  • Ledoit, O. and Wolf, M. (2003). “Improved Estimation of the Covariance Matrix of Stock Returns with an Application to Portfolio Selection.” Journal of Empirical Finance, 10: 603–621.
  • Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis. Academic Press.
  • Mari, D. D. and Kotz, S. (2001). Correlation and Dependence. Imperial College Press.
  • Melançon, G., Dutour, I., and Bousquet-Mélou, M. (2000). “Random Generation of DAGs for Graph Drawing.” Technical Report INS-R0005, Centre for Mathematics and Computer Sciences, Amsterdam.
  • Melançon, G. and Fabrice, P. (2004). “Generating Connected Acyclic Digraphs Uniformly at Random.” Information Processing Letters, 90(4): 209–213.
  • Moors, J. J. A. and Muilwijk, J. (1971). “An Inequality for the Variance of a Discrete Random Variable.” Sankhyā: The Indian Journal of Statistics, Series B, 33(3/4): 385–388.
  • Mukherjee, S. and Speed, T. P. (2008). “Network Inference using Informative Priors.” Proceedings of the National Academy of Sciences (PNAS), 105: 14313–14318.
  • Neapolitan, R. E. (2003). Learning Bayesian Networks. Prentice Hall.
  • Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  • — (2009). Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition.
  • R Development Core Team (2012). R: A Language and Environment for Statistical Computing. URL http://www.R-project.org
  • Robinson, R. W. (1973). “Counting Labeled Acyclic Digraphs.” In New Directions in the Theory of Graphs: Proceedings of the 3rd Ann Arbor Conference on Graph Theory, 239–273. Academic Press.
  • Rubinstein, R. Y. (1999). “The Cross-Entropy Method for Combinatorial and Continuous Optimization.” Methodology and Computing in Applied Probability, 1: 127–190.
  • Scutari, M. (2010). “Learning Bayesian Networks with the bnlearn R Package.” Journal of Statistical Software, 35(3): 1–22.
  • — (2012). bnlearn: Bayesian Network Structure Learning. R package version 3.2. URL http://www.bnlearn.com/
  • Seber, G. A. F. (2008). A Matrix Handbook for Statisticians. Wiley.
  • Steck, H. (2008). “Learning the Bayesian Network Structure: Dirichlet Prior versus Data.” In Proceedings of the 24th Annual Conference on Uncertainty in Artificial Intelligence (UAI-08), 511–518. AUAI Press.
  • Steck, H. and Jaakkola, T. (2002). “On the Dirichlet Prior and Bayesian Regularization.” In Advances in Neural Information Processing Systems (NIPS), 697–704. MIT Press.
  • Tsamardinos, I., Brown, L. E., and Aliferis, C. F. (2006). “The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm.” Machine Learning, 65(1): 31–78.
  • Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley.

See also

  • Related item: Adrian Dobra (2013). Comment on Article by Scutari. Bayesian Anal. Vol. 8, Iss. 3, 533–538.
  • Related item: Christine B. Peterson, Francesco C. Stingo (2013). Comment on Article by Scutari. Bayesian Anal. Vol. 8, Iss. 3, 539–542.
  • Related item: Hao Wang (2013). Comment on Article by Scutari. Bayesian Anal. Vol. 8, Iss. 3, 543–548.
  • Related item: Marco Scutari (2013). Rejoinder. Bayesian Anal. Vol. 8, Iss. 3, 549–552.