## Bayesian Analysis

### Learning Markov Equivalence Classes of Directed Acyclic Graphs: An Objective Bayes Approach

#### Abstract

A Markov equivalence class contains all the Directed Acyclic Graphs (DAGs) encoding the same conditional independencies, and is represented by a Completed Partially Directed Acyclic Graph (CPDAG), also called an Essential Graph (EG). We approach the problem of model selection among noncausal sparse Gaussian DAGs by directly scoring EGs, using an objective Bayes method. Specifically, we construct objective priors for model selection based on the Fractional Bayes Factor, leading to a closed-form expression for the marginal likelihood of an EG. We then propose a Markov Chain Monte Carlo (MCMC) strategy to explore the space of EGs under sparsity constraints, and illustrate the performance of our method in simulation studies as well as on a real dataset. Our method provides a coherent quantification of inferential uncertainty, requires minimal prior specification, and is competitive in learning the structure of the data-generating EG when compared with alternative state-of-the-art algorithms.
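To make the notion of a Markov equivalence class concrete, here is a minimal sketch that is not taken from the paper. It relies on the standard characterization that two DAGs on the same vertex set are Markov equivalent exactly when they share the same skeleton and the same v-structures (colliders `a -> c <- b` with `a` and `b` non-adjacent). The function names and the edge-list representation are assumptions made for illustration only.

```python
from itertools import combinations


def skeleton(edges):
    """Return the undirected skeleton as a set of unordered vertex pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Return all triples (a, c, b) with a -> c <- b and a, b non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    found = set()
    for child, pa in parents.items():
        # Sort parents so each collider is stored in one canonical order.
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Check equivalence of two DAGs given as lists of (tail, head) edges.

    Assumption: both graphs are acyclic and defined on the same vertex set
    (isolated vertices are not represented in an edge list).
    """
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Three DAGs on {x, y, z} sharing the skeleton x - y - z.
chain = [("x", "y"), ("y", "z")]      # x -> y -> z
fork = [("y", "x"), ("y", "z")]       # x <- y -> z
collider = [("x", "y"), ("z", "y")]   # x -> y <- z

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain and the fork fall in the same equivalence class, so a method that scores EGs directly treats them as a single model, while the collider belongs to a different class.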

#### Article information

Source
Bayesian Anal., Volume 13, Number 4 (2018), 1235-1260.

Dates
First available in Project Euclid: 15 March 2018

Permanent link to this document
https://projecteuclid.org/euclid.ba/1521079250

Digital Object Identifier
doi:10.1214/18-BA1101

#### Citation

Castelletti, Federico; Consonni, Guido; Della Vedova, Marco L.; Peluso, Stefano. Learning Markov Equivalence Classes of Directed Acyclic Graphs: An Objective Bayes Approach. Bayesian Anal. 13 (2018), no. 4, 1235–1260. doi:10.1214/18-BA1101. https://projecteuclid.org/euclid.ba/1521079250
