The Annals of Statistics

Margins of discrete Bayesian networks

Robin J. Evans


Bayesian network models with latent variables are widely used in statistics and machine learning. In this paper, we provide a complete algebraic characterization of these models when the observed variables are discrete and no assumption is made about the state-space of the latent variables. We show that the latent variable model is algebraically equivalent to the so-called nested Markov model, meaning that the two are the same up to inequality constraints on the joint probabilities. In particular, these two models have the same dimension, differing only by inequality constraints for which there is no general description. The nested Markov model is therefore the closest possible description of the latent variable model that avoids consideration of inequalities. A consequence of this is that the constraint-finding algorithm of Tian and Pearl [In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (2002) 519–527] is complete for finding equality constraints.

Latent variable models suffer from difficulties of unidentifiable parameters and nonregular asymptotics; in contrast, the nested Markov model is fully identifiable, represents a curved exponential family of known dimension, and can easily be fitted using an explicit parameterization.
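
As a hedged illustration of the kind of equality constraint the abstract refers to, the sketch below numerically checks the classical Verma constraint on a small example. The graph (A -> B -> C -> D with a latent U pointing to B and D), the binary state spaces, the three latent states, and all variable names are assumptions chosen for illustration and are not taken from the paper. In this graph, the quantity sum_b p(d | a, b, c) p(b | a) does not depend on a, although this is not an ordinary conditional independence among the observed variables.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_cpt(*shape):
    """Random strictly positive table, normalised over its last axis."""
    t = rng.random(shape) + 0.1
    return t / t.sum(axis=-1, keepdims=True)


# Hypothetical example: A -> B -> C -> D, with latent U -> B and U -> D.
k = 3                       # number of latent states (arbitrary choice)
p_u = random_cpt(k)         # p(u)
p_a = random_cpt(2)         # p(a)
p_b = random_cpt(2, k, 2)   # p(b | a, u), indexed [a, u, b]
p_c = random_cpt(2, 2)      # p(c | b),    indexed [b, c]
p_d = random_cpt(2, k, 2)   # p(d | c, u), indexed [c, u, d]

# Observed margin p(a, b, c, d): multiply the factors and sum out u.
joint = np.einsum("u,a,aub,bc,cud->abcd", p_u, p_a, p_b, p_c, p_d)

# Conditionals computed from the observed margin only.
p_abc = joint.sum(axis=3)                              # p(a, b, c)
p_d_given_abc = joint / p_abc[..., None]               # p(d | a, b, c)
p_ab = joint.sum(axis=(2, 3))                          # p(a, b)
p_b_given_a = p_ab / p_ab.sum(axis=1, keepdims=True)   # p(b | a)

# q[a, c, d] = sum_b p(d | a, b, c) * p(b | a)
q = np.einsum("abcd,ab->acd", p_d_given_abc, p_b_given_a)

# The Verma constraint says q does not depend on a.
verma_gap = np.abs(q[0] - q[1]).max()
print("largest difference between a = 0 and a = 1:", verma_gap)
```

The difference printed at the end should be zero up to floating-point error for any choice of the random tables, since summing over b recovers sum_u p(u) p(d | c, u), which involves neither a nor b.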

Article information

Ann. Statist., Volume 46, Number 6A (2018), 2623-2656.

Received: January 2017
Revised: August 2017
First available in Project Euclid: 7 September 2018

Subjects (MSC): Primary: 62H99 (None of the above, but in this section); 62F12 (Asymptotic properties of estimators)

Keywords: Algebraic statistics; Bayesian network; latent variable model; nested Markov model; Verma constraint


Evans, Robin J. Margins of discrete Bayesian networks. Ann. Statist. 46 (2018), no. 6A, 2623–2656. doi:10.1214/17-AOS1631.


  • Čencov, N. N. (1982). Statistical Decision Rules and Optimal Inference. Translations of Mathematical Monographs 53. Amer. Math. Soc., Providence, RI. Translation from the Russian edited by Lev J. Leifman.
  • Allman, E. S., Matias, C. and Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. Ann. Statist. 37 3099–3132.
  • Anandkumar, A., Hsu, D., Javanmard, A. and Kakade, S. (2013). Learning linear Bayesian networks with latent variables. In Proceedings of the 30th International Conference on Machine Learning 28 249–257.
  • Basu, S., Pollack, R. and Roy, M.-F. (1996). Algorithms in Real Algebraic Geometry. Springer, Berlin.
  • Bishop, C. M. (2007). Pattern Recognition and Machine Learning. Springer, Berlin.
  • Cox, D., Little, J. and O’Shea, D. (2007). Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3rd ed. Springer, New York.
  • Darwiche, A. (2009). Modeling and Reasoning with Bayesian Networks. Cambridge Univ. Press, Cambridge.
  • Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. Int. Stat. Rev. 70 161–189.
  • Drton, M. (2009). Likelihood ratio tests and singularities. Ann. Statist. 37 979–1012.
  • Evans, R. J. (2016). Graphs for margins of Bayesian networks. Scand. J. Stat. 43 625–648.
  • Evans, R. J. (2018). Supplement to “Margins of discrete Bayesian networks.” DOI:10.1214/17-AOS1631SUPP.
  • Evans, R. J. and Richardson, T. S. (2010). Maximum likelihood fitting of acyclic directed mixed graphs to binary data. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence 177–184.
  • Evans, R. J. and Richardson, T. S. (2014). Markovian acyclic directed mixed graphs for discrete data. Ann. Statist. 42 1452–1482.
  • Evans, R. J. and Richardson, T. S. (2018). Smooth identifiable supermodels of discrete DAG models with latent variables. Bernoulli. To appear. Available at arXiv:1511.06813.
  • Fox, C. J., Käufl, A. and Drton, M. (2015). On the causal interpretation of acyclic mixed graphs under multivariate normality. Linear Algebra Appl. 473 93–113.
  • Garcia, L. D., Stillman, M. and Sturmfels, B. (2005). Algebraic geometry of Bayesian networks. J. Symbolic Comput. 39 331–355.
  • Gill, R. D. (2014). Statistics, causality and Bell’s theorem. Statist. Sci. 29 512–528.
  • Henson, J., Lal, R. and Pusey, M. F. (2014). Theory-independent limits on correlations from generalized Bayesian networks. New J. Phys. 16 113043.
  • Kass, R. E. and Vos, P. W. (1997). Geometrical Foundations of Asymptotic Inference. Wiley, New York.
  • Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series 17. Oxford Univ. Press, New York.
  • Neyman, J. (1923). Sur les applications de la théorie des probabilités aux experiences agricoles: Essai des principes. Rocz. Nauk Rol. 10 1–51. In Polish; English translation by D. Dabrowska and T. Speed in Statist. Sci. 5 463–472 (1990).
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge.
  • Provan, J. S. and Billera, L. J. (1980). Decompositions of simplicial complexes related to diameters of convex polyhedra. Math. Oper. Res. 5 576–594.
  • Richardson, T. S. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Stat. 30 145–157.
  • Richardson, T. S., Evans, R. J. and Robins, J. M. (2011). Transparent parametrizations of models for potential outcomes. In Bayesian Statistics 9 569–610. Oxford Univ. Press, Oxford.
  • Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist. 30 962–1030.
  • Richardson, T. S., Evans, R. J., Robins, J. M. and Shpitser, I. (2017). Nested Markov properties for acyclic directed mixed graphs. Available at arXiv:1701.06686.
  • Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math. Model. 7 1393–1512.
  • Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66 688–701.
  • Shpitser, I., Evans, R. J., Richardson, T. S. and Robins, J. M. (2013). Sparse nested Markov models with log-linear parameters. In 29th Conference on Uncertainty in Artificial Intelligence 576–585.
  • Silva, R. and Ghahramani, Z. (2009). The hidden life of latent variables: Bayesian learning with mixed graph models. J. Mach. Learn. Res. 10 1187–1238.
  • Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. MIT Press, Cambridge, MA.
  • Tian, J. and Pearl, J. (2002). On the testable implications of causal models with hidden variables. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence 519–527.
  • ver Steeg, G. and Galstyan, A. (2011). A sequence of relaxations constraining hidden variable models. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence 717–726.
  • Verma, T. S. and Pearl, J. (1990). Equivalence and synthesis of causal models. In Proceedings of the 6th Conference on Uncertainty in Artificial Intelligence 255–268.

Supplemental materials

  • Supplement to “Margins of discrete Bayesian networks”. Technical proofs and some additional examples are contained in the supplement.