Bayesian Analysis

Marginal Pseudo-Likelihood Learning of Discrete Markov Network Structures

Johan Pensar, Henrik Nyman, Juha Niiranen, and Jukka Corander

Full-text: Open access


Markov networks are a popular tool for modeling multivariate distributions over a set of discrete variables. The core of the Markov network representation is an undirected graph, which elegantly captures the dependence structure over the variables. Traditionally, Bayesian learning of the graph structure from data has been done under the assumption of chordality, since non-chordal graphs are difficult to evaluate for likelihood-based scores. Recently, there has been a surge of interest in regularized pseudo-likelihood methods, as such approaches can avoid the assumption of chordality. Many of the currently available methods require a tuning parameter to adapt the level of regularization to a particular dataset. Here we introduce the marginal pseudo-likelihood, which has built-in regularization through marginalization over the graph-specific nuisance parameters. We prove consistency of the resulting graph estimator via comparison with the pseudo-Bayesian information criterion. To identify high-scoring graph structures in a high-dimensional setting, we design a two-step algorithm that exploits the decomposable structure of the score. Using synthetic and existing benchmark networks, the marginal pseudo-likelihood method is shown to perform favorably against recent popular structure learning methods.
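To make the idea of a score that marginalizes over nuisance parameters and decomposes over variables more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each variable's conditional distribution given its graph neighbors (its Markov blanket) receives a symmetric Dirichlet prior, so that each local term has a closed form in terms of log-gamma functions. The function names `local_log_score` and `log_mpl`, the `ess` hyperparameter, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def local_log_score(data, j, blanket, cards, ess=1.0):
    """Log marginal likelihood of column j given the columns in `blanket`.

    Assumes a symmetric Dirichlet prior whose total weight `ess` is spread
    evenly over all (blanket configuration, value) cells. This prior is an
    illustrative assumption, not a choice taken from the abstract.
    """
    r = cards[j]                      # number of values of variable j
    q = 1                             # number of blanket configurations
    for k in blanket:
        q *= cards[k]
    a_conf = ess / q                  # prior weight per blanket configuration
    a_cell = ess / (q * r)            # prior weight per (configuration, value)

    conf_counts = Counter()
    cell_counts = Counter()
    for row in data:
        conf = tuple(row[k] for k in blanket)
        conf_counts[conf] += 1
        cell_counts[(conf, row[j])] += 1

    # Unobserved configurations and cells contribute exactly zero,
    # so only observed counts need to be visited.
    score = 0.0
    for n in conf_counts.values():
        score += math.lgamma(a_conf) - math.lgamma(a_conf + n)
    for n in cell_counts.values():
        score += math.lgamma(a_cell + n) - math.lgamma(a_cell)
    return score

def log_mpl(data, edges, cards, ess=1.0):
    """Sum of local scores, one per variable, for an undirected graph."""
    blankets = [set() for _ in cards]
    for u, v in edges:
        blankets[u].add(v)
        blankets[v].add(u)
    return sum(
        local_log_score(data, j, sorted(blankets[j]), cards, ess)
        for j in range(len(cards))
    )

# Toy data: X0 and X1 are identical, X2 is independent of both.
data = [(i % 2, i % 2, (i // 2) % 2) for i in range(40)]
cards = [2, 2, 2]

score_true = log_mpl(data, [(0, 1)], cards)      # edge between dependent pair
score_empty = log_mpl(data, [], cards)           # no edges
score_spurious = log_mpl(data, [(0, 2)], cards)  # edge between independent pair
```

On this toy data the graph with the edge between the two dependent variables scores higher than the empty graph, while the spurious edge scores lower than the empty graph, with no tuning parameter for sparsity. Because the score is a sum of per-variable terms, changing one edge only requires recomputing the two local terms of its endpoints, which is the kind of decomposability a search algorithm can exploit.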

Article information

Bayesian Anal. Volume 12, Number 4 (2017), 1195-1215.

First available in Project Euclid: 31 October 2016

Digital Object Identifier: doi:10.1214/16-BA1032

Keywords: Markov networks; structure learning; pseudo-likelihood; non-chordal graph; Bayesian inference; regularization

Creative Commons Attribution 4.0 International License.


Pensar, Johan; Nyman, Henrik; Niiranen, Juha; Corander, Jukka. Marginal Pseudo-Likelihood Learning of Discrete Markov Network Structures. Bayesian Anal. 12 (2017), no. 4, 1195–1215. doi:10.1214/16-BA1032.




Supplemental materials

  • Appendix: Supplementary Appendix to “Marginal Pseudo-Likelihood Learning of Discrete Markov Network Structures”. The appendix contains a proof of the consistency theorem, pseudocode of the search algorithms, and detailed results from the numerical experiments.