## Statistical Science

### On Russian Roulette Estimates for Bayesian Inference with Doubly-Intractable Likelihoods

#### Abstract

A large number of statistical models are “doubly-intractable”: both the likelihood normalising term, which is a function of the model parameters, and the marginal likelihood (model evidence) are intractable. This means that standard inference techniques to sample from the posterior, such as Markov chain Monte Carlo (MCMC), cannot be used. Examples include, but are not confined to, massive Gaussian Markov random fields, autologistic models and exponential random graph models. A number of approximate schemes based on MCMC techniques, approximate Bayesian computation (ABC) or analytic approximations to the posterior have been suggested, and these are reviewed here. Exact MCMC schemes, which can be applied to a subset of doubly-intractable distributions, have also been developed and are described in this paper. As yet, no general method exists which can be applied to all classes of models with doubly-intractable posteriors.

In addition, taking inspiration from the Physics literature, we study an alternative method based on representing the intractable likelihood as an infinite series. Unbiased estimates of the likelihood can then be obtained by finite-time stochastic truncation of the series via Russian Roulette sampling, although the estimates are not necessarily positive. Results from the Quantum Chromodynamics literature are exploited to allow the use of possibly negative estimates in a pseudo-marginal MCMC scheme such that expectations with respect to the posterior distribution are preserved. The methodology is reviewed on well-known examples such as the parameters in Ising models, the posterior for Fisher–Bingham distributions on the $d$-sphere and a large-scale Gaussian Markov random field model describing the Ozone Column data. This leads to a critical assessment of the strengths and weaknesses of the methodology with pointers to ongoing research.
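To illustrate the core idea of unbiased stochastic truncation, the sketch below implements a generic Russian Roulette estimator for an infinite series: after each term, the sum is continued with some survival probability, and surviving terms are re-weighted so that the expectation of the randomly truncated sum equals the full series. The fixed survival probability `q` and the toy series are illustrative choices for this sketch, not the schedules used in the paper (which tunes the truncation to the series at hand).

```python
import random

def russian_roulette_estimate(phi, q=0.8, rng=random):
    """Unbiased estimate of the infinite series sum_{n>=0} phi(n).

    After adding each term we continue to the next with probability q,
    otherwise stop. The weight 1/q per surviving round compensates for
    the probability of reaching that term, so E[estimate] = sum_n phi(n).
    Note the estimate can be negative if the terms phi(n) alternate in
    sign, which is exactly the issue the paper's sign trick addresses.
    """
    estimate, weight, n = 0.0, 1.0, 0
    while True:
        estimate += weight * phi(n)
        if rng.random() >= q:   # stop with probability 1 - q
            return estimate
        weight /= q             # re-weight to keep the estimate unbiased
        n += 1

# Sanity check on a geometric series whose true sum is 2:
random.seed(1)
draws = [russian_roulette_estimate(lambda n: 0.5 ** n) for _ in range(200_000)]
est = sum(draws) / len(draws)
```

In a pseudo-marginal scheme, such an estimate replaces the intractable likelihood in the Metropolis–Hastings ratio; when estimates can be negative, the chain instead targets the absolute value and the signs are carried along to correct posterior expectations.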

#### Article information

Source
Statist. Sci., Volume 30, Number 4 (2015), 443–467.

Dates
First available in Project Euclid: 9 December 2015

https://projecteuclid.org/euclid.ss/1449670853

Digital Object Identifier
doi:10.1214/15-STS523

Mathematical Reviews number (MathSciNet)
MR3432836

Zentralblatt MATH identifier
06946197

#### Citation

Lyne, Anne-Marie; Girolami, Mark; Atchadé, Yves; Strathmann, Heiko; Simpson, Daniel. On Russian Roulette Estimates for Bayesian Inference with Doubly-Intractable Likelihoods. Statist. Sci. 30 (2015), no. 4, 443--467. doi:10.1214/15-STS523. https://projecteuclid.org/euclid.ss/1449670853

#### References

• Adams, R. P., Murray, I. and MacKay, D. J. (2009). Nonparametric Bayesian density modeling with Gaussian processes. Preprint. Available at arXiv:0912.4896.
• Alquier, P., Friel, N., Everitt, R. and Boland, A. (2014). Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Preprint. Available at arXiv:1403.5496.
• Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist. 37 697–725.
• Andrieu, C. and Vihola, M. (2014). Establishing some order amongst exact approximations of MCMCs. Preprint. Available at arXiv:1404.6909.
• Atchadé, Y. F., Lartillot, N. and Robert, C. (2013). Bayesian computation for statistical models with intractable normalizing constants. Braz. J. Probab. Stat. 27 416–436.
• Aune, E., Simpson, D. P. and Eidsvik, J. (2014). Parameter estimation in high dimensional Gaussian distributions. Stat. Comput. 24 247–263.
• Bai, Z., Fahey, M. and Golub, G. (1996). Some large-scale matrix computation problems. J. Comput. Appl. Math. 74 71–89.
• Bakeyev, T. and De Forcrand, P. (2001). Noisy Monte Carlo algorithm reexamined. Phys. Rev. D 63 54505.
• Bardenet, R., Doucet, A. and Holmes, C. (2014). Towards scaling up Markov chain Monte Carlo: An adaptive subsampling approach. In Proceedings of the 31st International Conference on Machine Learning 405–413. JMLR Workshop and Conference Proceedings.
• Beaumont, M. A. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics 164 1139–1160.
• Beaumont, M. A., Zhang, W. and Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics 162 2025–2035.
• Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B 36 192–236.
• Besag, J. (1986). On the statistical analysis of dirty pictures. J. Roy. Statist. Soc. Ser. B 48 259–302.
• Besag, J. E. and Moran, P. A. P. (1975). On the estimation and testing of spatial interaction in Gaussian lattice processes. Biometrika 62 555–562.
• Beskos, A., Papaspiliopoulos, O., Roberts, G. O. and Fearnhead, P. (2006). Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 333–382.
• Bhanot, G. and Kennedy, A. (1985). Bosonic lattice gauge theory with noise. Phys. Lett. B 157 70–76.
• Bolin, D. and Lindgren, F. (2011). Spatial models generated by nested stochastic partial differential equations, with an application to global ozone mapping. Ann. Appl. Stat. 5 523–550.
• Booth, T. (2007). Unbiased Monte Carlo estimation of the reciprocal of an integral. Nucl. Sci. Eng. 156 403–407.
• Caimo, A. and Friel, N. (2011). Bayesian inference for exponential random graph models. Soc. Netw. 33 41–55.
• Carter, L. L. and Cashwell, E. D. (1975). Particle-transport simulation with the Monte Carlo method. Technical report, Los Alamos Scientific Lab., N. Mex. (USA).
• Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 209–226.
• Del Moral, P., Doucet, A. and Jasra, A. (2006). Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 411–436.
• Diggle, P. J. (1990). A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. J. Roy. Statist. Soc. Ser. A 349–362.
• Douc, R. and Robert, C. P. (2011). A vanilla Rao–Blackwellization of Metropolis–Hastings algorithms. Ann. Statist. 39 261–277.
• Doucet, A., Pitt, M. and Kohn, R. (2012). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Preprint. Available at arXiv:1210.1871.
• Eidsvik, J., Shaby, B. A., Reich, B. J., Wheeler, M. and Niemi, J. (2014). Estimation and prediction in spatial models with block composite likelihoods. J. Comput. Graph. Statist. 23 295–315.
• Everitt, R. G. (2012). Bayesian parameter estimation for latent Markov random fields and social networks. J. Comput. Graph. Statist. 21 940–960.
• Fearnhead, P., Papaspiliopoulos, O. and Roberts, G. O. (2008). Particle filters for partially observed diffusions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 755–777.
• Friel, N. and Pettitt, A. N. (2004). Likelihood estimation and inference for the autologistic model. J. Comput. Graph. Statist. 13 232–246.
• Friel, N., Pettitt, A. N., Reeves, R. and Wit, E. (2009). Bayesian inference in hidden Markov random fields for binary data defined on large lattices. J. Comput. Graph. Statist. 18 243–261.
• Gelman, A. and Meng, X.-L. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statist. Sci. 13 163–185.
• Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. Chapman & Hall, London.
• Ghaoui, L. E. and Gueye, A. (2009). A convex upper bound on the log-partition function for binary distributions. In Advances in Neural Information Processing Systems (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 21 409–416. Neural Information Processing Systems (NIPS).
• Gilks, W. R. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
• Glynn, P. W. and Rhee, C.-H. (2014). Exact estimation for Markov chain equilibrium expectations. J. Appl. Probab. 51A 377–389.
• Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations, 3rd ed. Johns Hopkins Univ. Press, Baltimore, MD.
• Goodreau, S. M., Kitts, J. A. and Morris, M. (2009). Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography 46 103–125.
• Green, P. J. and Richardson, S. (2002). Hidden Markov models and disease mapping. J. Amer. Statist. Assoc. 97 1055–1070.
• Grelaud, A., Robert, C. P. and Marin, J.-M. (2009). ABC methods for model choice in Gibbs random fields. C. R. Math. Acad. Sci. Paris 347 205–210.
• Gu, M. G. and Zhu, H.-T. (2001). Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. J. R. Stat. Soc. Ser. B. Stat. Methodol. 63 339–355.
• Heikkinen, J. and Hogmander, H. (1994). Fully Bayesian approach to image restoration with an application in biogeography. Applied Statistics 43 569–582.
• Hendricks, J. and Booth, T. (1985). MCNP variance reduction overview. In Monte-Carlo Methods and Applications in Neutronics, Photonics and Statistical Physics 83–92. Springer, Berlin.
• Hughes, J., Haran, M. and Caragea, P. C. (2011). Autologistic models for binary data on a lattice. Environmetrics 22 857–871.
• Illian, J., Sørbye, S., Rue, H. and Hendrichsen, D. (2012). Using INLA to fit a complex point process model with temporally varying effects–a case study. J. Environ. Statist. 3 1–25.
• Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik A Hadrons and Nuclei 31 253–258.
• Jacob, P. E. and Thiery, A. H. (2013). On non-negative unbiased estimators. Preprint. Available at arXiv:1309.6473.
• Jin, I. H. and Liang, F. (2014). Use of SAMC for Bayesian analysis of statistical models with intractable normalizing constants. Comput. Statist. Data Anal. 71 402–416.
• Joo, B., Horvath, I. and Liu, K. (2003). The Kentucky noisy Monte Carlo algorithm for Wilson dynamical fermions. Phys. Rev. D 67 074505.
• Jun, M. and Stein, M. L. (2008). Nonstationary covariance models for global data. Ann. Appl. Stat. 2 1271–1289.
• Kendall, W. S. (2005). Notes on perfect simulation. Markov Chain Monte Carlo: Innovations and Applications 7. World Scientific, Singapore.
• Kennedy, A. and Kuti, J. (1985). Noise without noise: A new Monte Carlo method. Phys. Rev. Lett. 54 2473–2476.
• Kent, J. T. (1982). The Fisher–Bingham distribution on the sphere. J. Roy. Statist. Soc. Ser. B 44 71–80.
• Korattikara, A., Chen, Y. and Welling, M. (2014). Austerity in MCMC land: Cutting the Metropolis–Hastings budget. In Proceedings of the 31st International Conference on Machine Learning 181–189. JMLR Workshop and Conference Proceedings.
• Liang, F. (2010). A double Metropolis–Hastings sampler for spatial models with intractable normalizing constants. J. Stat. Comput. Simul. 80 1007–1022.
• Liang, F., Liu, C. and Carroll, R. J. (2007). Stochastic approximation in Monte Carlo computation. J. Amer. Statist. Assoc. 102 305–320.
• Liechty, M. W., Liechty, J. C. and Müller, P. (2009). The shadow prior. J. Comput. Graph. Statist. 18 368–383.
• Lin, L., Liu, K. and Sloan, J. (2000). A noisy Monte Carlo algorithm. Phys. Rev. D 61 074505.
• Lindgren, F., Rue, H. and Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 423–498.
• Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York.
• Lux, I. and Koblinger, L. (1991). Monte Carlo Particle Transport Methods: Neutron and Photon Calculations, Vol. 102. CRC press, Boca Raton.
• MacKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge Univ. Press, New York.
• Marin, J.-M., Pudlo, P., Robert, C. P. and Ryder, R. J. (2012). Approximate Bayesian computational methods. Stat. Comput. 22 1167–1180.
• McLeish, D. (2011). A general method for debiasing a Monte Carlo estimator. Monte Carlo Methods Appl. 17 301–315.
• Møller, J. and Waagepetersen, R. P. (2004). Statistical Inference and Simulation for Spatial Point Processes. Chapman & Hall/CRC, Boca Raton, FL.
• Møller, J., Pettitt, A. N., Reeves, R. and Berthelsen, K. K. (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93 451–458.
• Moores, M. T., Mengersen, K. and Robert, C. P. (2014). Pre-processing for approximate Bayesian computation in image analysis. Preprint. Available at arXiv:1403.4359.
• Murray, I., Ghahramani, Z. and MacKay, D. (2006). MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06) 359–366. AUAI Press, Arlington, VA.
• Neal, R. M. (2001). Annealed importance sampling. Stat. Comput. 11 125–139.
• Papaspiliopoulos, O. (2011). Monte Carlo probabilistic inference for diffusion processes: A methodological framework. In Bayesian Time Series Models 82–103. Cambridge Univ. Press, Cambridge.
• Propp, J. G. and Wilson, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9 223–252.
• Rhee, C.-H. and Glynn, P. W. (2012). A new approach to unbiased estimation for SDE’s. In Proceedings of the Winter Simulation Conference, WSC’12, Berlin, Germany, 17:1–17:7. Winter Simulation Conference.
• Robert, C. P. and Casella, G. (2010). Introducing Monte Carlo Methods with R. Springer, New York.
• Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability 104. Chapman & Hall/CRC, Boca Raton, FL.
• Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 319–392.
• Schrödle, B. and Held, L. (2011). Spatio-temporal disease mapping using INLA. Environmetrics 22 725–734.
• Shaby, B. A. (2014). The open-faced sandwich adjustment for MCMC using estimating functions. J. Comput. Graph. Statist. 23 853–876.
• Sherlock, C., Thiery, A. H., Roberts, G. O. and Rosenthal, J. S. (2015). On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Statist. 43 238–275.
• Silvertown, J. and Antonovics, J. (2001). Integrating Ecology and Evolution in a Spatial Context: 14th Special Symposium of the British Ecological Society 14. Cambridge Univ. Press, Cambridge.
• Tavaré, S., Balding, D. J., Griffiths, R. C. and Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics 145 505–518.
• Taylor, B. M. and Diggle, P. J. (2014). INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes. J. Stat. Comput. Simul. 84 2266–2284.
• Troyer, M. and Wiese, U.-J. (2005). Computational complexity and fundamental limitations to fermionic quantum Monte Carlo simulations. Phys. Rev. Lett. 94 170201.
• Van Duijn, M. A., Gile, K. J. and Handcock, M. S. (2009). A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models. Social Networks 31 52–62.
• Walker, S. G. (2011). Posterior sampling when the normalizing constant is unknown. Comm. Statist. Simulation Comput. 40 784–792.
• Walker, S. G. (2014). A Bayesian analysis of the Bingham distribution. Braz. J. Probab. Stat. 28 61–72.
• Wang, F. and Landau, D. P. (2001). Efficient, multiple-range random walk algorithm to calculate the density of states. Phys. Rev. Lett. 86 2050.
• Welling, M. and Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning 681–688. Omnipress, Madison, WI.
• Zhang, Y., Ghahramani, Z., Storkey, A. J. and Sutton, C. A. (2012). Continuous relaxations for discrete Hamiltonian Monte Carlo. In Advances in Neural Information Processing Systems 4 3194–3202.
• Zhou, X. and Schmidler, S. (2009). Bayesian parameter estimation in Ising and Potts models: A comparative study with applications to protein modeling. Technical report.