Bayesian Analysis

Designing Simple and Efficient Markov Chain Monte Carlo Proposal Kernels

Abstract

We discuss a few principles to guide the design of efficient Metropolis–Hastings proposals for well-behaved target distributions without deeply divided modes, and illustrate them by developing and evaluating novel proposal kernels on a variety of target distributions. Here, efficiency is measured by the variance ratio relative to the independent sampler. The first principle is to introduce negative correlation into the MCMC sample, or at least to reduce positive correlation: to propose something new, propose something different. This explains why unimodal proposals such as the Gaussian random walk are poorer than the uniform random walk, which is in turn poorer than bimodal proposals that avoid values very close to the current value. We evaluate three new bimodal proposals, called Box, Airplane and StrawHat, and find that they perform similarly to the earlier Bactrian kernels, suggesting that the general shape of the proposal matters but not its specific distributional form. We also propose the “Mirror” kernel, which generates new values around the mirror image of the current value on the other side of the target distribution (effectively the “opposite” of the current value). This introduces negative correlations, leading in many cases to efficiency greater than $100\%$. The second principle, applicable to multidimensional targets, is that a sequence of well-designed one-dimensional proposals can be more efficient than a single $d$-dimensional proposal. Thirdly, we suggest that variable transformation be explored as a general strategy for designing efficient MCMC kernels. We apply these principles to a high-dimensional Gaussian target with strong correlations, a logistic regression problem, and a molecular clock dating problem to illustrate their practical utility.
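As a concrete illustration of the first principle, the sketch below contrasts a Gaussian random-walk proposal with a “Mirror”-style proposal that reflects the current point through the centre of the target. This is a minimal illustration, not the paper's implementation: the target centre `mu` is assumed known (in practice it would be estimated, e.g. from a pilot run), and the standard-normal target and step sizes are illustrative choices.

```python
import math
import random


def metropolis(target_logpdf, propose, x0, n_iter, seed=0):
    """Metropolis sampler for a symmetric proposal kernel q(x'|x) = q(x|x')."""
    rng = random.Random(seed)
    x, logp = x0, target_logpdf(x0)
    samples = []
    for _ in range(n_iter):
        xp = propose(x, rng)
        logpp = target_logpdf(xp)
        # Symmetric proposal: acceptance probability is min(1, pi(x')/pi(x)).
        if math.log(rng.random()) < logpp - logp:
            x, logp = xp, logpp
        samples.append(x)
    return samples


# Standard-normal target (log-density up to a constant).
target = lambda x: -0.5 * x * x

# Gaussian random walk: x' ~ N(x, s^2), with s near the classic 2.4*sigma rule.
random_walk = lambda x, rng: x + rng.gauss(0.0, 2.4)

# Mirror-style proposal: x' ~ N(2*mu - x, s^2), centred on the reflection of x
# through the target centre mu (assumed known here). Since x' - (2*mu - x)
# equals x - (2*mu - x'), the kernel is symmetric and the plain Metropolis
# acceptance ratio applies.
mu = 0.0
mirror = lambda x, rng: (2.0 * mu - x) + rng.gauss(0.0, 1.0)
```

Accepted mirror moves tend to flip the sign of $x - \mu$, so successive samples are negatively correlated, which is the mechanism behind the above-100% efficiencies mentioned in the abstract; the random walk, by contrast, yields positively correlated samples.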

Article information

Source
Bayesian Anal., Volume 13, Number 4 (2018), 1037–1063.

Dates
First available in Project Euclid: 10 November 2017

https://projecteuclid.org/euclid.ba/1510282998

Digital Object Identifier
doi:10.1214/17-BA1084

Citation

Thawornwattana, Yuttapong; Dalquen, Daniel; Yang, Ziheng. Designing Simple and Efficient Markov Chain Monte Carlo Proposal Kernels. Bayesian Anal. 13 (2018), no. 4, 1037–1063. doi:10.1214/17-BA1084. https://projecteuclid.org/euclid.ba/1510282998

References

• Adler, S. L. (1981). “Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions.” Physical Review. D. Particles and Fields, 23: 2901–2904.
• Barone, P. and Frigessi, A. (1990). “Improving stochastic relaxation for Gaussian random fields.” Probability in the Engineering and Informational Sciences, 4(3): 369–389.
• Bédard, M., Douc, R., and Moulines, E. (2012). “Scaling analysis of multiple-try MCMC methods.” Stochastic Processes and Their Applications, 122(3): 758–786.
• Bédard, M., Douc, R., and Moulines, E. (2014). “Scaling analysis of delayed rejection MCMC methods.” Methodology and Computing in Applied Probability, 16(4): 811–838.
• Beskos, A., Pillai, N., Roberts, G., Sanz-Serna, J.-M., and Stuart, A. (2013). “Optimal tuning of the hybrid Monte Carlo algorithm.” Bernoulli, 19(5A): 1501–1534.
• Beskos, A., Roberts, G., and Stuart, A. (2009). “Optimal scalings for local Metropolis-Hastings chains on nonproduct targets in high dimensions.” The Annals of Applied Probability, 19(3): 863–898.
• Browne, W. J. (2006). “MCMC algorithms for constrained variance matrices.” Computational Statistics & Data Analysis, 50(7): 1655–1677.
• Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. (2017). “Stan: a probabilistic programming language.” Journal of Statistical Software, 76(1): 1–32.
• Devroye, L. (1986). Non-uniform random variate generation. Springer-Verlag, New York.
• Frigessi, A., Gåsemyr, J., and Rue, H. (2000). “Antithetic coupling of two Gibbs sampler chains.” Annals of Statistics, 28(4): 1128–1149.
• Gelfand, A. E., Smith, A. F. M., and Lee, T.-M. (1992). “Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling.” Journal of the American Statistical Association, 87(418): 523–532.
• Gelman, A., Roberts, G. O., and Gilks, W. R. (1996). “Efficient Metropolis jumping rules.” In Bayesian statistics, 5, 599–607. Oxford Univ. Press, New York.
• Geyer, C. J. (1992). “Practical Markov chain Monte Carlo.” Statistical Science, 7(4): 473–483.
• Girolami, M. and Calderhead, B. (2011). “Riemann manifold Langevin and Hamiltonian Monte Carlo methods.” Journal of the Royal Statistical Society. Series B, Statistical Methodology, 73(2): 123–214. With discussion and a reply by the authors.
• Hammersley, J. M. and Morton, K. W. (1956). “A new Monte Carlo technique: antithetic variates.” Mathematical Proceedings of the Cambridge Philosophical Society, 52: 449–475.
• Hastings, W. K. (1970). “Monte Carlo sampling methods using Markov chains and their applications.” Biometrika, 57(1): 97–109.
• Hoffman, M. D. and Gelman, A. (2014). “The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research, 15: 1593–1623.
• Horai, S., Hayasaka, K., Kondo, R., Tsugane, K., and Takahata, N. (1995). “Recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs.” Proceedings of the National Academy of Sciences of the United States of America, 92(2): 532–536.
• Jukes, T. H. and Cantor, C. R. (1969). “Evolution of protein molecules.” In Munro, H. N. (ed.), Mammalian Protein Metabolism, volume 3, 21–132. Academic Press, New York.
• Kemeny, J. G. and Snell, J. L. (1960). Finite Markov chains. D. Van Nostrand Co., Inc.
• Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). “Equation of state calculations by fast computing machines.” The Journal of Chemical Physics, 21(6): 1087–1092.
• Mira, A. (2001). “Efficiency of finite state space Monte Carlo Markov chains.” Statistics & Probability Letters, 54(4): 405–411.
• Pasarica, C. and Gelman, A. (2010). “Adaptively scaling the Metropolis algorithm using expected squared jumped distance.” Statistica Sinica, 20(1): 343–364.
• Peskun, P. H. (1973). “Optimum Monte-Carlo sampling using Markov chains.” Biometrika, 60: 607–612.
• Pillai, N. S., Stuart, A. M., and Thiéry, A. H. (2012). “Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions.” The Annals of Applied Probability, 22(6): 2320–2356.
• Roberts, G. O., Gelman, A., and Gilks, W. R. (1997). “Weak convergence and optimal scaling of random walk Metropolis algorithms.” The Annals of Applied Probability, 7(1): 110–120.
• Roberts, G. O. and Rosenthal, J. S. (1998). “Optimal scaling of discrete approximations to Langevin diffusions.” Journal of the Royal Statistical Society. Series B, Statistical Methodology, 60(1): 255–268.
• Sherlock, C. and Roberts, G. (2009). “Optimal scaling of the random walk Metropolis on elliptically symmetric unimodal targets.” Bernoulli, 15(3): 774–798.
• Thawornwattana, Y., Dalquen, D., and Yang, Z. (2017). “Supplementary Material for ‘Designing Simple and Efficient Markov Chain Monte Carlo Proposal Kernels’.” Bayesian Analysis.
• Tierney, L. (1994). “Markov chains for exploring posterior distributions.” Annals of Statistics, 22(4): 1701–1762.
• Tierney, L. (1998). “A note on Metropolis-Hastings kernels for general state spaces.” The Annals of Applied Probability, 8(1): 1–9.
• Wang, Z., Mohamed, S., and de Freitas, N. (2013). “Adaptive Hamiltonian and Riemann Manifold Monte Carlo.” In Proceedings of the 30th International Conference on Machine Learning (ICML), 1462–1470.
• Yang, Z. and Rodríguez, C. E. (2013). “Searching for efficient Markov chain Monte Carlo proposal kernels.” Proceedings of the National Academy of Sciences of the United States of America, 110(48): 19307–19312.

Supplemental materials

• Supplementary Material of “Designing Simple and Efficient Markov Chain Monte Carlo Proposal Kernels”. (I) Efficiency curves for the Box, Airplane and StrawHat kernels over a range of values of a. (II) Two-dimensional Gaussian target example. (III) MCMC algorithms for the phylogenetic problem. (IV) Effect of μ∗ on efficiency.