## Brazilian Journal of Probability and Statistics

### A brief review of optimal scaling of the main MCMC approaches and optimal scaling of additive TMCMC under non-regular cases

#### Abstract

Transformation-based Markov Chain Monte Carlo (TMCMC) was proposed by Dutta and Bhattacharya (Statistical Methodology 16 (2014) 100–116) as an efficient alternative to the Metropolis–Hastings algorithm, especially in high dimensions. The main advantage of this algorithm is that it simultaneously updates all components of a high-dimensional parameter using appropriate move types defined by deterministic transformations of a single random variable. This reduces the time complexity of each step of the chain and enhances the acceptance rate.
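As an illustration of the single-random-variable update described above, the following is a minimal sketch of one additive TMCMC step. It is not the authors' code and makes several assumptions: the driving variable is the absolute value of a Gaussian draw, each coordinate moves forward or backward with probability 1/2, and the target is a standard normal product density. The names `additive_tmcmc_step`, `run_additive_tmcmc` and `log_std_normal` are hypothetical.

```python
import math
import random


def log_std_normal(x):
    """Log-density (up to a constant) of an i.i.d. standard normal target (assumed)."""
    return -0.5 * sum(v * v for v in x)


def additive_tmcmc_step(x, log_target, scale, rng):
    """One additive TMCMC update of the whole vector x (a list of floats)."""
    # A single positive random variable drives the entire move
    # (assumption: absolute value of a N(0, scale^2) draw).
    eps = abs(rng.gauss(0.0, scale))
    # Every coordinate is shifted by +eps or -eps, each with probability 1/2.
    proposal = [xi + (eps if rng.random() < 0.5 else -eps) for xi in x]
    # With equal forward/backward probabilities the additive move has unit
    # Jacobian, so the acceptance ratio reduces to the ratio of target densities.
    log_ratio = log_target(proposal) - log_target(x)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal, True
    return x, False


def run_additive_tmcmc(x0, log_target, scale, n_steps, seed=0):
    """Run the chain and return (list of states, empirical acceptance rate)."""
    rng = random.Random(seed)
    chain = [list(x0)]
    n_accepted = 0
    for _ in range(n_steps):
        state, accepted = additive_tmcmc_step(chain[-1], log_target, scale, rng)
        chain.append(state)
        n_accepted += accepted
    return chain, n_accepted / n_steps


if __name__ == "__main__":
    chain, rate = run_additive_tmcmc([0.0] * 10, log_std_normal, 0.5, 2000)
    print("empirical acceptance rate:", rate)
```

Only one Gaussian draw is needed per iteration regardless of dimension; the remaining randomness consists of the coordinate-wise signs. The scale value 0.5 in the usage lines is arbitrary and is not a tuned or recommended setting.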

In this paper, we first provide a brief review of the optimal scaling theory for various existing MCMC approaches, comparing and contrasting them with the corresponding TMCMC approaches. The optimal scaling of the simplest form of TMCMC, namely additive TMCMC, has been studied extensively for the Gaussian proposal density in Dey and Bhattacharya (2017a). Here, we discuss diffusion-based optimal scaling behavior of additive TMCMC for non-Gaussian proposal densities, in particular uniform, Student’s $t$ and Cauchy proposals. Although we could not formally prove our diffusion result for the Cauchy proposal, simulation-based results lead us to conjecture that at least the recipe for obtaining general optimal scaling and optimal acceptance rate holds for the Cauchy case as well. We also consider diffusion-based optimal scaling of TMCMC when the target density is discontinuous. Such non-regular situations have been studied for the Random Walk Metropolis–Hastings (RWMH) algorithm by Neal and Roberts (Methodology and Computing in Applied Probability 13 (2011) 583–601) using expected squared jumping distance (ESJD), but diffusion-theory-based scaling has not been considered.

We compare our diffusion-based optimally scaled TMCMC approach with the ESJD-based optimally scaled RWMH in simulation studies involving several target distributions and proposal distributions, including the challenging Cauchy proposal case, showing that additive TMCMC outperforms RWMH in almost all cases considered.
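For readers unfamiliar with the ESJD criterion mentioned above, the following is a rough sketch of how it can be estimated from a simulated RWMH chain. It is an illustration under stated assumptions rather than the simulation design of the paper: the proposal is Gaussian, the target is an i.i.d. standard normal product, and the function names `rwmh_step`, `run_rwmh` and `esjd` are hypothetical.

```python
import math
import random


def log_std_normal(x):
    """Log-density (up to a constant) of an i.i.d. standard normal target (assumed)."""
    return -0.5 * sum(v * v for v in x)


def rwmh_step(x, log_target, scale, rng):
    """One random walk Metropolis-Hastings update with a Gaussian proposal."""
    # Every coordinate receives its own independent N(0, scale^2) increment.
    proposal = [xi + rng.gauss(0.0, scale) for xi in x]
    log_ratio = log_target(proposal) - log_target(x)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return x


def run_rwmh(x0, log_target, scale, n_steps, seed=0):
    """Run the chain and return the list of visited states."""
    rng = random.Random(seed)
    chain = [list(x0)]
    for _ in range(n_steps):
        chain.append(rwmh_step(chain[-1], log_target, scale, rng))
    return chain


def esjd(chain):
    """Average squared Euclidean distance between successive states."""
    jumps = [
        sum((b - a) ** 2 for a, b in zip(prev, cur))
        for prev, cur in zip(chain, chain[1:])
    ]
    return sum(jumps) / len(jumps)


if __name__ == "__main__":
    d = 10
    # Try a few candidate scales; an ESJD-based rule keeps the largest estimate.
    for scale in (0.2, 0.5, 1.0, 2.0):
        chain = run_rwmh([0.0] * d, log_std_normal, scale, 5000)
        print(scale, esjd(chain))
```

Rejected proposals contribute a jump of zero, so the estimate penalizes both very small steps and steps that are too large to be accepted. The grid of scales in the usage lines is arbitrary and serves only to show how the estimator would be used for tuning.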

#### Article information

Source
Braz. J. Probab. Stat., Volume 33, Number 2 (2019), 222-266.

Dates
Accepted: November 2017
First available in Project Euclid: 4 March 2019

https://projecteuclid.org/euclid.bjps/1551690033

Digital Object Identifier
doi:10.1214/17-BJPS386

#### Citation

Dey, Kushal K.; Bhattacharya, Sourabh. A brief review of optimal scaling of the main MCMC approaches and optimal scaling of additive TMCMC under non-regular cases. Braz. J. Probab. Stat. 33 (2019), no. 2, 222–266. doi:10.1214/17-BJPS386. https://projecteuclid.org/euclid.bjps/1551690033

#### References

• Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Statistics and Computing 18, 343–373.
• Atchadé, Y. F., Roberts, G. O. and Rosenthal, J. S. (2010). Towards optimal scaling of Metropolis-coupled Markov chain Monte Carlo. Statistics and Computing 21, 555–568.
• Atchadé, Y. F. and Rosenthal, J. S. (2005). On adaptive Markov chain Monte Carlo algorithm. Bernoulli 11, 815–828.
• Bédard, M. (2006). On the robustness of optimal scaling for random walk Metropolis algorithms. Doctoral thesis, Graduate Department of Statistics, University of Toronto.
• Bédard, M. (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. The Annals of Applied Probability 17, 1222–1244.
• Bédard, M. (2008). Optimal acceptance rates for Metropolis algorithms: Moving beyond 0.234. Stochastic Processes and Their Applications 118, 2198–2222.
• Bédard, M. (2009). On the optimal scaling problem of Metropolis algorithms for hierarchical target distributions. Preprint.
• Bédard, M., Douc, R. and Moulines, E. (2012). Scaling analysis of multiple-try MCMC methods. Stochastic Processes and Their Applications 122, 758–786.
• Bédard, M., Douc, R. and Moulines, E. (2014). Scaling analysis of delayed rejection MCMC methods. Methodology and Computing in Applied Probability 16, 811–838.
• Bédard, M. and Rosenthal, J. S. (2008). Optimal scaling of Metropolis algorithms: Heading toward general target distributions. Canadian Journal of Statistics 36, 483–503.
• Besag, J. (1994). Discussion: Markov chains for exploring posterior distributions. The Annals of Statistics 22, 1734–1741.
• Beskos, A., Pillai, N., Roberts, G., Sanz-Serna, J.-M. and Stuart, A. (2013). Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli 19, 1501–1534.
• Cheung, S. H. and Beck, J. L. (2009). Bayesian model updating using hybrid Monte Carlo simulation with application to structural dynamic models with many uncertain parameters. Journal of Engineering Mechanics 135, 243–255.
• Craiu, R. V., Rosenthal, J. S. and Yang, C. (2009). Learn from thy neighbor: Parallel chain adaptive MCMC. Journal of the American Statistical Association 104, 1454–1466.
• Das, M. and Bhattacharya, S. (2017). Transdimensional transformation based Markov chain Monte Carlo. Available at arXiv:1403.5207.
• Dey, K. K. (2013). On ergodic behaviour of additive transformation based Markov chain Monte Carlo. Master’s dissertation, Indian Statistical Institute.
• Dey, K. K. (2017). Optimal spacing in randomized Metropolis coupled Markov chain Monte Carlo. Technical report.
• Dey, K. K. and Bhattacharya, S. (2016). On geometric ergodicity of additive and multiplicative transformation based Markov chain Monte Carlo in high dimensions. Brazilian Journal of Probability and Statistics 30, 570–613. Also available at arXiv:1312.0915v2.
• Dey, K. K. and Bhattacharya, S. (2017). A brief tutorial on transformation based Markov chain Monte Carlo and optimal scaling of the additive transformation. Brazilian Journal of Probability and Statistics 31, 569–617. Also available at arXiv:1307.1446.
• Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters, Section B 195, 216–222.
• Dutta, S. and Bhattacharya, S. (2014). Markov chain Monte Carlo based on deterministic transformations. Statistical Methodology 16, 100–116. Also available at arXiv:1106.5850. Supplement available at arXiv:1306.6684.
• Frenkel, D. and Smit, B. (2002). Understanding Molecular Simulations. New York: Academic Press.
• Green, P. J. and Mira, A. (2001). Delayed rejection in reversible jump Metropolis–Hastings. Biometrika 88, 1035–1053.
• Grenander, U. and Miller, M. (1994). Representations of knowledge in complex systems. Journal of the Royal Statistical Society, Series B 56, 549–603.
• Haario, H., Laine, M., Mira, A. and Saksman, E. (2006). DRAM: Efficient adaptive MCMC. Statistics and Computing 16, 339–354.
• Haario, H., Saksman, E. and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli 7, 223–242.
• Haario, H., Saksman, E. and Tamminen, J. (2005). Componentwise adaptation for high dimensional MCMC. Computational Statistics 20, 265–274.
• Harkness, M. A. and Green, P. J. (2000). Parallel chains, delayed rejection and reversible jump MCMC for object recognition. In British Machine Vision Conference.
• Hockney, R. W. (1970). The potential calculation and some applications. Methods in Computational Physics 9, 136–211.
• Jarner, S. F. and Hansen, E. (2000). Geometric ergodicity of Metropolis algorithms. Stochastic Processes and Their Applications 85, 341–361.
• Kennedy, A. D. and Pendleton, B. J. (1991). Acceptances and autocorrelations in hybrid Monte Carlo. Nuclear Physics B 20, 118–121.
• Khamaru, K. (2016). Randomized transdimensional transformation coupled Markov chain Monte Carlo. Master’s dissertation, Indian Statistical Institute.
• Liang, F., Liu, C. and Carroll, R. (2010). Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples. New York: Wiley.
• Liu, J. (2001). Monte Carlo Strategies in Scientific Computing. New York: Springer.
• Liu, J. S. and Sabatti, C. (2000). Generalized Gibbs sampler and multigrid Monte Carlo for Bayesian computation. Biometrika 87, 353–369.
• Mackenzie, P. (1989). An improved hybrid Monte Carlo method. Physics Letters, Section B 226, 369–371.
• Martino, L. and Read, J. (2013). On the flexibility of the design of multiple try Metropolis schemes. Computational Statistics 28, 2797–2823.
• Mattingly, J. C., Pillai, N. S. and Stuart, A. M. (2011). Diffusion limits of the random walk Metropolis algorithm in high dimensions. The Annals of Applied Probability 22, 881–930.
• Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1092.
• Mira, A. (2001). On Metropolis–Hastings algorithms with delayed rejection. Metron LIX, 231–241.
• Neal, P. and Roberts, G. O. (2006). Optimal scaling for partially updating MCMC algorithms. The Annals of Applied Probability 16, 475–515.
• Neal, P. and Roberts, G. O. (2011). Optimal scaling of random walk Metropolis algorithms with non-Gaussian proposals. Methodology and Computing in Applied Probability 13, 583–601.
• Neal, R. (2003). Slice sampling. The Annals of Statistics 31, 705–767 (with discussion).
• Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones and X.-L. Meng, eds.) 113–162. New York: Chapman & Hall.
• Phillips, D. and Smith, A. (1996). Bayesian model comparison via jump diffusions. In Markov Chain Monte Carlo in Practice (W. Gilks, S. Richardson and D. Spiegelhalter, eds.) 215–240. New York: Chapman & Hall.
• Pillai, N. S., Stuart, A. M. and Thiéry, A. H. (2012). Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. The Annals of Applied Probability 22, 2320–2356.
• Raggi, D. (2005). Adaptive MCMC for inference on affine stochastic volatility models with jumps. The Econometrics Journal 8, 235–250.
• Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics 22, 400–407.
• Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. New York: Springer.
• Roberts, G., Gelman, A. and Gilks, W. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability 7, 110–120.
• Roberts, G. and Rosenthal, J. (2002). The polar slice sampler. Stochastic Models 18, 257–280.
• Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability 2, 13–25.
• Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society, Series B 60, 255–268.
• Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statistical Science 16, 351–367.
• Roberts, G. O. and Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive MCMC. Journal of Applied Probability 44, 458–475.
• Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC. Journal of Computational and Graphical Statistics 18, 349–367.
• Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2, 341–363.
• Rosenthal, J. S. (2011). Optimal proposal distributions and adaptive MCMC. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones and X.-L. Meng, eds.) 93–111. New York: Chapman & Hall.
• Tierney, L. and Mira, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18, 2507–2515.
• Trias, M., Vecchio, A. and Veitch, J. (2009). Delayed rejection schemes for efficient Markov-chain Monte-Carlo sampling of multimodal distributions. Available at arXiv:0904.2207.
• Umstätter, R., Meyer, R., Dupuis, R., Veitch, J., Woan, G. and Christensen, N. (2004). Estimating the parameters of gravitational waves from neutron stars using an adaptive MCMC method. Classical and Quantum Gravity 21, 1655–1675.