## Bernoulli

• Volume 25, Number 2 (2019), 1141-1159.

### Convergence rates for a class of estimators based on Stein’s method

#### Abstract

Gradient information on the sampling distribution can be used to reduce the variance of Monte Carlo estimators via Stein’s method. An important application is that of estimating an expectation of a test function along the sample path of a Markov chain, where gradient information enables convergence rate improvement at the cost of a linear system which must be solved. The contribution of this paper is to establish theoretical bounds on convergence rates for a class of estimators based on Stein’s method. Our analysis accounts for (i) the degree of smoothness of the sampling distribution and test function, (ii) the dimension of the state space, and (iii) the case of non-independent samples arising from a Markov chain. These results provide insight into the rapid convergence of gradient-based estimators observed for low-dimensional problems, as well as clarifying a curse-of-dimension that appears inherent to such methods.
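As a rough illustration of the idea in the abstract, the sketch below builds control variates from gradient information by applying a first-order (Langevin) Stein operator to a polynomial basis, then solves a least-squares linear system for the coefficients. This is a simplified one-dimensional, polynomial variant in the spirit of the zero-variance literature, not the estimator analysed in the paper; the function name `stein_cv_estimate`, the choice of basis, and the standard normal target are assumptions made purely for the example.

```python
import numpy as np


def stein_cv_estimate(x, f_vals, score, degree=2):
    """Estimate E[f(X)] with polynomial Stein control variates (1D sketch).

    x      : samples from the target (hypothetical input, shape (n,))
    f_vals : test function evaluated at the samples
    score  : callable returning d/dx log p(x), the gradient information
    degree : highest power in the assumed basis phi_j(x) = x**j
    """
    x = np.asarray(x, dtype=float)
    f_vals = np.asarray(f_vals, dtype=float)
    s = score(x)
    # Stein operator applied to phi_j(x) = x**j:
    # (A phi_j)(x) = phi_j'(x) + phi_j(x) * score(x),
    # which has expectation zero under the target (given suitable tails).
    cols = [j * x ** (j - 1) + x ** j * s for j in range(1, degree + 1)]
    design = np.column_stack([np.ones_like(x)] + cols)
    # The linear system: least-squares fit of f on a constant plus the
    # Stein control variates. The fitted intercept is the estimate of E[f].
    coef, *_ = np.linalg.lstsq(design, f_vals, rcond=None)
    return coef[0]


# Usage: standard normal target, so the score is -x, and f(x) = x**2.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
f_vals = x ** 2
plain = f_vals.mean()  # ordinary Monte Carlo average
cv = stein_cv_estimate(x, f_vals, score=lambda t: -t, degree=2)
```

In this toy case `x**2` equals `1 - (A phi_1)(x)` exactly, so the fitted intercept recovers the true value 1 up to floating-point error, whereas the plain average still carries Monte Carlo noise. For test functions outside the span of the basis the improvement is only partial, which is where questions of smoothness and dimension, as studied in the paper, become relevant.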

#### Article information

Source
Bernoulli, Volume 25, Number 2 (2019), 1141-1159.

Dates
Revised: August 2017
First available in Project Euclid: 6 March 2019

https://projecteuclid.org/euclid.bj/1551862846

Digital Object Identifier
doi:10.3150/17-BEJ1016

#### Citation

Oates, Chris J.; Cockayne, Jon; Briol, François-Xavier; Girolami, Mark. Convergence rates for a class of estimators based on Stein’s method. Bernoulli 25 (2019), no. 2, 1141–1159. doi:10.3150/17-BEJ1016. https://projecteuclid.org/euclid.bj/1551862846

#### Supplemental materials

• Supplement to “Convergence rates for a class of estimators based on Stein’s method”. Proofs of all theoretical results are provided.