## Bayesian Analysis

### Function-Specific Mixing Times and Concentration Away from Equilibrium

#### Abstract

Slow mixing is the central hurdle in applications of Markov chains, especially those used for Markov chain Monte Carlo (MCMC) approximations. In Bayesian inference, one is often interested only in estimating the stationary expectations of a small set of functions, so the usual definition of mixing based on total variation convergence may be too conservative. Accordingly, we introduce function-specific analogs of mixing times and spectral gaps, and use them to prove Hoeffding-like function-specific concentration inequalities. These results show that empirical expectations of functions can concentrate long before the underlying chain has mixed in the classical sense, and we show that the concentration rates we achieve are optimal up to constants. We use our techniques to derive confidence intervals that are sharper than those implied by both classical Markov-chain Hoeffding bounds and Berry–Esseen-corrected central limit theorem (CLT) bounds. For applications that require testing rather than point estimation, we show similar improvements over recent sequential testing results for MCMC. We conclude by applying our framework to real-data examples of MCMC, providing evidence that our theory is both accurate and relevant to practice.
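The abstract's central claim, that the expectation of a particular function can settle long before the chain converges in total variation, can be seen in a toy chain. The sketch below is not the authors' construction. It assumes a simplified worst-case definition of the function-specific mixing time, t_f(ε) = min{t : max_x |E_x[f(X_t)] − π(f)| ≤ ε}, and a hypothetical product chain built from a fast two-state coordinate (switch probability 0.3) and a slow one (switch probability 0.01), with f reading only the fast coordinate. The helper names (`two_state_kernel`, `kron`, `step`, `mixing_times`) are illustrative.

```python
"""Toy comparison of a function-specific mixing time with the total variation
mixing time, using a simplified worst-case definition (an assumption, not the
paper's exact formulation)."""


def two_state_kernel(flip):
    """Transition matrix of a symmetric two-state chain that switches w.p. `flip`."""
    return [[1.0 - flip, flip], [flip, 1.0 - flip]]


def kron(a, b):
    """Kronecker product of square matrices stored as nested lists.

    State i of the product chain corresponds to the pair (i // len(b), i % len(b)).
    """
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def step(dist, kernel):
    """Advance a distribution (row vector) by one transition: dist @ kernel."""
    n = len(kernel)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]


def mixing_times(kernel, f, pi, eps, t_max=10_000):
    """Return (t_f, t_tv): the first t at which the worst-case error over all
    starting states drops to <= eps, for E_x[f(X_t)] and for total variation."""
    n = len(kernel)
    pi_f = sum(p * v for p, v in zip(pi, f))
    # One point-mass distribution per starting state x.
    dists = [[1.0 if j == x else 0.0 for j in range(n)] for x in range(n)]
    t_f = t_tv = None
    for t in range(1, t_max + 1):
        dists = [step(d, kernel) for d in dists]
        f_err = max(abs(sum(d[j] * f[j] for j in range(n)) - pi_f) for d in dists)
        tv = max(0.5 * sum(abs(d[j] - pi[j]) for j in range(n)) for d in dists)
        if t_f is None and f_err <= eps:
            t_f = t
        if t_tv is None and tv <= eps:
            t_tv = t
        if t_f is not None and t_tv is not None:
            break
    return t_f, t_tv


# Hypothetical example: a fast coordinate A (switch prob. 0.3) and a slow
# coordinate B (switch prob. 0.01), evolving independently.
P = kron(two_state_kernel(0.3), two_state_kernel(0.01))
pi = [0.25] * 4            # uniform is stationary for this symmetric chain
f = [0.0, 0.0, 1.0, 1.0]   # f(A, B) = A, which ignores the slow coordinate

t_f, t_tv = mixing_times(P, f, pi, eps=0.01)
print(t_f, t_tv)  # 5 194
```

For f taking values in [0, 1], |E_x[f(X_t)] − π(f)| never exceeds the total variation distance, so t_f(ε) ≤ t_TV(ε) always holds. In this toy chain the gap is large because the total variation distance is driven by the slow coordinate (it equals 0.98^t / 2), while the error in f decays like 0.4^t / 2.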

#### Article information

Source: Bayesian Anal., Volume 15, Number 2 (2020), 505–532.

Dates: First available in Project Euclid: 13 June 2019

https://projecteuclid.org/euclid.ba/1560391236

Digital Object Identifier: doi:10.1214/19-BA1151

Mathematical Reviews number (MathSciNet): MR4078723

#### Citation

Rabinovich, Maxim; Ramdas, Aaditya; Jordan, Michael I.; Wainwright, Martin J. Function-Specific Mixing Times and Concentration Away from Equilibrium. Bayesian Anal. 15 (2020), no. 2, 505–532. doi:10.1214/19-BA1151. https://projecteuclid.org/euclid.ba/1560391236
