Statistical Science

Improving the Convergence Properties of the Data Augmentation Algorithm with an Application to Bayesian Mixture Modeling

James P. Hobert, Vivekananda Roy, and Christian P. Robert

Full-text: Open access

Abstract

The reversible Markov chains that drive the data augmentation (DA) and sandwich algorithms define self-adjoint operators whose spectra encode the convergence properties of the algorithms. When the target distribution has uncountable support, as is nearly always the case in practice, it is generally quite difficult to get a handle on these spectra. We show that, if the augmentation space is finite, then (under regularity conditions) the operators defined by the DA and sandwich chains are compact, and the spectra are finite subsets of [0, 1). Moreover, we prove that the spectrum of the sandwich operator dominates the spectrum of the DA operator in the sense that the ordered elements of the former are all less than or equal to the corresponding elements of the latter. As a concrete example, we study a widely used DA algorithm for the exploration of posterior densities associated with Bayesian mixture models [J. Roy. Statist. Soc. Ser. B 56 (1994) 363–375]. In particular, we compare this mixture DA algorithm with an alternative algorithm proposed by Frühwirth-Schnatter [J. Amer. Statist. Assoc. 96 (2001) 194–209] that is based on random label switching.
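
For a Bayesian mixture model the augmentation space is the finite set of latent label vectors, so the finite-augmentation theory above applies: the ordered eigenvalues of the sandwich chain are all less than or equal to the corresponding eigenvalues of the DA chain, so the sandwich chain converges at least as fast. The following is a minimal sketch, not the authors' exact construction, of a DA sampler for a k-component normal mixture with unit component variances, a symmetric Dirichlet prior on the weights, and independent N(0, τ²) priors on the means; the optional sandwich move applies a uniformly random permutation to the component labels of the latent allocations between the two conditional draws, in the spirit of the random label switching of Frühwirth-Schnatter (2001). The function name `mixture_da_sandwich` and all hyperparameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_da_sandwich(y, k=2, alpha=1.0, tau2=25.0, n_iter=5000, sandwich=True):
    """DA sampler for a k-component normal mixture with unit component variances.

    If sandwich=True, a random label permutation is applied to the latent
    allocations between the two conditional draws (a sandwich move; valid
    because the symmetric prior makes the posterior label-invariant).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.full(k, 1.0 / k)            # mixture weights
    mu = rng.normal(0.0, 1.0, size=k)  # component means
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        # DA step 1: sample allocations z | (w, mu, y)
        logp = np.log(w) - 0.5 * (y[:, None] - mu[None, :]) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=p[i]) for i in range(n)])

        # Sandwich move: random label switching on the augmented data
        if sandwich:
            perm = rng.permutation(k)
            z = perm[z]

        # DA step 2: sample (w, mu) | (z, y) under conjugate priors
        counts = np.bincount(z, minlength=k)
        sums = np.array([y[z == j].sum() for j in range(k)])
        w = rng.dirichlet(alpha + counts)
        post_var = 1.0 / (counts + 1.0 / tau2)
        mu = rng.normal(post_var * sums, np.sqrt(post_var))

        draws[t] = mu
    return draws
```

For example, `mixture_da_sandwich(data, sandwich=False)` runs the plain DA chain and `sandwich=True` adds the label-switching move, so the two chains can be compared empirically on the same data.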

Article information

Source
Statist. Sci., Volume 26, Number 3 (2011), 332–351.

Dates
First available in Project Euclid: 31 October 2011

Permanent link to this document
https://projecteuclid.org/euclid.ss/1320066924

Digital Object Identifier
doi:10.1214/11-STS365

Mathematical Reviews number (MathSciNet)
MR2918006

Zentralblatt MATH identifier
1246.60095

Keywords
Compact operator; convergence rate; eigenvalue; label switching; Markov operator; Monte Carlo; operator norm; positive operator; reversible Markov chain; sandwich algorithm; spectrum

Citation

Hobert, James P.; Roy, Vivekananda; Robert, Christian P. Improving the Convergence Properties of the Data Augmentation Algorithm with an Application to Bayesian Mixture Modeling. Statist. Sci. 26 (2011), no. 3, 332–351. doi:10.1214/11-STS365. https://projecteuclid.org/euclid.ss/1320066924

References

  • Asmussen, S. and Glynn, P. (2011). A new proof of convergence of MCMC via the ergodic theorem. Statist. Probab. Lett. 81 1482–1485.
  • Buja, A. (1990). Remarks on functional canonical variates, alternating least squares methods and ACE. Ann. Statist. 18 1032–1069.
  • Celeux, G., Hurn, M. and Robert, C. P. (2000). Computational and inferential difficulties with mixture posterior distributions. J. Amer. Statist. Assoc. 95 957–970.
  • Diaconis, P., Khare, K. and Saloff-Coste, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials. Statist. Sci. 23 151–178. With comments and a rejoinder by the authors.
  • Diebolt, J. and Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. J. Roy. Statist. Soc. Ser. B 56 363–375.
  • Flegal, J. M., Haran, M. and Jones, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statist. Sci. 23 250–260.
  • Frühwirth-Schnatter, S. (2001). Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J. Amer. Statist. Assoc. 96 194–209.
  • Hobert, J. P. (2011). The data augmentation algorithm: Theory and methodology. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. Jones and X.-L. Meng, eds.). Chapman & Hall/CRC Press, Boca Raton, FL.
  • Hobert, J. P. and Marchev, D. (2008). A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Ann. Statist. 36 532–554.
  • Hobert, J. P. and Román, J. C. (2011). Discussion of “To center or not to center: That is not the question—An ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency,” by Y. Yu and X.-L. Meng. J. Comput. Graph. Statist. 20 571–580.
  • Jasra, A., Holmes, C. C. and Stephens, D. A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statist. Sci. 20 50–67.
  • Jones, G. L. and Hobert, J. P. (2001). Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statist. Sci. 16 312–334.
  • Jones, G. L., Haran, M., Caffo, B. S. and Neath, R. (2006). Fixed-width output analysis for Markov chain Monte Carlo. J. Amer. Statist. Assoc. 101 1537–1547.
  • Lee, K., Marin, J. M., Mengersen, K. L. and Robert, C. (2008). Bayesian inference on mixtures of distributions. In Platinum Jubilee of the Indian Statistical Institute (N. N. Sastry, ed.). Indian Statistical Institute, Bangalore.
  • Liu, J. S. and Sabatti, C. (2000). Generalised Gibbs sampler and multigrid Monte Carlo for Bayesian computation. Biometrika 87 353–369.
  • Liu, J. S., Wong, W. H. and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81 27–40.
  • Liu, J. S., Wong, W. H. and Kong, A. (1995). Covariance structure and convergence rate of the Gibbs sampler with various scans. J. Roy. Statist. Soc. Ser. B 57 157–169.
  • Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. J. Amer. Statist. Assoc. 94 1264–1274.
  • Meng, X.-L. and van Dyk, D. A. (1999). Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86 301–320.
  • Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London.
  • Mira, A. and Geyer, C. J. (1999). Ordering Monte Carlo Markov chains. Technical Report 632, School of Statistics, Univ. Minnesota.
  • Papaspiliopoulos, O., Roberts, G. O. and Sköld, M. (2007). A general framework for the parametrization of hierarchical models. Statist. Sci. 22 59–73.
  • Retherford, J. R. (1993). Hilbert Space: Compact Operators and the Trace Theorem. London Mathematical Society Student Texts 27. Cambridge Univ. Press, Cambridge.
  • Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer, New York.
  • Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electron. Comm. Probab. 2 13–25 (electronic).
  • Roberts, G. O. and Rosenthal, J. S. (2001). Markov chains and de-initializing processes. Scand. J. Stat. 28 489–504.
  • Rosenthal, J. S. (2003). Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J. Amer. Statist. Assoc. 98 169–177.
  • Roy, V. and Hobert, J. P. (2007). Convergence rates and asymptotic standard errors for Markov chain Monte Carlo algorithms for Bayesian probit regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 607–623.
  • Rudin, W. (1991). Functional Analysis, 2nd ed. McGraw-Hill, New York.
  • Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). J. Amer. Statist. Assoc. 82 528–550.
  • Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Ann. Statist. 22 1701–1762.
  • van Dyk, D. A. and Meng, X.-L. (2001). The art of data augmentation (with discussion). J. Comput. Graph. Statist. 10 1–50.
  • Voss, H. (2003). Variational characterizations of eigenvalues of nonlinear eigenproblems. In Proceedings of the International Conference on Mathematical and Computer Modelling in Science and Engineering (M. Kocandrlova and V. Kelar, eds.) 379–383. Czech Technical Univ., Prague.
  • Yu, Y. and Meng, X.-L. (2011). To center or not to center: That is not the question—An ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC efficiency (with discussion). J. Comput. Graph. Statist. 20 531–570.