Bayesian Analysis

Importance Sampling Schemes for Evidence Approximation in Mixture Models

Jeong Eun Lee and Christian P. Robert


Abstract

The marginal likelihood is a central tool for drawing Bayesian inference about the number of components in mixture models. It is often approximated since the exact form is unavailable. A bias in the approximation may be due to an incomplete exploration by a simulated Markov chain (e.g., a Gibbs sequence) of the collection of posterior modes, a phenomenon also known as lack of label switching: all possible label permutations must be simulated by the chain for it to converge and hence overcome the bias. In an importance sampling approach, imposing label switching on the importance function results in an exponential increase of the computational cost with the number of components. In this paper, two importance sampling schemes are proposed through choices for the importance function: a maximum likelihood estimate (MLE) proposal and a Rao–Blackwellised importance function. The second scheme is called dual importance sampling. We demonstrate that this dual importance sampling is a valid estimator of the evidence. To reduce the induced high computational demand, the original importance function is approximated; a suitable approximation can produce an estimate with the same precision at a lower computational workload.
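As a rough illustration of the general idea of approximating the evidence by importance sampling, the sketch below uses a toy one-parameter normal model, not a mixture and not either of the schemes proposed in the paper. Everything in it is an assumption made for illustration: the model (normal data with known standard deviation `sigma` and a normal prior on the mean), the function names, and the choice of a Gaussian importance function centred at the MLE with an inflated scale. The toy model is chosen because its evidence has a closed form, so the estimate can be checked.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def log_joint(theta, data, sigma, mu0, tau0):
    """log p(data | theta) + log p(theta) for the toy model (an assumption)."""
    loglik = sum(log_normal_pdf(y, theta, sigma) for y in data)
    return loglik + log_normal_pdf(theta, mu0, tau0)


def log_evidence_exact(data, sigma, mu0, tau0):
    """Exact log evidence from m(y) = p(y | theta) p(theta) / p(theta | y)."""
    n = len(data)
    post_prec = 1.0 / tau0 ** 2 + n / sigma ** 2
    post_mean = (mu0 / tau0 ** 2 + sum(data) / sigma ** 2) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    # The identity holds at any theta; the posterior mean is a convenient choice.
    return (log_joint(post_mean, data, sigma, mu0, tau0)
            - log_normal_pdf(post_mean, post_mean, post_sd))


def log_evidence_is(data, sigma, mu0, tau0, n_draws, rng):
    """Importance sampling estimate of the log evidence.

    The importance function is N(MLE, (2 * sigma / sqrt(n))^2): centred at
    the sample mean and deliberately wider than the posterior so that the
    importance weights stay bounded. This is an illustrative choice only.
    """
    n = len(data)
    center = sum(data) / n
    scale = 2.0 * sigma / math.sqrt(n)
    log_w = []
    for _ in range(n_draws):
        theta = rng.gauss(center, scale)
        log_w.append(log_joint(theta, data, sigma, mu0, tau0)
                     - log_normal_pdf(theta, center, scale))
    # Average the weights on the log scale (log-mean-exp) for stability.
    m = max(log_w)
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / n_draws)


rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(30)]
exact = log_evidence_exact(data, sigma=1.0, mu0=0.0, tau0=10.0)
approx = log_evidence_is(data, sigma=1.0, mu0=0.0, tau0=10.0,
                         n_draws=20000, rng=rng)
print(f"exact log evidence:     {exact:.4f}")
print(f"importance sampling:    {approx:.4f}")
```

In a mixture model with k components the posterior has k! symmetric modes, so an importance function that covers all of them by symmetrising over label permutations needs k! density evaluations per draw. That is the cost growth the abstract refers to, and the toy example above does not attempt to reproduce it.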

Article information

Source
Bayesian Anal. Volume 11, Number 2 (2016), 573-597.

Dates
First available in Project Euclid: 25 August 2015

Permanent link to this document
https://projecteuclid.org/euclid.ba/1440507475

Digital Object Identifier
doi:10.1214/15-BA970

Mathematical Reviews number (MathSciNet)
MR3472003

Keywords
model evidence; importance sampling; mixture models; marginal likelihood

Citation

Lee, Jeong Eun; Robert, Christian P. Importance Sampling Schemes for Evidence Approximation in Mixture Models. Bayesian Anal. 11 (2016), no. 2, 573–597. doi:10.1214/15-BA970. https://projecteuclid.org/euclid.ba/1440507475


