Electronic Journal of Statistics

Use in practice of importance sampling for repeated MCMC for Poisson models

Dorota Gajda, Chantal Guihenneuc-Jouyaux, Judith Rousseau, Kerry Mengersen, and Darfiana Nur
Source: Electron. J. Statist. Volume 4 (2010), 361-383.

Abstract

The Importance Sampling (IS) method is used as an alternative approach to MCMC in repeated Bayesian estimations. In the particular context of numerous data sets, MCMC algorithms have to be called several times, which may become computationally expensive. Since Importance Sampling requires a sample from a posterior distribution, our idea is to use MCMC to generate only a certain number of Markov chains and to use them later in the subsequent IS estimations. For each Importance Sampling procedure, a suitable chain is selected by one of three criteria we present here. The first and second criteria are based, respectively, on the L1 norm of the difference between two posterior distributions and on their Kullback-Leibler divergence. The third criterion results from minimizing the variance of the IS estimate. A supplementary automatic selection procedure is also proposed to choose those posteriors for which Markov chains will be generated and to avoid an arbitrary choice of importance functions. The featured methods are illustrated in simulation studies on three types of Poisson model: a simple Poisson model, a Poisson regression model and a Poisson regression model with extra-Poisson variability. Different parameter settings are considered.
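As an illustration of the general idea only, the following minimal Python sketch reuses a sample from the posterior of one data set to estimate, by self-normalised importance sampling, a posterior mean for a second data set under a simple Poisson model. It is not the authors' implementation. The Gamma(a, b) prior, the toy data, the function names and the use of exact conjugate draws as a stand-in for an MCMC chain are all assumptions made here for illustration. The effective sample size it reports is a common generic diagnostic, not one of the three selection criteria described in the abstract.

```python
import numpy as np

def log_unnorm_posterior(lam, data, a=1.0, b=1.0):
    """Log unnormalised posterior of a Poisson rate under an assumed
    Gamma(shape=a, rate=b) prior, evaluated at lam."""
    s, n = data.sum(), data.size
    return (a + s - 1.0) * np.log(lam) - (b + n) * lam

def is_posterior_mean(chain, ref_data, new_data):
    """Self-normalised IS estimate of E[lambda | new_data], using draws
    (chain) from the posterior given ref_data as the importance sample."""
    log_w = (log_unnorm_posterior(chain, new_data)
             - log_unnorm_posterior(chain, ref_data))
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # normalise the weights to sum to one
    ess = 1.0 / np.sum(w ** 2)       # effective sample size of the weights
    return float(np.sum(w * chain)), float(ess)

# Hypothetical toy data sets (not taken from the paper).
ref_data = np.array([3, 5, 4, 2, 6, 4, 3, 5, 4, 4])
new_data = np.array([4, 6, 5, 3, 5, 4, 6, 5, 3, 5])

# Stand-in for an MCMC chain: exact draws from the conjugate Gamma
# posterior given ref_data (shape a + sum, rate b + n).
a, b = 1.0, 1.0
rng = np.random.default_rng(0)
chain = rng.gamma(shape=a + ref_data.sum(),
                  scale=1.0 / (b + ref_data.size),
                  size=20000)

estimate, ess = is_posterior_mean(chain, ref_data, new_data)

# In this conjugate case the posterior mean is known in closed form,
# which gives a check on the IS estimate.
exact = (a + new_data.sum()) / (b + new_data.size)
print(f"IS estimate: {estimate:.3f}, exact: {exact:.3f}, ESS: {ess:.0f}")
```

The conjugate setting is chosen only because the exact answer is available for comparison. In the regression models mentioned in the abstract no closed form exists, and the importance sample would come from an actual MCMC run.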

Primary Subjects: 62F15, 65C05
Secondary Subjects: 65C60
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ejs/1268831481
Digital Object Identifier: doi:10.1214/09-EJS527
Mathematical Reviews number (MathSciNet): MR2645489


2012 © Institute of Mathematical Statistics
