Statistical Science

Recursive Pathways to Marginal Likelihood Estimation with Prior-Sensitivity Analysis

Ewan Cameron and Anthony Pettitt

Full-text: Open access

Abstract

We investigate the utility, for computational Bayesian analyses, of a particular family of recursive marginal likelihood estimators characterized by the (equivalent) algorithms known as “biased sampling” or “reverse logistic regression” in the statistics literature and “the density of states” in physics. Through a pair of numerical examples (including mixture modeling of the well-known galaxy data set) we highlight the remarkable diversity of sampling schemes amenable to such recursive normalization, as well as the notable efficiency of the resulting pseudo-mixture distributions for gauging prior sensitivity in the Bayesian model selection context. Our key theoretical contributions are to introduce a novel heuristic (“thermodynamic integration via importance sampling”) for qualifying the role of the bridging sequence in this procedure and to reveal various connections between these recursive estimators and the nested sampling technique.
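The abstract names the recursive estimator family without writing it out. A minimal sketch follows, assuming the standard self-consistent form from the biased-sampling literature rather than the authors' own code: with n_k draws from each normalized density q_k/Z_k pooled together, iterate Z_k ← Σ_i q_k(x_i) / [Σ_m n_m q_m(x_i) / Z_m] and pin Z_0 = 1, since only ratios are identified. A second function reuses the same pooled draws as a single pseudo-mixture to estimate the normalizer of a density that was never sampled, which is the kind of reweighting the abstract credits for cheap prior-sensitivity checks. The function names, the Gaussian toy densities, and the sample sizes are hypothetical choices for illustration.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def recursive_log_normalizers(log_q, samples, counts, tol=1e-8, max_iter=5000):
    """Self-consistent (biased sampling / reverse logistic regression style)
    estimates of log Z_k for unnormalized densities q_k, from pooled draws.

    log_q   -- list of callables; log_q[k](x) = log q_k(x)
    samples -- pooled draws; counts[k] of them came from q_k / Z_k
    Returns log Z_k relative to Z_0 = 1 (only ratios are identified).
    """
    J = len(log_q)
    log_n = [math.log(n) for n in counts]
    table = [[f(x) for f in log_q] for x in samples]  # log q_k(x_i)
    log_z = [0.0] * J
    for _ in range(max_iter):
        # Pseudo-mixture denominator: log sum_m n_m q_m(x_i) / Z_m for each draw.
        log_den = [log_sum_exp([log_n[m] + row[m] - log_z[m] for m in range(J)])
                   for row in table]
        # Update: Z_k = sum_i q_k(x_i) / denominator_i.
        new = [log_sum_exp([row[k] - d for row, d in zip(table, log_den)])
               for k in range(J)]
        new = [v - new[0] for v in new]  # pin Z_0 = 1
        if max(abs(a - b) for a, b in zip(new, log_z)) < tol:
            return new
        log_z = new
    return log_z


def reweight_log_normalizer(log_q_new, log_q, samples, counts, log_z):
    """Importance-sampling estimate of log Z_new (same reference as log_z),
    treating the pooled draws as one pseudo-mixture proportional to
    sum_m n_m q_m / Z_m."""
    log_n = [math.log(n) for n in counts]
    terms = []
    for x in samples:
        log_den = log_sum_exp([log_n[m] + f(x) - log_z[m] for m, f in enumerate(log_q)])
        terms.append(log_q_new(x) - log_den)
    return log_sum_exp(terms)


# Hypothetical toy check: zero-mean Gaussian kernels q_j(x) = exp(-x^2 / (2 s_j^2)),
# whose true ratios Z_j / Z_0 equal s_j / s_0.
def make_log_q(s):
    return lambda x: -0.5 * (x / s) ** 2


rng = random.Random(1)
scales = [1.0, 1.5, 2.0, 3.0]
counts = [1000] * len(scales)
log_q = [make_log_q(s) for s in scales]
samples = [rng.gauss(0.0, s) for s, n in zip(scales, counts) for _ in range(n)]

log_z = recursive_log_normalizers(log_q, samples, counts)
ratios = [math.exp(v) for v in log_z]  # true values: 1.0, 1.5, 2.0, 3.0

# A kernel that was never sampled (s = 2.5, true ratio 2.5), estimated by reweighting.
new_ratio = math.exp(reweight_log_normalizer(make_log_q(2.5), log_q, samples, counts, log_z))
```

In a marginal-likelihood setting each log_q[k] would typically be a tempered log-likelihood plus log-prior, with q_0 a normalized prior so that pinning Z_0 = 1 yields absolute evidences; passing likelihood times an alternative prior as log_q_new gives the prior-sensitivity check without new sampling. The plain fixed-point update converges slowly, and the estimates become noisier, when neighboring densities in the sequence overlap poorly, which is one reason the choice of bridging sequence matters.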

Article information

Source
Statist. Sci. Volume 29, Number 3 (2014), 397–419.

Dates
First available in Project Euclid: 23 September 2014

Permanent link to this document
https://projecteuclid.org/euclid.ss/1411437520

Digital Object Identifier
doi:10.1214/13-STS465

Mathematical Reviews number (MathSciNet)
MR3264552

Zentralblatt MATH identifier
1331.62128

Keywords
Bayes factor; Bayesian model selection; importance sampling; marginal likelihood; Metropolis-coupled Markov chain Monte Carlo; nested sampling; normalizing constant; path sampling; reverse logistic regression; thermodynamic integration

Citation

Cameron, Ewan; Pettitt, Anthony. Recursive Pathways to Marginal Likelihood Estimation with Prior-Sensitivity Analysis. Statist. Sci. 29 (2014), no. 3, 397–419. doi:10.1214/13-STS465. https://projecteuclid.org/euclid.ss/1411437520.

