Bayesian Analysis

Likelihood-free estimation of model evidence

Xavier Didelot, Richard G. Everitt, Adam M. Johansen, and Daniel J. Lawson

Full-text: Open access

Abstract

Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of "likelihood-free" methods termed Approximate Bayesian Computation (ABC) eliminates this requirement by replacing evaluation of the likelihood with simulation of data from the model. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models.
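As a minimal illustration of the basic likelihood-free idea (plain rejection ABC, not the MCMC or SMC variants mentioned above), the following sketch never evaluates a likelihood. It draws a parameter from the prior, simulates a data set, and keeps the parameter when a summary statistic of the simulated data falls within a tolerance `epsilon` of the observed one. The Normal toy model, its prior, the data values and all function names are our own assumptions for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, summary, epsilon, n_sims, rng):
    """Basic likelihood-free rejection sampler (illustrative sketch).

    Draws theta from the prior, simulates a data set of the same size, and
    keeps theta when the simulated summary lies within epsilon of the observed
    summary. The likelihood is only simulated from, never evaluated.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= epsilon:
            accepted.append(theta)
    return accepted


# Hypothetical toy model (our assumption, not from the paper):
# y_i ~ Normal(mu, 1), prior mu ~ Normal(0, 1), summary = sample mean.
def simulate_normal(mu, n, rng):
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def prior_normal(rng):
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(1)
    y_obs = [0.8, 1.3, 0.2, 1.1, 0.9, 1.6, 0.4, 1.0, 0.7, 1.2]
    sample = abc_rejection(y_obs, simulate_normal, prior_normal,
                           statistics.fmean, 0.05, 50_000, rng)
    # Exact posterior mean for this conjugate model: n * ybar / (n + 1).
    n, ybar = len(y_obs), statistics.fmean(y_obs)
    print(len(sample), statistics.fmean(sample), n * ybar / (n + 1))
```

Because this toy model is conjugate, the exact posterior mean `n * ybar / (n + 1)` is available as a check. Shrinking `epsilon` brings the accepted sample closer to the exact posterior at the cost of fewer acceptances, which is the inefficiency that MCMC and SMC versions of ABC aim to reduce.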

Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow MCMC or SMC algorithms to be used to explore the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before turning to a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.
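To see why the evidence of each model can be estimated on its own, note that in rejection ABC the probability that a simulation is accepted, averaged over the prior, is itself an (approximate) evidence. This sketch assumes discrete data small enough that only exact matches of the whole data set are accepted (tolerance zero); the acceptance rate is then an unbiased Monte Carlo estimate of the evidence p(y | M), and the ratio of two such rates, each computed from a separate run, estimates the Bayes factor without any sampler that mixes between models. The Poisson-versus-geometric comparison, its priors, the data and all function names are hypothetical choices of ours, picked because the exact evidences have closed forms; this is not the paper's estimator, which builds on MCMC or SMC exploration of the parameter space.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson variate by Knuth's multiplication method (fine for small rates)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def geometric_draw(p, rng):
    """Failures before the first success, by inversion; requires 0 < p <= 1."""
    if p >= 1.0:
        return 0
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is finite
    return int(math.log(u) / math.log1p(-p))


def abc_evidence(observed, prior_sample, draw, n_sims, rng):
    """Acceptance rate of exact-match rejection ABC, an estimate of p(y_obs | M).

    A simulation is accepted with probability p(y_obs | theta), so averaging
    over prior draws estimates the evidence without evaluating the likelihood.
    all() stops simulating at the first mismatch; this leaves the acceptance
    probability unchanged because observations are simulated independently.
    """
    hits = 0
    for _ in range(n_sims):
        theta = prior_sample(rng)
        if all(draw(theta, rng) == y for y in observed):
            hits += 1
    return hits / n_sims


# Hypothetical models: M1 Poisson(lam), lam ~ Exp(1); M2 Geometric(p), p ~ U(0, 1].
def poisson_prior(rng):
    return rng.expovariate(1.0)


def geometric_prior(rng):
    return 1.0 - rng.random()


def exact_evidence_poisson(y):
    # s! / ((n + 1)^(s + 1) * prod(y_i!)) with s = sum(y), n = len(y)
    n, s = len(y), sum(y)
    return math.factorial(s) / (n + 1) ** (s + 1) / math.prod(math.factorial(v) for v in y)


def exact_evidence_geometric(y):
    # Beta(n + 1, s + 1) = n! s! / (n + s + 1)!
    n, s = len(y), sum(y)
    return math.factorial(n) * math.factorial(s) / math.factorial(n + s + 1)


if __name__ == "__main__":
    rng = random.Random(7)
    y_obs = [1, 0, 2]
    z1 = abc_evidence(y_obs, poisson_prior, poisson_draw, 200_000, rng)
    z2 = abc_evidence(y_obs, geometric_prior, geometric_draw, 200_000, rng)
    print("ABC evidences:", z1, z2, "Bayes factor:", z1 / z2)
    e1, e2 = exact_evidence_poisson(y_obs), exact_evidence_geometric(y_obs)
    print("exact:", e1, e2, e1 / e2)
```

With `y_obs = [1, 0, 2]` the exact evidences are 3/256 and 1/140, so the exact Bayes factor of the Poisson model against the geometric model is 420/256 ≈ 1.64; the simulated estimates should land near these values, with Monte Carlo error shrinking as `n_sims` grows. Exact matching on the full data sidesteps the question of whether summary statistics are sufficient across models; with insufficient summaries, a ratio of acceptance rates estimates a Bayes factor for the summaries rather than for the data.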

Article information

Source
Bayesian Anal. Volume 6, Number 1 (2011), 49–76.

Dates
First available in Project Euclid: 13 June 2012

Permanent link to this document
https://projecteuclid.org/euclid.ba/1339611941

Digital Object Identifier
doi:10.1214/11-BA602

Mathematical Reviews number (MathSciNet)
MR2781808

Zentralblatt MATH identifier
1330.62118

Subjects
Primary: 62F15: Bayesian inference
Secondary: 62P10: Applications to biology and medical sciences; 65C05: Monte Carlo methods; 68W20: Randomized algorithms

Citation

Didelot, Xavier; Everitt, Richard G.; Johansen, Adam M.; Lawson, Daniel J. Likelihood-free estimation of model evidence. Bayesian Anal. 6 (2011), no. 1, 49–76. doi:10.1214/11-BA602. https://projecteuclid.org/euclid.ba/1339611941

