Electronic Journal of Statistics

An extended empirical saddlepoint approximation for intractable likelihoods

Matteo Fasiolo, Simon N. Wood, Florian Hartig, and Mark V. Bravington

Full-text: Open access

Abstract

The challenges posed by complex stochastic models used in computational ecology, biology and genetics have stimulated the development of approximate approaches to statistical inference. Here we focus on Synthetic Likelihood (SL), a procedure that reduces the observed and simulated data to a set of summary statistics, and quantifies the discrepancy between them through a synthetic likelihood function. SL requires little tuning, but it relies on the approximate normality of the summary statistics. We relax this assumption by proposing a novel, more flexible, density estimator: the Extended Empirical Saddlepoint approximation. In addition to proving the consistency of SL, under either the new or the Gaussian density estimator, we illustrate the method using three examples. One of these is a complex individual-based forest model for which SL offers one of the few practical possibilities for statistical inference. The examples show that the new density estimator is able to capture large departures from normality, while being scalable to high dimensions, and this in turn leads to more accurate parameter estimates, relative to the Gaussian alternative. The new density estimator is implemented by the esaddle R package, which is freely available on the Comprehensive R Archive Network (CRAN).
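As an illustration of the Gaussian version of SL described in the abstract, the following is a minimal Python sketch, not the authors' implementation and not the interface of the esaddle package. It simulates summary statistics at a given parameter value, fits a multivariate normal to them, and evaluates the resulting log-density at the observed summaries. The function names (gaussian_synthetic_loglik, simulate_toy, summarize_toy), the toy normal model and the choice of summaries are all assumptions made for illustration. The Extended Empirical Saddlepoint estimator proposed in the paper would replace the Gaussian density step and is not shown here.

```python
import numpy as np


def gaussian_synthetic_loglik(theta, s_obs, simulate, summarize, n_sim=200, rng=None):
    """Gaussian synthetic log-likelihood of the observed summaries at theta.

    `simulate(theta, rng)` and `summarize(data)` are user-supplied callables
    (hypothetical names, introduced only for this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Simulate n_sim data sets at theta and reduce each one to summary statistics.
    S = np.array([summarize(simulate(theta, rng)) for _ in range(n_sim)])
    S = S.reshape(n_sim, -1)
    # Fit a multivariate normal to the simulated summaries.
    mu = S.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(S, rowvar=False))
    # Evaluate the fitted normal log-density at the observed summaries.
    diff = np.atleast_1d(s_obs) - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    d = diff.size
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def simulate_toy(theta, rng, n=50):
    # Toy model (an assumption): n independent draws from Normal(theta, 1).
    return rng.normal(loc=theta, scale=1.0, size=n)


def summarize_toy(y):
    # Toy summaries (an assumption): sample mean and log sample variance.
    return np.array([y.mean(), np.log(y.var(ddof=1))])


rng = np.random.default_rng(1)
s_obs = summarize_toy(simulate_toy(0.0, rng))  # stand-in for observed data
ll_near = gaussian_synthetic_loglik(0.0, s_obs, simulate_toy, summarize_toy, rng=rng)
ll_far = gaussian_synthetic_loglik(3.0, s_obs, simulate_toy, summarize_toy, rng=rng)
print(ll_near > ll_far)
```

In this toy setting the synthetic log-likelihood at the data-generating value should exceed the one at a distant value. The sketch depends on the simulated summaries being close to normal, which is the assumption the paper sets out to relax.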

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 1544-1578.

Dates
Received: June 2017
First available in Project Euclid: 26 May 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1527300140

Digital Object Identifier
doi:10.1214/18-EJS1433

Mathematical Reviews number (MathSciNet)
MR3806432

Zentralblatt MATH identifier
06875408

Keywords
Intractable likelihood; saddlepoint approximation; synthetic likelihood; simulation-based inference; implicit statistical model; density estimation

Rights
Creative Commons Attribution 4.0 International License.

Citation

Fasiolo, Matteo; Wood, Simon N.; Hartig, Florian; Bravington, Mark V. An extended empirical saddlepoint approximation for intractable likelihoods. Electron. J. Statist. 12 (2018), no. 1, 1544--1578. doi:10.1214/18-EJS1433. https://projecteuclid.org/euclid.ejs/1527300140


