## Electronic Journal of Statistics

### Asymptotically exact inference in differentiable generative models

#### Abstract

Many generative models can be expressed as a differentiable function applied to input variables sampled from a known probability distribution. This framework includes both the generative component of learned parametric models such as variational autoencoders and generative adversarial networks, and also procedurally defined simulator models which involve only differentiable operations. Though the distribution on the input variables to such models is known, often the distribution on the output variables is only implicitly defined. We present a method for performing efficient Markov chain Monte Carlo inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where approximate Bayesian computation might otherwise be employed. We use the intuition that computing conditional expectations is equivalent to integrating over a density defined on the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to move between inputs exactly consistent with observations. We validate the method by performing inference experiments in a diverse set of models.
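The idea of moving between inputs "exactly consistent with observations" can be illustrated with a small sketch. The following is not the authors' implementation: the toy `generator`, its `jacobian`, and the `project_to_manifold` helper are hypothetical names introduced here, and the Gauss-Newton projection shown is only one generic way of returning a point to the constraint set, of the kind commonly used inside constrained integrators. It assumes a generator with more inputs than outputs and a full-rank Jacobian near the solution.

```python
import numpy as np


def generator(u):
    """Toy differentiable generator (hypothetical): 3 inputs -> 2 outputs."""
    return np.array([u[0] * u[1], u[1] + np.sin(u[2])])


def jacobian(u):
    """Analytic Jacobian of `generator`, shape (2, 3)."""
    return np.array([
        [u[1], u[0], 0.0],
        [0.0, 1.0, np.cos(u[2])],
    ])


def project_to_manifold(u, y_obs, tol=1e-10, max_iter=50):
    """Gauss-Newton projection of u onto the set {u : generator(u) = y_obs}."""
    u = np.array(u, dtype=float)
    for _ in range(max_iter):
        residual = generator(u) - y_obs
        if np.max(np.abs(residual)) < tol:
            return u
        J = jacobian(u)
        # Minimum-norm correction: solve (J J^T) lam = residual,
        # then step along the rows of J (normal to the manifold).
        lam = np.linalg.solve(J @ J.T, residual)
        u = u - J.T @ lam
    raise RuntimeError("projection did not converge")


# Inputs that produced the "observed" output (illustrative values).
u_true = np.array([0.5, -1.0, 0.3])
y_obs = generator(u_true)

# A nearby point that is off the manifold, then projected back onto it.
u_start = u_true + np.array([0.05, -0.03, 0.04])
u_proj = project_to_manifold(u_start, y_obs)
```

Here the set of inputs satisfying `generator(u) == y_obs` is a one-dimensional curve in the three-dimensional input space, and `u_proj` lies on it to within the solver tolerance. A constrained Hamiltonian Monte Carlo sampler would combine a projection of this kind with momentum updates and an accept/reject step, which this sketch omits.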

#### Article information

Source
Electron. J. Statist., Volume 11, Number 2 (2017), 5105-5164.

Dates
First available in Project Euclid: 15 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1513306869

Digital Object Identifier
doi:10.1214/17-EJS1340SI

Mathematical Reviews number (MathSciNet)
MR3738207

Zentralblatt MATH identifier
1380.65025

Subjects
Primary: 65C05: Monte Carlo methods
Secondary: 62F15: Bayesian inference

#### Citation

Graham, Matthew M.; Storkey, Amos J. Asymptotically exact inference in differentiable generative models. Electron. J. Statist. 11 (2017), no. 2, 5105–5164. doi:10.1214/17-EJS1340SI. https://projecteuclid.org/euclid.ejs/1513306869
