Annales de l'Institut Henri Poincaré, Probabilités et Statistiques

New insights into Approximate Bayesian Computation

Gérard Biau, Frédéric Cérou, and Arnaud Guyader

Abstract

Approximate Bayesian Computation (ABC for short) is a family of computational techniques which offer an almost automated solution in situations where evaluation of the posterior likelihood is computationally prohibitive, or whenever suitable likelihoods are not available. In the present paper, we analyze the procedure from the point of view of $k$-nearest neighbor theory and explore the statistical properties of its outputs. We discuss in particular some asymptotic features of the genuine conditional density estimate associated with ABC, which is an interesting hybrid between a $k$-nearest neighbor and a kernel method.
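The nearest-neighbor reading of ABC described in the abstract can be illustrated with a small sketch. It assumes a toy Gaussian model that is not taken from the paper (prior $\theta \sim N(0,1)$, data $y \mid \theta \sim N(\theta,1)$), a scalar observation, and a fixed Gaussian bandwidth. The names `abc_knn`, `kernel_density`, `sample_prior` and `simulate` are hypothetical. The sketch keeps the $k$ simulated parameters whose data fall closest to the observation, then smooths them with a kernel. It is meant only in the spirit of the hybrid estimate the abstract mentions, not as the authors' exact estimator.

```python
import math
import random


def abc_knn(observed, sample_prior, simulate, n_sim, k, rng):
    """Keep the k parameter draws whose simulated data lie closest to `observed`."""
    draws = []
    for _ in range(n_sim):
        theta = sample_prior(rng)          # draw a parameter from the prior
        y = simulate(theta, rng)           # simulate data given that parameter
        draws.append((abs(y - observed), theta))
    draws.sort(key=lambda pair: pair[0])   # order by distance to the observation
    accepted = [theta for _, theta in draws[:k]]
    radius = draws[k - 1][0]               # distance to the k-th nearest neighbor
    return accepted, radius


def kernel_density(accepted, bandwidth):
    """Gaussian kernel density estimate built on the accepted draws."""
    norm = len(accepted) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(theta):
        total = sum(math.exp(-0.5 * ((theta - t) / bandwidth) ** 2) for t in accepted)
        return total / norm

    return density


# Toy model (an assumption for illustration): theta ~ N(0, 1), y | theta ~ N(theta, 1).
rng = random.Random(0)
accepted, radius = abc_knn(
    observed=1.0,
    sample_prior=lambda r: r.gauss(0.0, 1.0),
    simulate=lambda theta, r: r.gauss(theta, 1.0),
    n_sim=20000,
    k=500,
    rng=rng,
)
posterior_estimate = kernel_density(accepted, bandwidth=0.2)
posterior_mean = sum(accepted) / len(accepted)
```

In this toy model the exact posterior given $y = 1$ is $N(1/2, 1/2)$, so `posterior_mean` can be compared with $1/2$ as a sanity check. Here the acceptance threshold is not fixed in advance: it is the data-driven distance `radius` to the $k$-th nearest neighbor, which is what ties the procedure to nearest-neighbor theory.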

Summary

The English term "Approximate Bayesian Computation" (ABC for short) refers to a family of Bayesian techniques for simulating from a probability distribution when the posterior likelihood is unavailable or cannot be evaluated numerically. In the present paper, we consider this procedure from the point of view of $k$-nearest neighbor theory, focusing in particular on the statistical properties of the algorithm's outputs. This leads us to analyze the asymptotic behavior of a conditional density estimate that is naturally associated with ABC, is used in practice, and combines the characteristics of a $k$-nearest neighbor estimate with those of a kernel method.

Article information

Source
Ann. Inst. H. Poincaré Probab. Statist. Volume 51, Number 1 (2015), 376-403.

Dates
First available in Project Euclid: 14 January 2015

Permanent link to this document
http://projecteuclid.org/euclid.aihp/1421244410

Digital Object Identifier
doi:10.1214/13-AIHP590

Mathematical Reviews number (MathSciNet)
MR3300975

Zentralblatt MATH identifier
1307.62012

Subjects
Primary: 62C10: Bayesian problems; characterization of Bayes procedures
62F15: Bayesian inference
62G20: Asymptotic properties

Keywords
Approximate Bayesian Computation; Nonparametric estimation; Conditional density estimation; Nearest neighbor methods; Mathematical statistics

Citation

Biau, Gérard; Cérou, Frédéric; Guyader, Arnaud. New insights into Approximate Bayesian Computation. Ann. Inst. H. Poincaré Probab. Statist. 51 (2015), no. 1, 376–403. doi:10.1214/13-AIHP590. http://projecteuclid.org/euclid.aihp/1421244410.

