Electronic Journal of Statistics

Brittleness of Bayesian inference under finite information in a continuous world

Houman Owhadi, Clint Scovel, and Tim Sullivan

Full-text: Open access

Abstract

We derive, in the classical framework of Bayesian sensitivity analysis, optimal lower and upper bounds on posterior values obtained from Bayesian models that exactly capture an arbitrarily large number of finite-dimensional marginals of the data-generating distribution and/or that are as close as desired to the data-generating distribution in the Prokhorov or total variation metrics; these bounds show that such models may still make the largest possible prediction error after conditioning on an arbitrarily large number of sample data measured at finite precision. These results are obtained through the development of a reduction calculus for optimization problems over measures on spaces of measures. We use this calculus to investigate the mechanisms that generate brittleness/robustness and, in particular, we observe that learning and robustness are antagonistic properties. It is now well understood that the numerical resolution of PDEs requires the satisfaction of specific stability conditions. Is there a missing stability condition for using Bayesian inference in a continuous world under finite information?
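One intuition behind the brittleness described in the abstract can be made concrete with a deliberately tiny toy model. This is a sketch under stated assumptions, not the authors' construction (the paper works with general priors and optimizes over all admissible models). The assumptions are all hypothetical: a two-point prior on a parameter θ, i.i.d. data X_i ~ N(θ, 1) reported only to within ±δ ("finite precision"), and a perturbation ν that places all of its mass on a far-away value θ_bad whose data always land in the observed windows. The perturbed model (1 − ε)μ + εν lies within total variation distance ε of μ. Yet once the probability μ assigns to the observed finite-precision event falls far below ε, the posterior is dominated by ν.

```python
import math


def normal_cdf(x: float, mean: float, sd: float = 1.0) -> float:
    """CDF of N(mean, sd^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def window_likelihood(theta: float, xs: list[float], delta: float) -> float:
    """P(every X_i lands in [x_i - delta, x_i + delta] | theta), X_i i.i.d. N(theta, 1).

    Finite-precision data: we only learn which width-2*delta window each sample hit.
    """
    like = 1.0
    for x in xs:
        like *= normal_cdf(x + delta, theta) - normal_cdf(x - delta, theta)
    return like


def posterior_mean(prior: dict[float, float], xs: list[float], delta: float,
                   eps: float = 0.0, theta_bad: float = 0.0) -> float:
    """Posterior mean of theta under the mixture (1 - eps) * mu + eps * nu.

    mu: prior(theta) times the Gaussian sampling model.
    nu: hypothetical perturbation, a point mass at theta_bad whose data always
        fall inside the observed windows. TV(mu, mixture) <= eps.
    """
    num = (1.0 - eps) * sum(w * t * window_likelihood(t, xs, delta)
                            for t, w in prior.items())
    den = (1.0 - eps) * sum(w * window_likelihood(t, xs, delta)
                            for t, w in prior.items())
    # nu gives probability 1 to the observed event, so it contributes eps to the evidence.
    num += eps * theta_bad
    den += eps
    return num / den


prior = {0.0: 0.5, 1.0: 0.5}
xs = [0.1, -0.3, 0.2, 0.0, -0.1]
print(posterior_mean(prior, xs, delta=1e-3))                            # ≈ 0.069
print(posterior_mean(prior, xs, delta=1e-3, eps=1e-6, theta_bad=10.0))  # ≈ 10.0
```

With five samples measured to ±10⁻³, the unperturbed model assigns the observed event a probability of order 10⁻¹⁶. A perturbation of size 10⁻⁶ therefore moves the posterior mean from about 0.07 to about 10. With coarse precision (say δ = 1), the same perturbation is negligible. This is consistent with the abstract's point that more, or more precise, data can increase rather than decrease this sensitivity.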

Article information

Source
Electron. J. Statist. Volume 9, Number 1 (2015), 1-79.

Dates
First available in Project Euclid: 2 February 2015

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1422885673

Digital Object Identifier
doi:10.1214/15-EJS989

Mathematical Reviews number (MathSciNet)
MR3306570

Zentralblatt MATH identifier
1305.62123

Subjects
Primary: 62F15: Bayesian inference; 62G35: Robustness
Secondary: 62A01: Foundations and philosophical topics; 62E20: Asymptotic distribution theory; 62F12: Asymptotic properties of estimators; 62G20: Asymptotic properties

Keywords
Bayesian inference; misspecification; robustness; uncertainty quantification; optimal uncertainty quantification

Citation

Owhadi, Houman; Scovel, Clint; Sullivan, Tim. Brittleness of Bayesian inference under finite information in a continuous world. Electron. J. Statist. 9 (2015), no. 1, 1-79. doi:10.1214/15-EJS989. https://projecteuclid.org/euclid.ejs/1422885673