Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 3 (2009), 1039-1074.
Dynamics of Bayesian updating with dependent data and misspecified models
Abstract
Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with misspecified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, along with Egorov’s theorem on uniform convergence, lets me build a sieve-like structure for the prior. The main statistical assumption, also a form of capacity control, concerns the compatibility of the prior and the data-generating process, controlling the fluctuations in the log-likelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between these results and the replicator dynamics of evolutionary theory.
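As a rough illustration of the setting the abstract describes (not the paper's own construction), the Python sketch below updates a prior over three IID Bernoulli models using data from a two-state Markov chain, so every hypothesis is wrong and the data are dependent. The transition probabilities, candidate parameters, sample size, seed, and function names are all assumptions chosen for illustration.

```python
import math
import random


def simulate_markov_chain(n, p01=0.3, p11=0.7, seed=0):
    """Simulate n steps of a two-state chain (illustrative parameters).

    p01 = P(next = 1 | current = 0), p11 = P(next = 1 | current = 1).
    """
    rng = random.Random(seed)
    x = 0
    path = []
    for _ in range(n):
        p_one = p11 if x == 1 else p01
        x = 1 if rng.random() < p_one else 0
        path.append(x)
    return path


def posterior_over_iid_models(path, qs, prior):
    """Posterior weights over IID Bernoulli(q) models, computed in log space."""
    log_w = [math.log(p) for p in prior]
    for x in path:
        for i, q in enumerate(qs):
            # Each model treats observations as independent, which is wrong here.
            log_w[i] += math.log(q if x == 1 else 1.0 - q)
    m = max(log_w)  # subtract the maximum to avoid underflow
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


qs = [0.2, 0.45, 0.8]          # hypothetical candidate models, all misspecified
prior = [1 / 3, 1 / 3, 1 / 3]  # uniform prior over the candidates
path = simulate_markov_chain(2000)
posterior = posterior_over_iid_models(path, qs, prior)
print(dict(zip(qs, posterior)))
```

In this toy setup the chain's stationary frequency of 1s is 0.5, so the model with q = 0.45 has the smallest per-symbol cross-entropy among the three, and the posterior should concentrate on it even though it ignores the dependence in the data.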
Article information
Source
Electron. J. Statist. Volume 3 (2009), 1039-1074.
Dates
First available in Project Euclid: 29 October 2009
Permanent link to this document
http://projecteuclid.org/euclid.ejs/1256822130
Digital Object Identifier
doi:10.1214/09-EJS485
Mathematical Reviews number (MathSciNet)
MR2557128
Zentralblatt MATH identifier
1326.62017
Subjects
Primary:
- 62C10: Bayesian problems; characterization of Bayes procedures
- 62G20: Asymptotic properties
- 62M09: Non-Markovian processes: estimation
Secondary:
- 60F10: Large deviations
- 62M05: Markov processes: estimation
- 92D15: Problems related to evolution
- 94A17: Measures of information, entropy
Keywords
Asymptotic equipartition; Bayesian consistency; Bayesian nonparametrics; Egorov’s theorem; large deviations; posterior convergence; replicator dynamics; sofic systems
Citation
Shalizi, Cosma Rohilla. Dynamics of Bayesian updating with dependent data and misspecified models. Electron. J. Statist. 3 (2009), 1039-1074. doi:10.1214/09-EJS485. http://projecteuclid.org/euclid.ejs/1256822130.
The Institute of Mathematical Statistics and the Bernoulli Society

