References
[1] Algoet, P. H. and Cover, T. M. (1988). A sandwich proof of the Shannon-McMillan-Breiman theorem. Annals of Probability 16, 899–909. http://projecteuclid.org/euclid.aop/1176991794.
[2] Arora, S., Hazan, E., and Kale, S. (2005). The multiplicative weights update method: a meta algorithm and applications. http://www.cs.princeton.edu/~arora/pubs/MWsurvey.pdf.
[3] Badii, R. and Politi, A. (1997). Complexity: Hierarchical Structures and Scaling in Physics. Cambridge University Press, Cambridge, England.
[4] Barron, A., Schervish, M. J., and Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. Annals of Statistics 27, 536–561. http://projecteuclid.org/euclid.aos/1018031206.
[5] Berk, R. H. (1966). Limiting behavior of posterior distributions when the model is incorrect. Annals of Mathematical Statistics 37, 51–58. See also correction, volume 37 (1966), pp. 745–746, http://projecteuclid.org/euclid.aoms/1177699597.
[6] Berk, R. H. (1970). Consistency a posteriori. Annals of Mathematical Statistics 41, 894–906. http://projecteuclid.org/euclid.aoms/1177696967.
[7] Blackwell, D. and Dubins, L. (1962). Merging of opinion with increasing information. Annals of Mathematical Statistics 33, 882–886. http://projecteuclid.org/euclid.aoms/1177704456.
[8] Börgers, T. and Sarin, R. (1997). Learning through reinforcement and replicator dynamics. Journal of Economic Theory 77, 1–14.
[9] Borkar, V. S. (2002). Reinforcement learning in Markovian evolutionary games. Advances in Complex Systems 5, 55–72.
[10] Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press, Cambridge, England.
[11] Chamley, C. (2004). Rational Herds: Economic Models of Social Learning. Cambridge University Press, Cambridge, England.
[12] Charniak, E. (1993). Statistical Language Learning. MIT Press, Cambridge, Massachusetts.
[13] Choi, T. and Ramamoorthi, R. V. (2008). Remarks on consistency of posterior distributions. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, B. Clarke and S. Ghosal, Eds. Institute of Mathematical Statistics, Beachwood, Ohio, 170–186. http://arxiv.org/abs/0805.3248.
[14] Choudhuri, N., Ghosal, S., and Roy, A. (2004). Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association 99, 1050–1059. http://www4.stat.ncsu.edu/~sghosal/papers/specden.pdf.
[15] Crutchfield, J. P. (1992). Semantics and thermodynamics. In Nonlinear Modeling and Forecasting, M. Casdagli and S. Eubank, Eds. Addison-Wesley, Reading, Massachusetts, 317–359.
[16] Daw, C. S., Finney, C. E. A., and Tracy, E. R. (2003). A review of symbolic analysis of experimental data. Review of Scientific Instruments 74, 916–930. http://www-chaos.engr.utk.edu/abs/abs-rsi2002.html.
[17] Dębowski, Ł. (2006). Ergodic decomposition of excess entropy and conditional mutual information. Tech. Rep. 993, Institute of Computer Science, Polish Academy of Sciences (IPI PAN). http://www.ipipan.waw.pl/~ldebowsk/docs/raporty/ee_report.pdf.
[18] Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Annals of Statistics 14, 1–26. http://projecteuclid.org/euclid.aos/1176349830.
[19] Doob, J. L. (1949). Application of the theory of martingales. In Colloques Internationaux du Centre National de la Recherche Scientifique. Vol. 13. Centre National de la Recherche Scientifique, Paris, 23–27.
[20] Dynkin, E. B. (1978). Sufficient statistics and extreme points. Annals of Probability 6, 705–730. http://projecteuclid.org/euclid.aop/1176995424.
[21] Earman, J. (1992). Bayes or Bust? A Critical Account of Bayesian Confirmation Theory. MIT Press, Cambridge, Massachusetts.
[22] Eichelsbacher, P. and Ganesh, A. (2002). Moderate deviations for Bayes posteriors. Scandinavian Journal of Statistics 29, 153–167.
[23] Fisher, R. A. (1958). The Genetical Theory of Natural Selection, Second ed. Dover, New York. First edition published Oxford: Clarendon Press, 1930.
[24] Fraser, A. M. (2008). Hidden Markov Models and Dynamical Systems. SIAM Press, Philadelphia.
[25] Geman, S. and Hwang, C.-R. (1982). Nonparametric maximum likelihood estimation by the method of sieves. Annals of Statistics 10, 401–414. http://projecteuclid.org/euclid.aos/1176345782.
[26] Ghosal, S., Ghosh, J. K., and Ramamoorthi, R. V. (1999). Consistency issues in Bayesian nonparametrics. In Asymptotics, Nonparametrics and Time Series: A Tribute to Madan Lal Puri, S. Ghosh, Ed. Marcel Dekker, 639–667. http://www4.stat.ncsu.edu/~sghosal/papers/review.pdf.
[27] Ghosal, S., Ghosh, J. K., and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Annals of Statistics 28, 500–531. http://projecteuclid.org/euclid.aos/1016218228.
[28] Ghosal, S. and Tang, Y. (2006). Bayesian consistency for Markov processes. Sankhya 68, 227–239. http://sankhya.isical.ac.in/search/68_2/2006010.html.
[29] Ghosal, S. and van der Vaart, A. (2007). Convergence rates of posterior distributions for non-iid observations. Annals of Statistics 35, 192–223. http://arxiv.org/abs/0708.0491.
[30] Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Springer-Verlag, New York.
[31] Gray, R. M. (1988). Probability, Random Processes, and Ergodic Properties. Springer-Verlag, New York. http://ee.stanford.edu/~gray/arp.html.
[32] Gray, R. M. (1990). Entropy and Information Theory. Springer-Verlag, New York. http://ee.stanford.edu/~gray/it.html.
[33] Haldane, J. B. S. (1954). The measurement of natural selection. In Proceedings of the 9th International Congress of Genetics. Vol. 1. 480–487.
[34] Hofbauer, J. and Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, England.
[35] Kallenberg, O. (2002). Foundations of Modern Probability, Second ed. Springer-Verlag, New York.
[36] Kitchens, B. and Tuncel, S. (1985). Finitary Measures for Subshifts of Finite Type and Sofic Systems. Memoirs of the American Mathematical Society, Vol. 338. American Mathematical Society, Providence, Rhode Island.
[37] Kitchens, B. P. (1998). Symbolic Dynamics: One-sided, Two-sided and Countable State Markov Shifts. Springer-Verlag, Berlin.
[38] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Annals of Statistics 34, 837–877. http://arxiv.org/math.ST/0607023.
[39] Knight, F. B. (1975). A predictive view of continuous time processes. Annals of Probability 3, 573–596. http://projecteuclid.org/euclid.aop/1176996302.
[40] Krogh, A. and Vedelsby, J. (1995). Neural network ensembles, cross validation, and active learning. In Advances in Neural Information Processing Systems 7 [NIPS 1994], G. Tesauro, D. Touretzky, and T. Leen, Eds. MIT Press, Cambridge, Massachusetts, 231–238. http://books.nips.cc/papers/files/nips07/0231.pdf.
[41] Lian, H. (2007). On rates of convergence for posterior distributions under misspecification. E-print, arxiv.org. http://arxiv.org/abs/math.ST/0702126.
[42] Lijoi, A., Prünster, I., and Walker, S. G. (2007). Bayesian consistency for stationary models. Econometric Theory 23, 749–759.
[43] Lind, D. and Marcus, B. (1995). An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, Cambridge, England.
[44] Marton, K. and Shields, P. C. (1994). Entropy and the consistent estimation of joint distributions. Annals of Probability 22, 960–977. Correction, Annals of Probability 24 (1996), 541–545. http://projecteuclid.org/euclid.aop/1176988736.
[45] McAllester, D. A. (1999). Some PAC-Bayesian theorems. Machine Learning 37, 355–363.
[46] Meir, R. (2000). Nonparametric time series prediction through adaptive model selection. Machine Learning 39, 5–34. http://www.ee.technion.ac.il/~rmeir/Publications/MeirTimeSeries00.pdf.
[47] Ornstein, D. S. and Weiss, B. (1990). How sampling reveals a process. Annals of Probability 18, 905–930. http://projecteuclid.org/euclid.aop/1176990729.
[48] Page, S. E. (2007). The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton University Press, Princeton, New Jersey.
[49] Papangelou, F. (1996). Large deviations and the Bayesian estimation of higher-order Markov transition functions. Journal of Applied Probability 33, 18–27. http://www.jstor.org/stable/3215260.
[50] Perry, N. and Binder, P.-M. (1999). Finite statistical complexity for sofic systems. Physical Review E 60, 459–463.
[51] Rivers, D. and Vuong, Q. H. (2002). Model selection tests for nonlinear dynamic models. The Econometrics Journal 5, 1–39.
[52] Roy, A., Ghosal, S., and Rosenberger, W. F. (2009). Convergence properties of sequential Bayesian D-optimal designs. Journal of Statistical Planning and Inference 139, 425–440.
[53] Ryabko, D. and Ryabko, B. (2008). Testing statistical hypotheses about ergodic processes. E-print, arxiv.org, 0804.0510. http://arxiv.org/abs/0804.0510.
[54] Sato, Y. and Crutchfield, J. P. (2003). Coupled replicator equations for the dynamics of learning in multiagent systems. Physical Review E 67, 015206. http://arxiv.org/abs/nlin.AO/0204057.
[55] Schervish, M. J. (1995). Theory of Statistics. Springer Series in Statistics. Springer-Verlag, Berlin.
[56] Schwartz, L. (1965). On Bayes procedures. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 4, 10–26.
[57] Shalizi, C. R. and Crutchfield, J. P. (2001). Computational mechanics: Pattern and prediction, structure and simplicity. Journal of Statistical Physics 104, 817–879. http://arxiv.org/abs/cond-mat/9907176.
[58] Shalizi, C. R. and Klinkner, K. L. (2004). Blind construction of optimal nonlinear recursive predictors for discrete sequences. In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004), M. Chickering and J. Y. Halpern, Eds. AUAI Press, Arlington, Virginia, 504–511. http://arxiv.org/abs/cs.LG/0406011.
[59] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Annals of Statistics 29, 687–714. http://projecteuclid.org/euclid.aos/1009210686.
[60] Shields, P. C. (1996). The Ergodic Theory of Discrete Sample Paths. American Mathematical Society, Providence, Rhode Island.
[61] Strelioff, C. C., Crutchfield, J. P., and Hübler, A. W. (2007). Inferring Markov chains: Bayesian estimation, model comparison, entropy rate, and out-of-class modeling. Physical Review E 76, 011106. http://arxiv.org/math.ST/0703715.
[62] Varn, D. P. and Crutchfield, J. P. (2004). From finite to infinite range order via annealing: The causal architecture of deformation faulting in annealed close-packed crystals. Physics Letters A 324, 299–307. http://arxiv.org/abs/cond-mat/0307296.
[63] Vidyasagar, M. (2003). Learning and Generalization: With Applications to Neural Networks, Second ed. Springer-Verlag, Berlin.
[64] Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 57, 307–333. http://www.jstor.org/pss/1912557.
[65] Walker, S. (2004). New approaches to Bayesian consistency. Annals of Statistics 32, 2028–2043. http://arxiv.org/abs/math.ST/0503672.
[66] Weiss, B. (1973). Subshifts of finite type and sofic systems. Monatshefte für Mathematik 77, 462–474.
[67] Xing, Y. and Ranneby, B. (2008). Both necessary and sufficient conditions for Bayesian exponential consistency. http://arxiv.org/abs/0812.1084.
[68] Zhang, T. (2006). From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Annals of Statistics 34, 2180–2210. http://arxiv.org/math.ST/0702653.