The Annals of Statistics

Information geometry approach to parameter estimation in Markov chains

Masahito Hayashi and Shun Watanabe

Abstract

We consider the parameter estimation of a Markov chain when the unknown transition matrix belongs to an exponential family of transition matrices. We then show that the sample mean of the generator of the exponential family is an asymptotically efficient estimator. Further, we define a curved exponential family of transition matrices. Using a transition-matrix version of the Pythagorean theorem, we give an asymptotically efficient estimator for a curved exponential family.
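As an illustration of the abstract's first result, the following is a minimal Python sketch, not the authors' construction verbatim. It assumes a finite state space, a base transition matrix W with all entries positive (stored row-wise, W[x][y] = W(y|x)), and a single real-valued generator g(x, y). It tilts W by exp(θ g), takes the Perron–Frobenius eigenvalue λ(θ) and right eigenvector v of the tilted matrix, renormalizes to the transition matrix W_θ(y|x) = W(y|x) e^{θ g(x,y)} v(y) / (λ(θ) v(x)), and uses φ(θ) = log λ(θ) as the potential, so that the expectation parameter is η(θ) = φ'(θ). The sample mean of g along an observed path then estimates η. All function names, the power-iteration and finite-difference steps, the example matrices, and the bisection bracket [-3, 3] are hypothetical choices made for this sketch.

```python
import math
import random


def tilted_family(W, g, theta, iters=500):
    """Return (W_theta, phi(theta)) for the one-parameter family generated by g.

    Assumed convention (not from the paper): W[x][y] = W(y|x), rows sum to 1,
    all entries of W are positive, and g[x][y] is the generator g(x, y).
    """
    k = len(W)
    # Tilted non-negative matrix M(x, y) = W(y|x) * exp(theta * g(x, y)).
    M = [[W[x][y] * math.exp(theta * g[x][y]) for y in range(k)] for x in range(k)]
    # Power iteration for the Perron-Frobenius eigenvalue lam and right eigenvector v.
    v = [1.0] * k
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[x][y] * v[y] for y in range(k)) for x in range(k)]
        lam = max(w)
        v = [wi / lam for wi in w]
    # Renormalize: W_theta(y|x) = M(x, y) v(y) / (lam v(x)); each row sums to 1.
    W_theta = [[M[x][y] * v[y] / (lam * v[x]) for y in range(k)] for x in range(k)]
    # Potential phi(theta) = log lam(theta).
    return W_theta, math.log(lam)


def expectation_parameter(W, g, theta, h=1e-4):
    """eta(theta) = phi'(theta) via a central finite difference (hypothetical helper)."""
    return (tilted_family(W, g, theta + h)[1] - tilted_family(W, g, theta - h)[1]) / (2 * h)


def sample_path(W, n, x0=0, rng=None):
    """Simulate n transitions of the chain with transition matrix W from state x0."""
    rng = rng or random.Random(0)
    states = range(len(W))
    path = [x0]
    for _ in range(n):
        path.append(rng.choices(states, weights=W[path[-1]])[0])
    return path


def sample_mean_of_generator(g, path):
    """(1/n) * sum_i g(X_i, X_{i+1}) over the n observed transitions."""
    pairs = list(zip(path, path[1:]))
    return sum(g[a][b] for a, b in pairs) / len(pairs)


def theta_from_eta(W, g, eta_hat, lo=-3.0, hi=3.0, steps=60):
    """Solve eta(theta) = eta_hat by bisection; assumes eta_hat lies in eta([lo, hi])."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if expectation_parameter(W, g, mid) < eta_hat:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    W = [[0.7, 0.3], [0.4, 0.6]]       # illustrative base chain
    g = [[0.0, 1.0], [1.0, 0.0]]       # illustrative generator: counts state switches
    W_true, _ = tilted_family(W, g, 0.5)
    path = sample_path(W_true, 100_000, rng=random.Random(1))
    eta_hat = sample_mean_of_generator(g, path)
    print("eta_hat =", eta_hat, "theta_hat =", theta_from_eta(W, g, eta_hat))
```

Mapping η̂ back to θ by bisection is an extra step added here for illustration. It relies on φ being strictly convex when g is non-degenerate, so that η(θ) is increasing in θ; up to boundary terms involving v(X_0) and v(X_n), the resulting θ̂ solves the likelihood equation φ'(θ) = η̂ for the observed path.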

Article information

Source
Ann. Statist., Volume 44, Number 4 (2016), 1495-1535.

Dates
Received: February 2015
Revised: November 2015
First available in Project Euclid: 7 July 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1467894706

Digital Object Identifier
doi:10.1214/15-AOS1420

Mathematical Reviews number (MathSciNet)
MR3519931

Zentralblatt MATH identifier
1347.62182

Subjects
Primary: 62M05: Markov processes: estimation

Keywords
Exponential family; natural parameter; expectation parameter; relative entropy; Fisher information matrix; asymptotic efficient estimator

Citation

Hayashi, Masahito; Watanabe, Shun. Information geometry approach to parameter estimation in Markov chains. Ann. Statist. 44 (2016), no. 4, 1495–1535. doi:10.1214/15-AOS1420. https://projecteuclid.org/euclid.aos/1467894706