## The Annals of Statistics

### Bayesian nonparametric analysis of reversible Markov chains

#### Abstract

We introduce a three-parameter random walk with reinforcement, called the $(\theta,\alpha,\beta)$ scheme, which generalizes the linearly edge reinforced random walk to uncountable spaces. The parameter $\beta$ smoothly tunes the $(\theta,\alpha,\beta)$ scheme between this edge reinforced random walk and the classical exchangeable two-parameter Hoppe urn scheme, while the parameters $\alpha$ and $\theta$ modulate how many states are typically visited. Resorting to de Finetti’s theorem for Markov chains, we use the $(\theta,\alpha,\beta)$ scheme to define a nonparametric prior for Bayesian analysis of reversible Markov chains. The prior is applied in Bayesian nonparametric inference for species sampling problems with data generated from a reversible Markov chain with an unknown transition kernel. As a real example, we analyze data from molecular dynamics simulations of protein folding.
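The abstract names two limiting processes of the $(\theta,\alpha,\beta)$ scheme. The sketch below simulates each in its standard textbook form; it is not the $(\theta,\alpha,\beta)$ scheme itself, whose update rule is not stated on this page. It assumes $\theta > 0$, $0 \le \alpha < 1$, and a finite undirected graph without self-loops. The function names and the `initial_weight` parameter are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def hoppe_urn(n, theta, alpha, rng):
    """Simulate n draws from a two-parameter Hoppe urn.

    After i draws with k distinct species, a new species appears with
    probability (theta + alpha * k) / (theta + i); species j, seen n_j
    times, is redrawn with probability (n_j - alpha) / (theta + i).
    Assumes theta > 0 and 0 <= alpha < 1.
    """
    counts = []  # counts[j] = number of draws of species j so far
    labels = []
    for i in range(n):
        k = len(counts)
        if rng.random() * (theta + i) < theta + alpha * k:
            counts.append(1)  # discover a new species, labelled k
            labels.append(k)
        else:
            weights = [c - alpha for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
            labels.append(j)
    return labels


def edge_reinforced_walk(edges, start, steps, rng, initial_weight=1.0):
    """Simulate a linearly edge-reinforced random walk.

    The walker crosses an incident edge with probability proportional to
    its current weight, and each crossing adds 1 to that edge's weight.
    Assumes an undirected graph with no self-loops.
    """
    weight = defaultdict(dict)
    for u, v in edges:
        weight[u][v] = initial_weight
        weight[v][u] = initial_weight
    path = [start]
    x = start
    for _ in range(steps):
        neighbours = list(weight[x])
        y = rng.choices(neighbours, weights=[weight[x][z] for z in neighbours])[0]
        weight[x][y] += 1.0  # reinforce the undirected edge {x, y},
        weight[y][x] += 1.0  # stored symmetrically in both directions
        path.append(y)
        x = y
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    labels = hoppe_urn(1000, theta=1.0, alpha=0.5, rng=rng)
    print("distinct species after 1000 draws:", len(set(labels)))
    triangle = [(0, 1), (1, 2), (2, 0)]
    print("first steps of the walk:", edge_reinforced_walk(triangle, 0, 10, rng))
```

In the urn, larger $\theta$ or $\alpha$ makes new species appear more often, which is the role the abstract assigns to those two parameters. The walk's symmetric edge weights are what connect edge reinforcement to reversible chains.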

#### Article information

Source
Ann. Statist., Volume 41, Number 2 (2013), 870–896.

Dates
First available: 29 May 2013

Permanent link to this document
http://projecteuclid.org/euclid.aos/1369836963

Digital Object Identifier
doi:10.1214/13-AOS1102

Zentralblatt MATH identifier
06190469

Mathematical Reviews number (MathSciNet)
MR3099124

#### Citation

Bacallado, Sergio; Favaro, Stefano; Trippa, Lorenzo. Bayesian nonparametric analysis of reversible Markov chains. The Annals of Statistics 41 (2013), no. 2, 870–896. doi:10.1214/13-AOS1102. http://projecteuclid.org/euclid.aos/1369836963.

#### Supplemental materials

• Supplementary material: Appendices B, C and D. Appendix B describes the two-parameter HDP-HMM in relation to the $(\theta,\alpha,\beta)$ scheme. Appendix C contains all proofs from Sections 4, 5 and 6. Appendix D contains a derivation of the exact sampler mentioned in Section 7 using Coupling From the Past.