Bayesian Analysis

Recursive Learning for Sparse Markov Models

Jie Xiong, Väinö Jääskinen, and Jukka Corander

Full-text: Open access


Markov chains of higher order are popular models for a wide variety of applications in natural language and DNA sequence processing. However, since the number of parameters grows exponentially with the order of a Markov chain, several alternative model classes have been proposed that allow for stability and higher rate of data compression. The common notion to these models is that they cluster the possible sample paths used to predict the next state into invariance classes with identical conditional distributions assigned to the same class. The models vary in particular with respect to constraints imposed on legitime partitions of the sample paths. Here we consider the class of sparse Markov chains for which the partition is left unconstrained a priori. A recursive computation scheme based on Delaunay triangulation of the parameter space is introduced to enable fast approximation of the posterior mode partition. Comparisons with stochastic optimization, k-means and nearest neighbor algorithms show that our approach is both considerably faster and leads on average to a more accurate estimate of the underlying partition. We show additionally that the criterion used in the recursive steps for comparison of triangulation cell contents leads to consistent estimation of the local structure in the sparse Markov model.

Article information

Bayesian Anal., Volume 11, Number 1 (2016), 247-263.

First available in Project Euclid: 15 April 2015

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

clustering Delaunay triangulation recursive learning sequence analysis sparse Markov chains


Xiong, Jie; Jääskinen, Väinö; Corander, Jukka. Recursive Learning for Sparse Markov Models. Bayesian Anal. 11 (2016), no. 1, 247--263. doi:10.1214/15-BA949.

Export citation


  • Begleiter, R., El-Yaniv, R., and Yona, G. (2004). “On prediction using variable order Markov models.” Journal of Artificial Intelligence Research, 22: 385–421.
  • Bishop, C. M. (2007). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1st edition.
  • Bühlmann, P., Wyner, A. J., et al. (1999). “Variable length Markov chains.” The Annals of Statistics, 27(2): 480–513.
  • Corander, J., Connor, T. R., O’Dwyer, C. A., Kroll, J. S., and Hanage, W. P. (2012). “Population structure in the Neisseria, and the biological significance of fuzzy species.” Journal of The Royal Society Interface, 9(71): 1208–1215.
  • Dahl, D. B., et al. (2009). “Modal clustering in a class of product partition models.” Bayesian Analysis, 4(2): 243–264.
  • De Berg, M., Van Kreveld, M., Overmars, M., and Schwarzkopf, O. C. (2000). Computational geometry. Springer.
  • Deng, M., Liu, Q., Cheng, T., and Shi, Y. (2011). “An adaptive spatial clustering algorithm based on Delaunay triangulation.” Computers, Environment and Urban Systems, 35(4): 320–332.
  • Farcomeni, A. (2011). “Hidden Markov partition models.” Statistics & Probability Letters, 81(12): 1766–1770.
  • Gonzalez-Lopez, J. E., et al. (2010). “Minimal Markov Models.” arXiv:1002.0729.
  • Hubert, L. and Arabie, P. (1985). “Comparing partitions.” Journal of Classification, 2(1): 193–218.
  • Jääskinen, V., Xiong, J., Corander, J., and Koski, T. (2013). “Sparse Markov Chains for Sequence Data.” Scandinavian Journal of Statistics.
  • Koski, T. (2001). Hidden Markov models for bioinformatics, volume 2. Springer.
  • Lee, D.-T. and Schachter, B. J. (1980). “Two algorithms for constructing a Delaunay triangulation.” International Journal of Computer & Information Sciences, 9(3): 219–242.
  • Liu, D., Nosovskiy, G. V., and Sourina, O. (2008). “Effective clustering and boundary detection algorithm based on Delaunay triangulation.” Pattern Recognition Letters, 29(9): 1261–1273.
  • Marttinen, P., Myllykangas, S., and Corander, J. (2009). “Bayesian clustering and feature selection for cancer tissue samples.” BMC Bioinformatics, 10(1): 90.
  • Rissanen, J., et al. (1983). “A universal data compression system.” IEEE Transactions on Information Theory, 29(5): 656–664.
  • Xiong, J., Jääskinen, V., and Corander, J. (2015). “Appendix of article by Xiong, Jääskinen, and Corander.” Bayesian Analysis.
  • Yang, X. and Cui, W. (2008). “A novel spatial clustering algorithm based on Delaunay triangulation.” In: International Conference on Earth Observation Data Processing and Analysis, 728530. International Society for Optics and Photonics.

Supplemental materials