The Annals of Statistics

Variable length Markov chains

Peter Bühlmann and Abraham J. Wyner



We study estimation in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of high order, but with memory of variable length, yielding a much bigger and structurally richer class of models than ordinary high-order Markov chains. From an algorithmic point of view, the VLMC model class has attracted interest in information theory and machine learning, but its statistical properties have not yet been explored. Provided that good estimation is available, the additional structural richness of the model class enhances predictive power by finding a better tradeoff between model bias and variance and by allowing a better structural description, which can be of specific interest. The latter is exemplified with some DNA data.
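To make the variable-length memory concrete, here is a small illustrative sketch (not the paper's estimator; the context tree below is hypothetical): a VLMC stores next-symbol distributions indexed by contexts of varying length, and prediction uses the longest context that matches the tail of the observed sequence.

```python
def longest_context(tree, history):
    """Return the longest suffix of `history` that is a context in `tree`.

    `tree` maps context strings (most recent symbol last) to next-symbol
    distributions; '' is the root/fallback context.
    """
    for start in range(len(history) + 1):
        suffix = history[start:]
        if suffix in tree:
            return suffix
    return ''

# Hypothetical context tree on the alphabet {'0', '1'}: after a '1' the
# relevant memory is one symbol, but after a '0' it extends two symbols back.
tree = {
    '':   {'0': 0.5, '1': 0.5},
    '1':  {'0': 0.3, '1': 0.7},
    '00': {'0': 0.9, '1': 0.1},
    '10': {'0': 0.2, '1': 0.8},
}

print(longest_context(tree, '0110'))  # '10': memory of length 2 is used
print(longest_context(tree, '0011'))  # '1' : memory of length 1 suffices
```

Unlike a full order-2 Markov chain, the tree needs no separate parameters for the contexts '01' and '11'; both fall back to the shorter context '1'.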

A version of the tree-structured context algorithm, proposed by Rissanen in an information-theoretic setup, is shown to have good asymptotic properties for estimation in the class of VLMCs; this remains true even when the dimensionality of the underlying model grows. Furthermore, consistent estimation of minimal state spaces and mixing properties of fitted models are established.
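As a rough illustration of the pruning idea behind the context algorithm (a simplified sketch, not the tuned version analyzed in the paper; the function names and cutoff are made up): a candidate context is kept only when its empirical next-symbol distribution differs enough, in a count-weighted Kullback-Leibler sense, from that of its parent context obtained by dropping the oldest symbol.

```python
from collections import Counter
from math import log

def empirical_dist(seq, context):
    """Empirical counts of the symbol following `context` in `seq`."""
    k = len(context)
    return Counter(seq[i + k] for i in range(len(seq) - k)
                   if seq[i:i + k] == context)

def keep_context(seq, context, alphabet, cutoff):
    """Pruning-rule sketch: keep `context` iff the count-weighted KL
    divergence from its parent's distribution exceeds `cutoff`."""
    child = empirical_dist(seq, context)
    parent = empirical_dist(seq, context[1:])   # drop the oldest symbol
    n_child, n_parent = sum(child.values()), sum(parent.values())
    stat = 0.0
    for a in alphabet:
        if child[a] > 0:  # child counts are a subset of parent counts
            p = child[a] / n_child
            q = parent[a] / n_parent
            stat += child[a] * log(p / q)
    return stat > cutoff

seq = '01' * 50  # strictly alternating toy series
print(keep_context(seq, '0', '01', cutoff=3.0))   # True: '0' predicts '1'
print(keep_context(seq, '10', '01', cutoff=3.0))  # False: no gain over '0'
```

The cutoff plays the role of the algorithm's tuning parameter: larger values prune more aggressively and yield shallower context trees.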

We also propose a new bootstrap scheme based on fitted VLMCs. We show its validity for quite general stationary categorical time series and for a broad range of statistical procedures.
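In spirit, such a VLMC bootstrap is a model-based resampler (an illustrative sketch with a made-up "fitted" tree, not the paper's fitting procedure): new categorical series are simulated from the fitted variable-length transition probabilities.

```python
import random

def vlmc_bootstrap(tree, length, seed=0):
    """Generate one bootstrap series from a fitted VLMC.

    `tree` maps context strings to next-symbol distributions; each step
    uses the longest context matching the tail of the generated series.
    """
    rng = random.Random(seed)
    max_depth = max(len(c) for c in tree)
    series = ''
    for _ in range(length):
        history = series[-max_depth:] if max_depth else ''
        # longest-suffix lookup; '' (the root) always matches
        context = next(history[s:] for s in range(len(history) + 1)
                       if history[s:] in tree)
        symbols, probs = zip(*tree[context].items())
        series += rng.choices(symbols, weights=probs)[0]
    return series

# Hypothetical fitted tree over the alphabet {'0', '1'}
fitted = {'': {'0': 0.5, '1': 0.5}, '1': {'0': 0.9, '1': 0.1}}
sample = vlmc_bootstrap(fitted, 200)
```

Applying a statistic to many such simulated series gives its bootstrap distribution under the fitted VLMC.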

Article information

Ann. Statist. Volume 27, Number 2 (1999), 480-513.

First available in Project Euclid: 5 April 2002


Primary: 62M05: Markov processes: estimation
Secondary: 60J10: Markov chains (discrete-time Markov processes on discrete state spaces); 62G09: Resampling methods; 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]; 94A15: Information theory, general [See also 62B10, 81P94]

Keywords: bootstrap; categorical time series; central limit theorem; context algorithm; data compression; finite-memory sources; FSMX model; Kullback-Leibler distance; model selection; tree model.


Bühlmann, Peter; Wyner, Abraham J. Variable length Markov chains. Ann. Statist. 27 (1999), no. 2, 480--513. doi:10.1214/aos/1018031204.



  • BICKEL, P. J., GÖTZE, F. and VAN ZWET, W. R. (1997). Resampling fewer than n observations: gains, losses, and remedies for losses. Statist. Sinica 7 1–32.
  • BRAUN, J. V. and MÜLLER, H.-G. (1998). Statistical methods for DNA sequence segmentation. Statist. Sci. 13 142–162.
  • BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
  • BRILLINGER, D. R. (1995). Trend analysis: binary-valued and point cases. Stochastic Hydrology and Hydraulics 9 207–213.
  • BÜHLMANN, P. (1999). Model selection for variable length Markov chains and tuning the context algorithm. Ann. Inst. Statist. Math. To appear.
  • COVER, T. M. and THOMAS, J. A. (1991). Elements of Information Theory. Wiley, New York.
  • DOUKHAN, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statist. 85. Springer, Berlin.
  • EFRON, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7 1–26.
  • FAHRMEIR, L. and TUTZ, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, Berlin.
  • FEDER, M., MERHAV, N. and GUTMAN, M. (1992). Universal prediction of individual sequences. IEEE Trans. Inform. Theory IT-38 1258–1270.
  • GUTTORP, P. (1995). Stochastic Modeling of Scientific Data. Chapman and Hall, London.
  • IOSIFESCU, M. and THEODORESCU, R. (1969). Random Processes and Learning. Springer, Berlin.
  • KÜNSCH, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17 1217–1241.
  • PRUM, B., RODOLPHE, F. and DE TURCKHEIM, E. (1995). Finding words with unexpected frequencies in deoxyribonucleic acid sequences. J. Roy. Statist. Soc. Ser. B 57 205–220.
  • RAFTERY, A. and TAVARÉ, S. (1994). Estimation and modelling repeated patterns in high-order Markov chains with the mixture transition distribution model. Appl. Statist. 43 179–199.
  • RAJARSHI, M. B. (1990). Bootstrap in Markov-sequences based on estimates of transition density. Ann. Inst. Statist. Math. 42 253–268.
  • RISSANEN, J. (1983). A universal data compression system. IEEE Trans. Inform. Theory IT-29 656–664.
  • RISSANEN, J. (1986). Complexity of strings in the class of Markov sources. IEEE Trans. Inform. Theory IT-32 526–532.
  • RISSANEN, J. (1989). Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore.
  • RITOV, Y. and BICKEL, P. J. (1990). Achieving information bounds in non- and semiparametric models. Ann. Statist. 18 925–938.
  • WEINBERGER, M. J. and FEDER, M. (1994). Predictive stochastic complexity and model estimation for finite-state processes. J. Statist. Plann. Inference 39 353–372.
  • WEINBERGER, M. J., LEMPEL, A. and ZIV, J. (1992). A sequential algorithm for the universal coding of finite memory sources. IEEE Trans. Inform. Theory IT-38 1002–1014.
  • WEINBERGER, M. J., RISSANEN, J. and FEDER, M. (1995). A universal finite memory source. IEEE Trans. Inform. Theory IT-41 643–652.
  • WITHERS, C. S. (1981). Central limit theorems for dependent variables I. Z. Wahrsch. Verw.