We study estimation in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of high order, but with memory of variable length, yielding a much larger and structurally richer class of models than ordinary high-order Markov chains. From an algorithmic viewpoint, the VLMC model class has attracted interest in information theory and machine learning, but its statistical properties have not yet been explored. Provided that good estimation is available, the additional structural richness of the model class enhances predictive power by striking a better tradeoff between model bias and variance, and it allows a finer structural description, which can be of interest in its own right. The latter is exemplified with some DNA data.
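To picture what "memory of variable length" means, here is a minimal sketch of how a VLMC looks up its state: transition probabilities are attached only to "contexts" (the relevant suffixes of the past), so the effective Markov order depends on where the chain currently is. The function name and the toy context tree below are illustrative assumptions, not the paper's notation.

```python
def longest_context(past, contexts):
    """Return the longest suffix of `past` that belongs to the context set."""
    for k in range(len(past), 0, -1):
        suffix = past[-k:]
        if suffix in contexts:
            return suffix
    return ""  # empty context: no past symbol is needed


# Toy binary VLMC: after a "1" one symbol of memory suffices, while after a
# "0" the previous symbol also matters, giving contexts of length 2.
contexts = {
    "1": {"0": 0.3, "1": 0.7},
    "00": {"0": 0.9, "1": 0.1},
    "10": {"0": 0.4, "1": 0.6},
}

print(longest_context("0110", contexts))  # -> 10
print(longest_context("0011", contexts))  # -> 1
```

A full second-order Markov chain would need parameters for all four length-2 histories; this VLMC gets by with three contexts, which is the bias–variance gain the abstract refers to.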
A version of the tree-structured context algorithm, proposed by Rissanen in an information-theoretic setup, is shown to have new good asymptotic properties for estimation in the class of VLMCs. These remain valid even when the dimensionality of the underlying model grows. Furthermore, consistent estimation of minimal state spaces and mixing properties of the fitted models are established.
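Roughly, the context algorithm grows a large candidate suffix tree from the data and prunes a node whenever lengthening the context does not change the predictive distribution enough. A common form of the pruning statistic is a count-weighted Kullback–Leibler divergence between a candidate context's empirical next-symbol distribution and its parent's; the sketch below uses that form, with hypothetical names and a hypothetical cutoff (in the theory the cutoff grows with the sample size).

```python
import math
from collections import Counter


def prune_statistic(child_counts, parent_probs):
    """Count-weighted KL divergence of the child's empirical next-symbol
    distribution from the parent's: the evidence for keeping the longer
    context (a simplified, hypothetical form of the pruning statistic)."""
    n = sum(child_counts.values())
    stat = 0.0
    for sym, c in child_counts.items():
        if c > 0:
            stat += c * math.log((c / n) / parent_probs[sym])
    return stat


# A child matching its parent's distribution adds no evidence ...
flat = prune_statistic(Counter({"0": 50, "1": 50}), {"0": 0.5, "1": 0.5})
# ... while a strongly skewed child carries real extra information.
skew = prune_statistic(Counter({"0": 90, "1": 10}), {"0": 0.5, "1": 0.5})

cutoff = 3.0  # hypothetical; in practice chosen to grow like C * log(n)
print(flat < cutoff, skew > cutoff)  # -> True True: prune flat, keep skew
```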
We also propose a new bootstrap scheme based on fitted VLMCs. We show its validity for quite general stationary categorical time series and for a broad range of statistical procedures.
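The bootstrap scheme can be pictured as simulating fresh series from the fitted VLMC and recomputing the statistic of interest on each simulated path. The sketch below only shows the resampling step for a toy fitted context tree; the API, names, and tree are illustrative assumptions, not the paper's construction.

```python
import random


def vlmc_sample(contexts, length, init="1", seed=0):
    """Simulate a path from a fitted VLMC given as a map from context
    strings to next-symbol distributions (illustrative, hypothetical API)."""
    rng = random.Random(seed)
    path = init
    while len(path) < length:
        # Variable-length memory: the longest matching suffix is the state.
        ctx = ""
        for k in range(len(path), 0, -1):
            if path[-k:] in contexts:
                ctx = path[-k:]
                break
        symbols, probs = zip(*contexts[ctx].items())
        path += rng.choices(symbols, weights=probs)[0]
    return path


# Toy fitted tree; the empty context "" serves as the root fallback.
fitted = {
    "": {"0": 0.5, "1": 0.5},
    "1": {"0": 0.3, "1": 0.7},
    "00": {"0": 0.9, "1": 0.1},
    "10": {"0": 0.4, "1": 0.6},
}
boot = vlmc_sample(fitted, length=200)
print(len(boot), set(boot) <= {"0", "1"})  # -> 200 True
```

Repeating this with independent seeds gives the bootstrap replicates on which a statistical procedure is re-evaluated.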
"Variable length Markov chains." Ann. Statist. 27 (2): 480–513, April 1999. https://doi.org/10.1214/aos/1018031204