## The Annals of Statistics

### Approximate group context tree

#### Abstract

We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but may each have different conditional probabilities. We propose a new model selection and estimation method that is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity of the transition probabilities and polynomial $\beta$-mixing. In particular, model misspecification is allowed.
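To make the model concrete, the following is a minimal sketch of a group of processes sharing one context tree while each keeps its own conditional probabilities. It is an illustration only, not the estimator proposed in the paper: the binary alphabet, the particular tree, the probability values, and the names `CONTEXTS`, `GROUP_PROBS`, `find_context`, and `prob_next_is_one` are all hypothetical.

```python
from typing import Dict, Sequence, Tuple

# A context is a suffix of the past, written oldest symbol first.
Context = Tuple[int, ...]

# Hypothetical shared context tree over the alphabet {0, 1}.
# If the last symbol is 1, nothing further back matters; if it is 0,
# the symbol before it is also needed.
CONTEXTS = {(1,), (1, 0), (0, 0)}

# Hypothetical group: same tree, different P(next symbol = 1 | context).
GROUP_PROBS: Dict[str, Dict[Context, float]] = {
    "process_a": {(1,): 0.2, (1, 0): 0.7, (0, 0): 0.5},
    "process_b": {(1,): 0.6, (1, 0): 0.1, (0, 0): 0.9},
}


def find_context(past: Sequence[int]) -> Context:
    """Return the suffix of `past` that is a leaf of the shared tree."""
    for k in range(1, len(past) + 1):
        suffix = tuple(past[-k:])
        if suffix in CONTEXTS:
            return suffix
    raise ValueError("past is too short to determine a context")


def prob_next_is_one(group: str, past: Sequence[int]) -> float:
    """Look up this group's conditional probability at the shared context."""
    return GROUP_PROBS[group][find_context(past)]
```

Under these assumptions, `find_context([0, 1, 0])` selects the context `(1, 0)` for every process in the group, while `prob_next_is_one` returns a different value for `process_a` and `process_b`.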

These results are applied to interesting families of processes. For Markov processes, we obtain uniform rates of convergence for the estimation error of transition probabilities as well as perfect model selection results. For chains of infinite order with complete connections, we obtain explicit uniform rates of convergence for the estimation of conditional probabilities, which depend explicitly on the processes’ continuity rates. Similar guarantees are also derived for renewal processes.

Our results are shown to be applicable to discrete stochastic dynamic programming problems and to dynamic discrete choice models. We also apply our estimator to a linguistic study, based on recent work by Galves et al. [Ann. Appl. Stat. 6 (2012) 186–209], of the rhythmic differences between Brazilian and European Portuguese.

#### Article information

Source
Ann. Statist., Volume 45, Number 1 (2017), 355–385.

Dates
Revised: December 2015
First available in Project Euclid: 21 February 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1487667626

Digital Object Identifier
doi:10.1214/16-AOS1455

Mathematical Reviews number (MathSciNet)
MR3611495

Zentralblatt MATH identifier
06710514

#### Citation

Belloni, Alexandre; Oliveira, Roberto I. Approximate group context tree. Ann. Statist. 45 (2017), no. 1, 355–385. doi:10.1214/16-AOS1455. https://projecteuclid.org/euclid.aos/1487667626

#### References

• [1] Aguirregabiria, V. and Mira, P. (2010). Dynamic discrete choice structural models: A survey. J. Econometrics 156 38–67.
• [2] Arellano, M. and Honoré, B. H. (2001). Panel data models: Some recent developments. Handb. Econom. 5 3229–3296.
• [3] Bejerano, G. (2004). Algorithms for variable length Markov chain modeling. Bioinformatics 20 788–789.
• [4] Belloni, A. and Oliveira, R. I. (2016). Supplement to “Approximate group context tree.” DOI:10.1214/16-AOS1455SUPP.
• [5] Bertsekas, D. P. (1987). Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, Englewood Cliffs, NJ.
• [6] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [7] Browning, M. and Carro, J. M. (2010). Heterogeneity in dynamic discrete choice models. Econom. J. 13 1–39.
• [8] Browning, M. and Carro, J. M. (2014). Dynamic binary outcome models with maximal heterogeneity. J. Econometrics 178 805–823.
• [9] Bühlmann, P. (1999). Efficient and adaptive post-model-selection estimators. J. Statist. Plann. Inference 79 1–9.
• [10] Bühlmann, P. (2000). Model selection for variable length Markov chains and tuning the context algorithm. Ann. Inst. Statist. Math. 52 287–315.
• [11] Bühlmann, P. and Wyner, A. J. (1999). Variable length Markov chains. Ann. Statist. 27 480–513.
• [12] Chernozhukov, V., Fernandez-Val, I., Hahn, J. and Newey, W. (2009). Identification and estimation of marginal effects in nonlinear panel models. Available at arXiv:0904.1990.
• [13] Csiszár, I. and Shields, P. C. (1996). Redundancy rates for renewal and other processes. IEEE Trans. Inform. Theory 42 2065–2072.
• [14] Csiszár, I. and Talata, Z. (2006). Context tree estimation for not necessarily finite memory processes, via BIC and MDL. IEEE Trans. Inform. Theory 52 1007–1016.
• [15] Farias, V. F., Moallemi, C. C., Van Roy, B. and Weissman, T. (2010). Universal reinforcement learning. IEEE Trans. Inform. Theory 56 2441–2454.
• [16] Ferrari, F. and Wyner, A. (2003). Estimation of general stationary processes by variable length Markov chains. Scand. J. Statist. 30 459–480.
• [17] Galves, A., Galves, C., García, J. E., Garcia, N. L. and Leonardi, F. (2012). Context tree selection and linguistic rhythm retrieval from written texts. Ann. Appl. Stat. 6 186–209.
• [18] Garivier, A. (2006). Redundancy of the context-tree weighting method on renewal and Markov renewal processes. IEEE Trans. Inform. Theory 52 5579–5586.
• [19] Garivier, A. and Leonardi, F. (2011). Context tree selection: A unifying view. Stochastic Process. Appl. 121 2488–2506.
• [20] Lepskiĭ, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatnost. i Primenen. 35 459–470.
• [21] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2010). Taking advantage of sparsity in multi-task learning. In Proc. Computational Learning Theory Conference (COLT 2009).
• [22] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2011). Support union recovery in high-dimensional multivariate regression. Ann. Statist. 39 1–47.
• [23] Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York.
• [24] Rissanen, J. (1983). A universal data compression system. IEEE Trans. Inform. Theory 29 656–664.
• [25] Ross, S. (1983). Introduction to Stochastic Dynamic Programming. Academic Press, New York.
• [26] Talata, Z. and Duncan, T. (2009). Unrestricted BIC context tree estimation for not necessarily finite memory processes. In 2009 IEEE International Symposium on Information Theory 724–728.
• [27] Vert, J.-P. (2001). Adaptive context trees and text clustering. IEEE Trans. Inform. Theory 47 1884–1901.
• [28] Willems, F. M. J., Shtarkov, Y. M. and Tjalkens, T. J. (1995). The context-tree weighting method: Basic properties. IEEE Trans. Inform. Theory 41 653–664.
• [29] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.

#### Supplemental materials

• Supplement to “Approximate group context tree”. We provide additional discussion of the oracle context tree, proofs omitted from Section 5, a compendium of martingale results, minimax rates for chains with infinite connections, and simulation results.