
Model selection for weakly dependent time series forecasting

Pierre Alquier and Olivier Wintenberger



Observing a stationary time series, we propose a two-step procedure for predicting its next value. The first step follows the machine learning paradigm and consists in determining a set of possible predictors as randomized estimators in (possibly numerous) different predictive models. The second step follows the model selection paradigm and consists in choosing one predictor with good properties among all the predictors of the first step. We study our procedure for two different types of observations: causal Bernoulli shifts and bounded weakly dependent processes. In both cases, we give oracle inequalities: the risk of the chosen predictor is close to the best prediction risk in all predictive models that we consider. We apply our procedure to predictive models such as linear predictors, neural network predictors and nonparametric autoregressive predictors.
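The two-step structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the candidate grid, the inverse temperature `sqrt(n)`, the penalty `p * log(n) / sqrt(n)`, and all function names are hypothetical choices made only to show the shape of the idea (a randomized Gibbs-type draw inside each model, followed by a penalized choice among models). Linear autoregressive predictors of orders 1 to 3 stand in for the predictive models.

```python
import math
import random


def simulate_ar1(n, phi=0.6, seed=0):
    """Toy bounded AR(1) series with uniform innovations (illustrative data only)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.uniform(-1.0, 1.0))
    return x


def empirical_risk(x, theta):
    """Mean absolute one-step prediction error of a linear predictor of order len(theta)."""
    p = len(theta)
    errors = [
        abs(x[t] - sum(theta[j] * x[t - 1 - j] for j in range(p)))
        for t in range(p, len(x))
    ]
    return sum(errors) / len(errors)


def gibbs_draw(x, p, inv_temperature, n_candidates, rng):
    """Step 1 (sketch): draw one predictor with weights exp(-inv_temperature * risk)."""
    candidates = [
        [rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(n_candidates)
    ]
    risks = [empirical_risk(x, theta) for theta in candidates]
    lowest = min(risks)
    # Subtracting the lowest risk keeps the exponentials numerically stable.
    weights = [math.exp(-inv_temperature * (r - lowest)) for r in risks]
    idx = rng.choices(range(n_candidates), weights=weights, k=1)[0]
    return candidates[idx], risks[idx]


def select_predictor(x, max_order=3, n_candidates=200, seed=1):
    """Step 2 (sketch): keep the randomized predictor with the smallest penalized risk."""
    rng = random.Random(seed)
    n = len(x)
    inv_temperature = math.sqrt(n)  # assumption, not taken from the paper
    best = None
    for p in range(1, max_order + 1):
        theta, risk = gibbs_draw(x, p, inv_temperature, n_candidates, rng)
        criterion = risk + p * math.log(n) / math.sqrt(n)  # hypothetical penalty
        if best is None or criterion < best[0]:
            best = (criterion, p, theta, risk)
    return best


if __name__ == "__main__":
    series = simulate_ar1(300)
    criterion, order, theta, risk = select_predictor(series)
    print("selected order:", order)
    print("coefficients:", theta)
    print("empirical risk:", risk)
```

The penalty grows with the model order, so a larger model is kept only when its randomized predictor lowers the empirical risk by more than the added complexity cost; the oracle inequalities mentioned in the abstract concern guarantees of this kind for the paper's own, differently calibrated, procedure.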

Article information

Bernoulli, Volume 18, Number 3 (2012), 883-913.

First available in Project Euclid: 28 June 2012

Keywords: adaptative inference; aggregation of estimators; autoregression; estimation; model selection; randomized estimators; statistical learning; time series prediction; weak dependence


Alquier, Pierre; Wintenberger, Olivier. Model selection for weakly dependent time series forecasting. Bernoulli 18 (2012), no. 3, 883–913. doi:10.3150/11-BEJ359.


