Open Access
Model selection for weakly dependent time series forecasting
Pierre Alquier, Olivier Wintenberger
Bernoulli 18(3): 883-913 (August 2012). DOI: 10.3150/11-BEJ359
Abstract

Observing a stationary time series, we propose a two-step procedure for the prediction of its next value. The first step follows the machine learning paradigm and consists in determining a set of possible predictors, given as randomized estimators in (possibly numerous) different predictive models. The second step follows the model selection paradigm and consists in choosing one predictor with good properties among all the predictors of the first step. We study our procedure for two different types of observations: causal Bernoulli shifts and bounded weakly dependent processes. In both cases, we give oracle inequalities: the risk of the chosen predictor is close to the best prediction risk over all the predictive models that we consider. We apply our procedure to predictive models such as linear predictors, neural network predictors and nonparametric autoregressive predictors.
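The two-step procedure lends itself to a compact sketch. The Python code below is a minimal illustration under stated assumptions, not the paper's exact construction: it takes linear autoregressive predictors of increasing order as the predictive models, draws the step-one randomized estimator from a Gibbs (exponential-weight) measure over a uniform parameter grid, and selects in step two by penalized empirical risk. The function names (gibbs_predictor, select_predictor), the inverse temperature lam, and the log(n)-type penalty are illustrative choices introduced here, not quantities from the paper.

```python
import numpy as np

def empirical_risk(theta, X, p):
    """Mean squared one-step prediction error of the linear AR(p) predictor theta."""
    preds = np.array([theta @ X[t - p:t][::-1] for t in range(p, len(X))])
    return np.mean((X[p:] - preds) ** 2)

def gibbs_predictor(X, p, lam=1.0, grid_size=2000, radius=1.0, rng=None):
    """Step 1 (sketch): randomized estimator in the AR(p) model, drawn from
    the Gibbs measure proportional to exp(-lam * risk) over a uniform prior."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = rng.uniform(-radius, radius, size=(grid_size, p))  # draws from the prior
    risks = np.array([empirical_risk(th, X, p) for th in thetas])
    w = np.exp(-lam * (risks - risks.min()))  # exponential weights (stabilized)
    idx = rng.choice(grid_size, p=w / w.sum())
    return thetas[idx], risks[idx]

def select_predictor(X, max_order=5, lam=1.0, pen_const=1.0, rng=None):
    """Step 2 (sketch): choose, among the randomized predictors of step 1,
    the one minimizing empirical risk plus a dimension-based penalty."""
    n = len(X)
    best = None
    for p in range(1, max_order + 1):
        theta, risk = gibbs_predictor(X, p, lam=lam, rng=rng)
        crit = risk + pen_const * p * np.log(n) / n  # illustrative penalty term
        if best is None or crit < best[0]:
            best = (crit, p, theta)
    return best[1], best[2]

# Usage on a simulated AR(2) series
rng = np.random.default_rng(0)
X = np.zeros(500)
for t in range(2, 500):
    X[t] = 0.5 * X[t - 1] - 0.3 * X[t - 2] + rng.normal(scale=0.5)
p, theta = select_predictor(X, rng=rng)
print(f"selected order {p}, next-value forecast {theta @ X[-p:][::-1]:.3f}")
```

The exponential-weight draw stands in for the randomized (Gibbs) estimators whose risk the paper's PAC-Bayesian oracle inequalities control; in practice the grid, temperature, and penalty would be tuned to the dependence assumptions on the series.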

Copyright © 2012 Bernoulli Society for Mathematical Statistics and Probability
Pierre Alquier and Olivier Wintenberger "Model selection for weakly dependent time series forecasting," Bernoulli 18(3), 883-913, (August 2012). https://doi.org/10.3150/11-BEJ359