The Annals of Statistics

Consistency of support vector machines for forecasting the evolution of an unknown ergodic dynamical system from observations with unknown noise

Ingo Steinwart and Marian Anghel

Full-text: Open access


We consider the problem of forecasting the next (observable) state of an unknown ergodic dynamical system from a noisy observation of the present state. Our main result shows, for example, that support vector machines (SVMs) using Gaussian RBF kernels can learn the best forecaster from a sequence of noisy observations if (a) the unknown observational noise process is bounded and has a summable α-mixing rate and (b) the unknown ergodic dynamical system is defined by a Lipschitz continuous function on some compact subset of ℝ^d and has a summable decay of correlations for Lipschitz continuous functions. To prove this result, we first establish a general consistency result for SVMs that holds for all stochastic processes satisfying a mixing notion substantially weaker than α-mixing.
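
The forecasting setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure and makes several assumptions that do not come from the paper: the logistic map stands in for the unknown dynamical system, the noise is i.i.d. uniform on a bounded interval, and the least squares loss (kernel ridge regression, i.e. a least squares SVM) replaces a general SVM loss because it has a closed-form solution. All function names and parameter values (`gamma`, `lam`, sample sizes) are hypothetical choices made for illustration only.

```python
import math
import random


def logistic_map(x):
    """Illustrative dynamics F(x) = 4x(1 - x) on [0, 1] (an assumed example)."""
    return 4.0 * x * (1.0 - x)


def noisy_observations(n, noise_level, seed):
    """Return y_i = x_i + e_i, where x_{i+1} = F(x_i) and e_i is bounded uniform noise."""
    rng = random.Random(seed)
    x = rng.uniform(0.1, 0.9)
    ys = []
    for _ in range(n):
        ys.append(x + rng.uniform(-noise_level, noise_level))
        x = logistic_map(x)
    return ys


def rbf(a, b, gamma):
    """Gaussian RBF kernel on the real line, written as exp(-gamma * (a - b)^2)."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def fit_forecaster(ys, gamma, lam):
    """Fit f minimizing lam * ||f||^2 + (1/n) * sum (y_{i+1} - f(y_i))^2 over the RKHS."""
    inputs, targets = ys[:-1], ys[1:]
    n = len(inputs)
    K = [[rbf(a, b, gamma) for b in inputs] for a in inputs]
    for i in range(n):
        K[i][i] += n * lam  # closed form: alpha = (K + n * lam * I)^{-1} targets
    alpha = solve(K, targets)

    def forecast(y):
        return sum(a * rbf(y, x, gamma) for a, x in zip(alpha, inputs))

    return forecast


def mse(forecast, ys):
    """Mean squared error of predicting the next noisy observation from the current one."""
    pairs = list(zip(ys[:-1], ys[1:]))
    return sum((t - forecast(y)) ** 2 for y, t in pairs) / len(pairs)


train = noisy_observations(120, 0.05, seed=1)
test = noisy_observations(200, 0.05, seed=2)
forecaster = fit_forecaster(train, gamma=20.0, lam=1e-3)

# Baseline: always predict the average next observation seen in training.
mean_target = sum(train[1:]) / len(train[1:])
mse_kernel = mse(forecaster, test)
mse_constant = mse(lambda y: mean_target, test)
print(f"kernel forecaster MSE: {mse_kernel:.4f}")
print(f"constant forecaster MSE: {mse_constant:.4f}")
```

Here `gamma` and `lam` are held fixed for a single sample size. Consistency results of the kind stated in the abstract typically require the regularization parameter to tend to zero at a suitable rate as the number of observations grows, which this sketch does not attempt to reproduce.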

Article information

Ann. Statist., Volume 37, Number 2 (2009), 841–875.

First available in Project Euclid: 10 March 2009

Digital Object Identifier: doi:10.1214/07-AOS562

Primary: 62M20: Prediction [See also 60G25]; filtering [See also 60G35, 93E10, 93E11]
Secondary:
  • 37D25: Nonuniformly hyperbolic systems (Lyapunov exponents, Pesin theory, etc.)
  • 37C99: None of the above, but in this section
  • 37M10: Time series analysis
  • 60K99: None of the above, but in this section
  • 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]
  • 62M45: Neural nets and related approaches
  • 68Q32: Computational learning theory [See also 68T05]
  • 68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords: Observational noise model; forecasting dynamical systems; support vector machines; consistency


Steinwart, Ingo; Anghel, Marian. Consistency of support vector machines for forecasting the evolution of an unknown ergodic dynamical system from observations with unknown noise. Ann. Statist. 37 (2009), no. 2, 841–875. doi:10.1214/07-AOS562.
