We establish a Bernstein-type inequality for a class of stochastic processes that includes the classical geometrically $\phi$-mixing processes, Rio’s generalization of these processes and many time-discrete dynamical systems. Modulo a logarithmic factor and some constants, our Bernstein-type inequality coincides with the classical Bernstein inequality for i.i.d. data. We further use this new Bernstein-type inequality to derive an oracle inequality for generic regularized empirical risk minimization algorithms and data generated by such processes. Applying this oracle inequality to support vector machines using the Gaussian kernels for binary classification, we obtain essentially the same rate as for i.i.d. processes, and for least squares and quantile regression; it turns out that the resulting learning rates match, up to some arbitrarily small extra term in the exponent, the optimal rates for i.i.d. processes.
"A Bernstein-type inequality for some mixing processes and dynamical systems with an application to learning." Ann. Statist. 45 (2) 708 - 743, April 2017. https://doi.org/10.1214/16-AOS1465