Electronic Journal of Statistics

Estimating beta-mixing coefficients via histograms

Daniel J. McDonald, Cosma Rohilla Shalizi, and Mark Schervish

Full-text: Open access


The literature on statistical learning for time series often assumes asymptotic independence or “mixing” of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing coefficients from data. Additionally, for many common classes of processes (Markov processes, ARMA processes, etc.) general functional forms for various mixing rates are known, but not specific coefficients. We present the first estimator for beta-mixing coefficients based on a single stationary sample path and show that it is risk consistent. Since mixing rates depend on infinite-dimensional dependence, we use a Markov approximation based on only a finite memory length $d$. We present convergence rates for the Markov approximation and show that as $d\rightarrow\infty$, the Markov approximation converges to the true mixing coefficient. Our estimator is constructed using $d$-dimensional histogram density estimates. Allowing asymptotics in the bandwidth as well as the dimension, we prove $L^{1}$ concentration for the histogram as an intermediate step. Simulations wherein the mixing rates are calculable and a real-data example demonstrate our methodology.
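The construction described in the abstract can be illustrated with a small sketch. It assumes the estimate takes a plug-in form, $\frac{1}{2}\int|\hat f_{2d}-\hat f_{d}\otimes\hat f_{d}|$: half the $L^{1}$ distance between the joint histogram of two length-$d$ blocks separated by lag $a$ and the product of their marginal histograms. That form, the equal-width binning, and the names `beta_hat` and `simulate_ar1` are assumptions made for illustration, not the authors' code, and the bandwidth conditions needed for consistency are not reproduced here.

```python
import random
from collections import Counter


def beta_hat(x, a, d=1, bins=4):
    """Histogram plug-in estimate of the lag-`a` beta-mixing coefficient.

    Assumed form (illustrative): half the L1 distance between the joint
    histogram of two length-d blocks separated by lag a and the product
    of their marginal histograms.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against a constant series

    def cell(v):
        # Equal-width bin index; the maximum value is placed in the last bin.
        return min(int((v - lo) / width), bins - 1)

    codes = [cell(v) for v in x]
    m = len(codes) - 2 * d - a + 2  # number of usable (past, future) block pairs
    if m <= 0:
        raise ValueError("series too short for this a and d")

    joint, past, future = Counter(), Counter(), Counter()
    for i in range(m):
        p = tuple(codes[i:i + d])                          # block ending at time t
        f = tuple(codes[i + d - 1 + a:i + 2 * d - 1 + a])  # block starting at t + a
        joint[(p, f)] += 1
        past[p] += 1
        future[f] += 1

    # Histograms are constant on cells, so the L1 integral is a sum over cells.
    l1 = sum(
        abs(joint[(p, f)] / m - past[p] * future[f] / m ** 2)
        for p in past
        for f in future
    )
    return 0.5 * l1


def simulate_ar1(n, phi, seed=0):
    """Hypothetical helper: AR(1) path x_t = phi * x_{t-1} + standard normal noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

Calling `beta_hat(simulate_ar1(5000, 0.9), a=1)` and comparing it with the same call at `phi=0.0` should give a visibly larger value for the persistent series, while the independent series should come out near zero up to histogram noise. The number of cells grows as `bins ** (2 * d)`, so this sketch is only practical for small `d` and coarse bins.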

Article information

Electron. J. Statist. Volume 9, Number 2 (2015), 2855-2883.

Received: December 2014
First available in Project Euclid: 31 December 2015

Keywords: Density estimation; dependence; time-series; total-variation; mixing; absolutely regular processes; histograms


McDonald, Daniel J.; Shalizi, Cosma Rohilla; Schervish, Mark. Estimating beta-mixing coefficients via histograms. Electron. J. Statist. 9 (2015), no. 2, 2855–2883. doi:10.1214/15-EJS1094. https://projecteuclid.org/euclid.ejs/1451577417.


