The Annals of Statistics

Estimation of large covariance and precision matrices from temporally dependent observations

Abstract

We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range, with longer memory than is typically considered in the existing literature. We show that several methods commonly used for independent observations can be applied to such temporally dependent data. In particular, rates of convergence are obtained for the generalized thresholding estimation of covariance and correlation matrices, and for the constrained $\ell_{1}$ minimization and the $\ell_{1}$ penalized likelihood estimation of the precision matrix. Sparsistency and sign-consistency properties are also established. A gap-block cross-validation method is proposed for tuning parameter selection and performs well in simulations. As a motivating example, we study brain functional connectivity using resting-state fMRI time series data with long-range temporal dependence.
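Two of the ingredients named in the abstract can be illustrated with a minimal sketch: elementwise soft thresholding of a sample covariance matrix (one member of the generalized thresholding family) and gap-separated block folds in the spirit of the proposed gap-block cross-validation. This is not the paper's exact procedure; the threshold level, number of blocks, and gap width below are illustrative choices.

```python
import numpy as np

def soft_threshold_cov(X, lam):
    """Soft-threshold the off-diagonal entries of the sample covariance of X.

    X: (n, p) array of observations (rows are time points).
    lam: threshold level; larger values give sparser estimates.
    """
    S = np.cov(X, rowvar=False)            # p x p sample covariance
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))        # leave the diagonal unthresholded
    return T

def gap_block_folds(n, n_blocks, gap):
    """Split time indices 0..n-1 into contiguous validation blocks; for each
    fold, drop `gap` points on both sides of the held-out block so the
    training set is only weakly dependent on the validation set."""
    edges = np.linspace(0, n, n_blocks + 1, dtype=int)
    folds = []
    for b in range(n_blocks):
        lo, hi = edges[b], edges[b + 1]
        val = np.arange(lo, hi)
        train = np.concatenate([np.arange(0, max(lo - gap, 0)),
                                np.arange(min(hi + gap, n), n)])
        folds.append((train, val))
    return folds

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))         # toy "time series" of length 200
Sig = soft_threshold_cov(X, lam=0.2)
folds = gap_block_folds(200, n_blocks=4, gap=10)
```

One would then choose `lam` by minimizing a validation loss (e.g., Frobenius distance between covariance estimates on training and validation blocks) over the gap-separated folds; the gap is what distinguishes this from ordinary cross-validation under temporal dependence.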

Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1321–1350.

Dates
Revised: December 2017
First available in Project Euclid: 13 February 2019

https://projecteuclid.org/euclid.aos/1550026839

Digital Object Identifier
doi:10.1214/18-AOS1716

Mathematical Reviews number (MathSciNet)
MR3911114

Subjects
Primary: 62H12: Estimation
Secondary: 62H35: Image analysis

Citation

Shu, Hai; Nan, Bin. Estimation of large covariance and precision matrices from temporally dependent observations. Ann. Statist. 47 (2019), no. 3, 1321–1350. doi:10.1214/18-AOS1716. https://projecteuclid.org/euclid.aos/1550026839

References

• [1] Athreya, K. B. and Lahiri, S. N. (2006). Measure Theory and Probability Theory. Springer, New York.
• [2] Bai, J. and Ng, S. (2005). Tests for skewness, kurtosis, and normality for time series data. J. Bus. Econom. Statist. 23 49–60.
• [3] Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. Springer, New York.
• [4] Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 21 1275–1294.
• [5] Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
• [6] Basu, S. and Michailidis, G. (2015). Regularized estimation in sparse high-dimensional time series models. Ann. Statist. 43 1535–1567.
• [7] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
• [8] Bhattacharjee, M. and Bose, A. (2014). Consistency of large dimensional sample covariance matrix under weak dependence. Stat. Methodol. 20 11–26.
• [9] Bhattacharjee, M. and Bose, A. (2014). Estimation of autocovariance matrices for infinite dimensional vector linear process. J. Time Series Anal. 35 262–281.
• [10] Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
• [11] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
• [12] Billingsley, P. (1995). Probability and Measure, 3rd ed. Wiley, New York.
• [13] Bochud, T. and Challet, D. (2007). Optimal approximations of power laws with exponentials: Application to volatility models with long memory. Quant. Finance 7 585–589.
• [14] Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. 2 107–144. Update of, and a supplement to, the 1986 original.
• [15] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, 2nd ed. Springer, New York.
• [16] Buckner, R. L., Sepulcre, J., Talukdar, T., Krienen, F. M., Liu, H., Hedden, T., Andrews-Hanna, J. R., Sperling, R. A. and Johnson, K. A. (2009). Cortical hubs revealed by intrinsic functional connectivity: Mapping, assessment of stability, and relation to Alzheimer’s disease. J. Neurosci. 29 1860–1873.
• [17] Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106 672–684.
• [18] Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
• [19] Cai, T. T., Liu, W. and Zhou, H. H. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Statist. 44 455–488.
• [20] Cai, T. T. and Yuan, M. (2012). Adaptive covariance matrix estimation through block thresholding. Ann. Statist. 40 2014–2042.
• [21] Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
• [22] Cai, T. T. and Zhou, H. H. (2012). Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist. 40 2389–2420.
• [23] Cao, G., Bachega, L. R. and Bouman, C. A. (2011). The sparse matrix transform for covariance estimation and analysis of high dimensional signals. IEEE Trans. Image Process. 20 625–640.
• [24] Chen, X., Xu, M. and Wu, W. B. (2013). Covariance and precision matrix estimation for high-dimensional time series. Ann. Statist. 41 2994–3021.
• [25] Chen, X., Xu, M. and Wu, W. B. (2016). Regularized estimation of linear functionals of precision matrices for high-dimensional time series. IEEE Trans. Signal Process. 64 6459–6470.
• [26] Chlebus, E. (2009). An approximate formula for a partial sum of the divergent $p$-series. Appl. Math. Lett. 22 732–737.
• [27] Ciuciu, P., Abry, P. and He, B. J. (2014). Interplay between functional connectivity and scale-free dynamics in intrinsic fMRI networks. NeuroImage 95 248–263.
• [28] Cole, M. W., Pathak, S. and Schneider, W. (2010). Identifying the brain’s most globally connected regions. NeuroImage 49 3132–3148.
• [29] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Mathematical Series 9. Princeton Univ. Press, Princeton, NJ.
• [30] Demko, S., Moss, W. F. and Smith, P. W. (1984). Decay rates for inverses of band matrices. Math. Comp. 43 491–499.
• [31] El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
• [32] Fang, Y., Wang, B. and Feng, Y. (2016). Tuning-parameter selection in regularized estimations of large covariance matrices. J. Stat. Comput. Simul. 86 494–509.
• [33] Fomin, V. (1999). Optimal Filtering. Vol. II: Spatio-Temporal Fields. Mathematics and Its Applications 481. Kluwer Academic, Dordrecht.
• [34] Foucart, S. and Rauhut, H. (2013). A Mathematical Introduction to Compressive Sensing. Birkhäuser/Springer, New York.
• [35] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
• [36] Geweke, J. and Porter-Hudak, S. (1983). The estimation and application of long memory time series models. J. Time Series Anal. 4 221–238.
• [37] Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations, 3rd ed. Johns Hopkins Univ. Press, Baltimore, MD.
• [38] Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing. J. Time Series Anal. 1 15–29.
• [39] Harrison, L., Penny, W. D. and Friston, K. (2003). Multivariate autoregressive modeling of fMRI time series. NeuroImage 19 1477–1491.
• [40] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
• [41] He, B. J. (2011). Scale-free properties of the functional magnetic resonance imaging signal during rest and task. J. Neurosci. 31 13786–13795.
• [42] Hinich, M. J. (1982). Testing for Gaussianity and linearity of a stationary time series. J. Time Series Anal. 3 169–176.
• [43] Hosking, J. R. M. (1981). Fractional differencing. Biometrika 68 165–176.
• [44] Hsieh, C.-J., Sustik, M. A., Dhillon, I. S. and Ravikumar, P. (2014). QUIC: Quadratic approximation for sparse inverse covariance estimation. J. Mach. Learn. Res. 15 2911–2947.
• [45] Hu, T.-C., Rosalsky, A. and Volodin, A. (2008). On convergence properties of sums of dependent random variables under second moment and covariance restrictions. Statist. Probab. Lett. 78 1999–2005.
• [46] Huang, J. Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93 85–98.
• [47] Jiang, T. (2004). The limiting distributions of eigenvalues of sample correlation matrices. Sankhyā 66 35–48.
• [48] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
• [49] Li, X., Zhao, T., Yuan, X. and Liu, H. (2015). The flare package for high dimensional linear regression and precision matrix estimation in R. J. Mach. Learn. Res. 16 553–557.
• [50] Liu, H., Aue, A. and Paul, D. (2015). On the Marčenko–Pastur law for linear time series. Ann. Statist. 43 675–712.
• [51] Liu, H., Lafferty, J. and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. J. Mach. Learn. Res. 10 2295–2328.
• [52] Mandelbrot, B. B. and Van Ness, J. W. (1968). Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10 422–437.
• [53] Manolakis, D. G., Ingle, V. K. and Kogon, S. M. (2005). Statistical and Adaptive Signal Processing: Spectral Estimation, Signal Modeling, Adaptive Filtering, and Array Processing 46. Artech House, Norwood, MA.
• [54] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [55] Palma, W. (2007). Long-Memory Time Series: Theory and Methods. Wiley-Interscience, Hoboken, NJ.
• [56] Power, J. D., Cohen, A. L., Nelson, S. M., Wig, G. S., Barnes, K. A., Church, J. A., Vogel, A. C., Laumann, T. O., Miezin, F. M., Schlaggar, B. L. et al. (2011). Functional network organization of the human brain. Neuron 72 665–678.
• [57] Priestley, M. B. and Subba Rao, T. (1969). A test for non-stationarity of time-series. J. Roy. Statist. Soc. Ser. B 31 140–149.
• [58] Racine, J. (2000). Consistent cross-validatory model-selection for dependent data: Hv-block cross-validation. J. Econometrics 99 39–61.
• [59] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_{1}$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
• [60] Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
• [61] Rothman, A. J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104 177–186.
• [62] Rothman, A. J., Levina, E. and Zhu, J. (2010). A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika 97 539–550.
• [63] Rudelson, M. and Vershynin, R. (2013). Hanson–Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab. 18 no. 82, 9 pp.
• [64] Ryali, S., Chen, T., Supekar, K. and Menon, V. (2012). Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage 59 3852–3861.
• [65] Shu, H. and Nan, B. (2019). Supplement to “Estimation of large covariance and precision matrices from temporally dependent observations.” DOI:10.1214/18-AOS1716SUPP.
• [66] Sripada, C., Angstadt, M., Kessler, D., Phan, K. L., Liberzon, I., Evans, G. W., Welsh, R. C., Kim, P. and Swain, J. E. (2014). Volitional regulation of emotions produces distributed alterations in connectivity between visual, attention control, and default networks. NeuroImage 89 110–121.
• [67] Syed, M. N., Principe, J. C. and Pardalos, P. M. (2012). Correntropy in data classification. In Dynamics of Information Systems: Mathematical Foundations. Springer Proc. Math. Stat. 20 81–117. Springer, New York.
• [68] Tagliazucchi, E., von Wegner, F., Morzelewski, A., Brodbeck, V., Jahnke, K. and Laufs, H. (2013). Breakdown of long-range temporal dependence in default mode and attention networks during deep sleep. Proc. Natl. Acad. Sci. USA 110 15419–15424.
• [69] Taqqu, M. S. (2003). Fractional Brownian motion and long-range dependence. In Theory and Applications of Long-Range Dependence 5–38. Birkhäuser, Boston, MA.
• [70] van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166–1202.
• [71] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
• [72] Wu, W. B. and Wu, Y. N. (2016). Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron. J. Stat. 10 352–379.
• [73] Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci. USA 102 14150–14154.
• [74] Wu, W. B. and Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. Statist. Sinica 19 1755–1768.
• [75] Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 11 2261–2286.
• [76] Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
• [77] Zhou, S. (2014). Gemini: Graph estimation with matrix variate normal instances. Ann. Statist. 42 532–562.

Supplemental materials

• Supplement to “Estimation of large covariance and precision matrices from temporally dependent observations”. The Supplementary Material contains technical preparations, detailed proofs of the technical lemmas given in the Appendix and of all the theorems in the main text, useful numerical considerations, and additional results of the rfMRI data analysis.