## Electronic Journal of Statistics

### Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation

#### Abstract

This is an expository paper that reviews recent developments on optimal estimation of structured high-dimensional covariance and precision matrices. Minimax rates of convergence for estimating several classes of structured covariance and precision matrices, including bandable, Toeplitz, sparse, and sparse spiked covariance matrices as well as sparse precision matrices, are given under the spectral norm loss. Data-driven adaptive procedures for estimating various classes of matrices are presented. Some key technical tools including large deviation results and minimax lower bound arguments that are used in the theoretical analyses are discussed. In addition, estimation under other losses and a few related problems such as Gaussian graphical models, sparse principal component analysis, factor models, and hypothesis testing on the covariance structure are considered. Some open problems on estimating high-dimensional covariance and precision matrices and their functionals are also discussed.
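The abstract mentions data-driven thresholding procedures for sparse covariance matrices. The following is a minimal illustrative sketch (not the paper's own procedure) of universal entrywise hard thresholding of the sample covariance matrix, with the conventional threshold of order `sqrt(log(p)/n)`; the constant `c` and the function name are assumptions for this example.

```python
import numpy as np

def threshold_covariance(X, c=2.0):
    """Hard-threshold the sample covariance matrix entrywise.

    Off-diagonal entries with magnitude below
    lam = c * sqrt(log(p) / n) are set to zero; the diagonal is kept.
    This is a simplified universal-threshold sketch, not an adaptive
    (entry-dependent) threshold as studied in the literature.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)   # sample covariance (n in denominator)
    lam = c * np.sqrt(np.log(p) / n)
    T = np.where(np.abs(S) >= lam, S, 0.0)   # kill small off-diagonal noise
    np.fill_diagonal(T, np.diag(S))          # never threshold the diagonal
    return T

# Example: data with identity population covariance.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))
Sigma_hat = threshold_covariance(X)
```

For truly sparse population matrices, thresholding at this rate is known to attain the minimax rate under the spectral norm over suitable sparsity classes; adaptive versions replace the universal constant with entrywise variance estimates.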

#### Article information

Source
Electron. J. Statist., Volume 10, Number 1 (2016), 1-59.

Dates
First available in Project Euclid: 17 February 2016

https://projecteuclid.org/euclid.ejs/1455715952

Digital Object Identifier
doi:10.1214/15-EJS1081

Mathematical Reviews number (MathSciNet)
MR3466172

Zentralblatt MATH identifier
1331.62272

Subjects
Primary: 62H12: Estimation
Secondary: 62F12: Asymptotic properties of estimators; 62G09: Resampling methods

#### Citation

Cai, T. Tony; Ren, Zhao; Zhou, Harrison H. Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electron. J. Statist. 10 (2016), no. 1, 1--59. doi:10.1214/15-EJS1081. https://projecteuclid.org/euclid.ejs/1455715952

#### References

• Agarwal, A., S. Negahban, and M. J. Wainwright (2012). Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions., The Annals of Statistics 40(2), 1171–1197.
• Amini, A. A. and M. J. Wainwright (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components., The Annals of Statistics 37(5), 2877–2921.
• Anderson, T. W. (2003)., An Introduction to Multivariate Statistical Analysis (3rd ed.). Wiley.
• Assouad, P. (1983). Deux remarques sur l’estimation., Comptes rendus des séances de l’Académie des sciences. Série 1, Mathématique 296(23), 1021–1024.
• Bai, Z., D. Jiang, J.-F. Yao, and S. Zheng (2009). Corrections to LRT on large-dimensional covariance matrix by RMT., The Annals of Statistics 37(6B), 3822–3840.
• Baik, J., G. Ben Arous, and S. Péché (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices., The Annals of Probability 33(5), 1643–1697.
• Banerjee, O., L. El Ghaoui, and A. d’Aspremont (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data., The Journal of Machine Learning Research 9, 485–516.
• Basu, S. and G. Michailidis (2015). Regularized estimation in sparse high-dimensional time series models., The Annals of Statistics 43(4), 1535–1567.
• Berthet, Q. and P. Rigollet (2013). Optimal detection of sparse principal components in high dimension., The Annals of Statistics 41(4), 1780–1815.
• Bickel, P. J. and E. Levina (2004). Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations., Bernoulli 10(6), 989–1010.
• Bickel, P. J. and E. Levina (2008a). Regularized estimation of large covariance matrices., The Annals of Statistics 36(1), 199–227.
• Bickel, P. J. and E. Levina (2008b). Covariance regularization by thresholding., The Annals of Statistics 36(6), 2577–2604.
• Bickel, P. J. and Y. Ritov (1988). Estimating integrated squared density derivatives: Sharp best order of convergence estimates., Sankhyā: The Indian Journal of Statistics, Series A, 381–393.
• Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009). Simultaneous analysis of Lasso and Dantzig selector., The Annals of Statistics 37(4), 1705–1732.
• Birke, M. and H. Dette (2005). A note on testing the covariance matrix for large dimension., Statistics & Probability Letters 74(3), 281–289.
• Birnbaum, A., I. M. Johnstone, B. Nadler, and D. Paul (2013). Minimax bounds for sparse PCA with noisy high-dimensional data., The Annals of Statistics 41(3), 1055–1084.
• Cai, T. T. and T. Jiang (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices., The Annals of Statistics 39(3), 1496–1525.
• Cai, T. T., T. Liang, and H. H. Zhou (2015). Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions., Journal of Multivariate Analysis 137, 161–172.
• Cai, T. T. and W. Liu (2011a). Adaptive thresholding for sparse covariance matrix estimation., Journal of the American Statistical Association 106(494), 672–684.
• Cai, T. T. and W. Liu (2011b). A direct estimation approach to sparse linear discriminant analysis., Journal of the American Statistical Association 106(496), 1566–1577.
• Cai, T. T., W. Liu, and X. Luo (2011). A constrained $\ell _1$ minimization approach to sparse precision matrix estimation., Journal of the American Statistical Association 106(494), 594–607.
• Cai, T. T., W. Liu, and Y. Xia (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings., Journal of the American Statistical Association 108(501), 265–277.
• Cai, T. T., W. Liu, and H. H. Zhou (2012). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation., arXiv preprint arXiv:1212.2882.
• Cai, T. T. and Z. Ma (2013). Optimal hypothesis testing for high dimensional covariance matrices., Bernoulli 19(5B), 2359–2388.
• Cai, T. T., Z. Ma, and Y. Wu (2013). Sparse PCA: Optimal rates and adaptive estimation., The Annals of Statistics 41(6), 3074–3110.
• Cai, T. T., Z. Ma, and Y. Wu (2015). Optimal estimation and rank detection for sparse spiked covariance matrices., Probability Theory and Related Fields 161(3-4), 781–815.
• Cai, T. T., Z. Ren, and H. H. Zhou (2013). Optimal rates of convergence for estimating Toeplitz covariance matrices., Probability Theory and Related Fields 156(1-2), 101–143.
• Cai, T. T. and M. Yuan (2012). Adaptive covariance matrix estimation through block thresholding., The Annals of Statistics 40(4), 2014–2042.
• Cai, T. T., C. H. Zhang, and H. H. Zhou (2010). Optimal rates of convergence for covariance matrix estimation., The Annals of Statistics 38(4), 2118–2144.
• Cai, T. T. and H. H. Zhou (2012a). Minimax estimation of large covariance matrices under $\ell_1$ norm., Statistica Sinica 22(4), 1319–1349.
• Cai, T. T. and H. H. Zhou (2012b). Optimal rates of convergence for sparse covariance matrix estimation., The Annals of Statistics 40(5), 2389–2420.
• Candès, E. J., X. Li, Y. Ma, and J. Wright (2011). Robust principal component analysis?, Journal of the ACM (JACM) 58(3), 11.
• Chamberlain, G. and M. Rothschild (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets., Econometrica 51(5), 1281–1304.
• Chandrasekaran, V., P. A. Parrilo, and A. S. Willsky (2012). Latent variable graphical model selection via convex optimization., The Annals of Statistics 40(4), 1935–1967.
• Chen, S. X., L. X. Zhang, and P. S. Zhong (2010). Tests for high-dimensional covariance matrices., Journal of the American Statistical Association 105(490), 810–819.
• Chen, X., M. Xu, and W. B. Wu (2013). Covariance and precision matrix estimation for high-dimensional time series., The Annals of Statistics 41(6), 2994–3021.
• d’Aspremont, A., O. Banerjee, and L. El Ghaoui (2008). First-order methods for sparse covariance selection., SIAM Journal on Matrix Analysis and Applications 30(1), 56–66.
• d’Aspremont, A., L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet (2007). A direct formulation for sparse PCA using semidefinite programming., SIAM Review 49(3), 434–448.
• Davidson, K. R. and S. J. Szarek (2001). Local operator theory, random matrices and Banach spaces., Handbook of the Geometry of Banach Spaces 1, 317–366.
• Davis, C. and W. M. Kahan (1970). The rotation of eigenvectors by a perturbation. III., SIAM Journal on Numerical Analysis 7(1), 1–46.
• Donoho, D. L. and R. C. Liu (1991). Geometrizing rates of convergence, II., The Annals of Statistics 19(2), 633–667.
• El Karoui, N. (2003). On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n tend to infinity., arXiv preprint math/0309355.
• El Karoui, N. (2007). Tracy-Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices., The Annals of Probability 35(2), 663–714.
• El Karoui, N. (2008). Operator norm consistent estimation of large dimensional sparse covariance matrices., The Annals of Statistics 36(6), 2717–2756.
• El Karoui, N. and H. Kösters (2011). Geometric sensitivity of random matrix results: Consequences for shrinkage estimators of covariance and related statistical methods., arXiv preprint arXiv:1105.1404.
• Engle, R. and M. Watson (1981). A one-factor multivariate time series model of metropolitan wage rates., Journal of the American Statistical Association 76(376), 774–781.
• Fan, J. (1991). On the estimation of quadratic functionals., The Annals of Statistics 19(3), 1273–1294.
• Fan, J., Y. Fan, and J. Lv (2008). High dimensional covariance matrix estimation using a factor model., Journal of Econometrics 147(1), 186–197.
• Fan, J., Y. Liao, and M. Mincheva (2011). High dimensional covariance matrix estimation in approximate factor models., The Annals of Statistics 39(6), 3320–3356.
• Fan, J., Y. Liao, and M. Mincheva (2013). Large covariance estimation by thresholding principal orthogonal complements., Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75(4), 603–680.
• Franaszczuk, P., K. Blinowska, and M. Kowalczyk (1985). The application of parametric multichannel spectral estimates in the study of electrical brain activity., Biological Cybernetics 51(4), 239–247.
• Friedman, J., T. Hastie, H. Hofling, and R. Tibshirani (2007). Pathwise coordinate optimization., The Annals of Applied Statistics 1(2), 302–332.
• Friedman, J., T. Hastie, and R. Tibshirani (2008). Sparse inverse covariance estimation with the graphical Lasso., Biostatistics 9(3), 432–441.
• Friston, K. J., P. Jezzard, and R. Turner (1994). Analysis of functional MRI time-series., Human Brain Mapping 1(2), 153–171.
• Fuhrmann, D. R. (1991). Application of Toeplitz covariance estimation to adaptive beamforming and detection., IEEE Transactions on Signal Processing 39, 2194–2198.
• Furrer, R. and T. Bengtsson (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants., Journal of Multivariate Analysis 98(2), 227–255.
• Furrer, R., M. G. Genton, and D. Nychka (2006). Covariance tapering for interpolation of large spatial datasets., Journal of Computational and Graphical Statistics 15(3), 502–523.
• Gao, C., Z. Ma, and H. H. Zhou (2014). Sparse CCA: Adaptive estimation and computational barriers., arXiv preprint arXiv:1409.8565.
• Gaspari, G. and S. E. Cohn (1999). Construction of correlation functions in two and three dimensions., Quarterly Journal of the Royal Meteorological Society 125(554), 723–757.
• Goldfarb, D. and G. Iyengar (2003). Robust portfolio selection problems., Mathematics of Operations Research 28(1), 1–38.
• Golubev, G. K., M. Nussbaum, and H. H. Zhou (2010). Asymptotic equivalence of spectral density estimation and Gaussian white noise., The Annals of Statistics 38(1), 181–214.
• Grenander, U. and G. Szegö (1958)., Toeplitz Forms and Their Applications, Volume 321. University of California Press.
• Hamill, T. M., J. S. Whitaker, and C. Snyder (2001). Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter., Monthly Weather Review 129(11), 2776–2790.
• Houtekamer, P. L. and H. L. Mitchell (2001). A sequential ensemble Kalman filter for atmospheric data assimilation., Monthly Weather Review 129(1), 123–137.
• Hsieh, C.-J., I. S. Dhillon, P. K. Ravikumar, and M. A. Sustik (2011). Sparse inverse covariance matrix estimation using quadratic approximation. In, Advances in Neural Information Processing Systems, pp. 2330–2338.
• Hsieh, C.-J., M. A. Sustik, I. Dhillon, P. Ravikumar, and R. Poldrack (2013). BIG & QUIC: Sparse inverse covariance estimation for a million variables. In, Advances in Neural Information Processing Systems, pp. 3165–3173.
• Huang, J. Z., N. Liu, M. Pourahmadi, and L. Liu (2006). Covariance matrix selection and estimation via penalised normal likelihood., Biometrika 93(1), 85–98.
• Javanmard, A. and A. Montanari (2014). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory., IEEE Transactions on Information Theory 60(10), 6522–6554.
• Jiang, D., T. Jiang, and F. Yang (2012). Likelihood ratio tests for covariance matrices of high-dimensional normal distributions., Journal of Statistical Planning and Inference 142(8), 2241–2256.
• Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices., The Annals of Applied Probability 14(2), 865–880.
• Johansson, K. (2000). Shape fluctuations and random matrices., Communications in Mathematical Physics 209(2), 437–476.
• John, S. (1971). Some optimal multivariate tests., Biometrika 58(1), 123–127.
• Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal component analysis., The Annals of Statistics 29(2), 295–327.
• Johnstone, I. M. and A. Y. Lu (2009). On consistency and sparsity for principal components analysis in high dimensions., Journal of the American Statistical Association 104(486), 682–693.
• Jolliffe, I. T., N. T. Trendafilov, and M. Uddin (2003). A modified principal component technique based on the LASSO., Journal of Computational and Graphical Statistics 12(3), 531–547.
• Knowles, A. and J. Yin (2014). Anisotropic local laws for random matrices., arXiv preprint arXiv:1410.3516.
• Lam, C. and J. Fan (2009). Sparsistency and rates of convergence in large covariance matrix estimation., The Annals of Statistics 37(6B), 4254–4278.
• Lauritzen, S. L. (1996)., Graphical Models. Oxford University Press.
• Le Cam, L. (1973). Convergence of estimates under dimensionality restrictions., The Annals of Statistics 1(1), 38–53.
• Ledoit, O. and M. Wolf (2002). Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size., The Annals of Statistics 30(4), 1081–1102.
• Lee, J. O. and K. Schnelli (2014). Tracy-Widom distribution for the largest eigenvalue of real sample covariance matrices with general population., arXiv preprint arXiv:1409.4979.
• Li, D., W. D. Liu, and A. Rosalsky (2010). Necessary and sufficient conditions for the asymptotic distribution of the largest entry of a sample correlation matrix., Probability Theory and Related Fields 148(1-2), 5–35.
• Li, D., Y. Qi, and A. Rosalsky (2012). On Jiang’s asymptotic distribution of the largest entry of a sample correlation matrix., Journal of Multivariate Analysis 111, 256–270.
• Liu, W. (2013). Gaussian graphical model estimation with false discovery rate control., The Annals of Statistics 41(6), 2948–2978.
• Liu, W.-D., Z. Lin, and Q.-M. Shao (2008). The asymptotic distribution and Berry–Esseen bound of a new test for independence in high dimension with an application to stochastic optimization., The Annals of Applied Probability 18(6), 2337–2366.
• Ma, Z. (2012). Accuracy of the Tracy–Widom limits for the extreme eigenvalues in white Wishart matrices., Bernoulli 18(1), 322–359.
• Ma, Z. (2013). Sparse principal component analysis and iterative thresholding., The Annals of Statistics 41(2), 772–801.
• Mai, Q., H. Zou, and M. Yuan (2012). A direct approach to sparse discriminant analysis in ultra-high dimensions., Biometrika 99(1), 29–42.
• McMurry, T. L. and D. N. Politis (2010). Banded and tapered estimates for autocovariance matrices and the linear process bootstrap., Journal of Time Series Analysis 31(6), 471–482.
• Meinshausen, N. and P. Bühlmann (2006). High dimensional graphs and variable selection with the Lasso., The Annals of Statistics 34(3), 1436–1462.
• Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach., The Annals of Statistics 36(6), 2791–2817.
• Nagao, H. (1973). On some test criteria for covariance matrix., The Annals of Statistics 1(4), 700–709.
• Onatski, A., M. Moreira, and M. Hallin (2013). Asymptotic power of sphericity tests for high-dimensional data., The Annals of Statistics 41(3), 1204–1231.
• Pang, H., H. Liu, and R. Vanderbei (2014). The FASTCLIME package for linear programming and large-scale precision matrix estimation in R., The Journal of Machine Learning Research 15(1), 489–493.
• Parzen, E. (1957). On consistent estimates of the spectrum of a stationary time series., The Annals of Mathematical Statistics 28(2), 329–348.
• Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model., Statistica Sinica 17(4), 1617.
• Péché, S. (2009). Universality results for the largest eigenvalues of some sample covariance matrix ensembles., Probability Theory and Related Fields 143(3-4), 481–516.
• Qiu, Y. and S. X. Chen (2012). Test for bandedness of high-dimensional covariance matrices and bandwidth estimation., The Annals of Statistics 40(3), 1285–1314.
• Quah, D. (2000). Internet cluster emergence., European Economic Review 44(4), 1032–1044.
• Ravikumar, P., M. J. Wainwright, G. Raskutti, and B. Yu (2011). High-dimensional covariance estimation by minimizing $\ell _1$ penalized log-determinant divergence., Electronic Journal of Statistics 5, 935–980.
• Ren, Z., T. Sun, C.-H. Zhang, and H. H. Zhou (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical model., The Annals of Statistics 43(3), 991–1026.
• Ren, Z. and H. H. Zhou (2012). Discussion: Latent variable graphical model selection via convex optimization., The Annals of Statistics 40(4), 1989–1996.
• Rigollet, P. and A. B. Tsybakov (2012). Comment: Minimax estimation of large covariance matrices under $\ell _1$-norm., Statistica Sinica 22(4), 1358–1367.
• Rohde, A. and A. B. Tsybakov (2011). Estimation of high-dimensional low-rank matrices., The Annals of Statistics 39(2), 887–930.
• Ross, S. A. (1976). The arbitrage theory of capital asset pricing., Journal of Economic Theory 13(3), 341–360.
• Ross, S. A. (1977). The capital asset pricing model (CAPM), short-sale restrictions and related issues., The Journal of Finance 32(1), 177–183.
• Rothman, A. J., P. J. Bickel, E. Levina, and J. Zhu (2008). Sparse permutation invariant covariance estimation., Electronic Journal of Statistics 2, 494–515.
• Rothman, A. J., E. Levina, and J. Zhu (2009). Generalized thresholding of large covariance matrices., Journal of the American Statistical Association 104(485), 177–186.
• Rudelson, M. and S. Zhou (2013). Reconstruction from anisotropic random measurements., IEEE Transactions on Information Theory 59(6), 3434–3447.
• Runge, J., V. Petoukhov, and J. Kurths (2014). Quantifying the strength and delay of climatic interactions: The ambiguities of cross correlation and a novel measure based on graphical models., Journal of Climate 27(2), 720–739.
• Samarov, A. (1977). Lower bound on the risk for spectral density estimates., Problemy Peredachi Informatsii 13(1), 67–72.
• Shao, Q.-M. (1999). A Cramér type large deviation result for Student’s t-statistic., Journal of Theoretical Probability 12(2), 385–398.
• Shao, Q.-M. and W.-X. Zhou (2014). Necessary and sufficient conditions for the asymptotic distributions of coherence of ultra-high dimensional random matrices., The Annals of Probability 42(2), 623–648.
• Shen, D., H. Shen, and J. Marron (2013). Consistency of sparse PCA in high dimension, low sample size contexts., Journal of Multivariate Analysis 115, 317–333.
• Shen, H. and J. Z. Huang (2008). Sparse principal component analysis via regularized low rank matrix approximation., Journal of Multivariate Analysis 99(6), 1015–1034.
• Soshnikov, A. (2002). A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices., Journal of Statistical Physics 108(5-6), 1033–1056.
• Srivastava, M. S. (2005). Some tests concerning the covariance matrix in high dimensional data., Journal of the Japan Statistical Society 35(2), 251–272.
• Sun, T. and C.-H. Zhang (2012). Scaled sparse linear regression., Biometrika 99(4), 879–898.
• Sun, T. and C.-H. Zhang (2013). Sparse matrix inversion with scaled Lasso., The Journal of Machine Learning Research 14(1), 3385–3418.
• Tao, M., Y. Wang, and H. H. Zhou (2013). Optimal sparse volatility matrix estimation for high-dimensional Itô processes with measurement errors., The Annals of Statistics 41(4), 1816–1864.
• Tsybakov, A. B. (2009)., Introduction to Nonparametric Estimation. Springer.
• van de Geer, S. and P. Bühlmann (2009). On the conditions used to prove oracle results for the Lasso., Electronic Journal of Statistics 3, 1360–1392.
• van de Geer, S., P. Bühlmann, Y. Ritov, and R. Dezeure (2014). On asymptotically optimal confidence regions and tests for high-dimensional models., The Annals of Statistics 42(3), 1166–1202.
• Vandenberghe, L., S. Boyd, and S. P. Wu (1998). Determinant maximization with linear matrix inequality constraints., SIAM Journal on Matrix Analysis and Applications 19(2), 499–533.
• Visser, H. and J. Molenaar (1995). Trend estimation and regression analysis in climatological time series: An application of structural time series models and the Kalman filter., Journal of Climate 8(5), 969–979.
• Vu, V. Q. and J. Lei (2013). Minimax sparse principal subspace estimation in high dimensions., The Annals of Statistics 41(6), 2905–2947.
• Wachter, K. W. (1976). Probability plotting points for principal components. In, Ninth Interface Symposium Computer Science and Statistics, pp. 299–308. Prindle, Weber and Schmidt, Boston.
• Wachter, K. W. (1978). The strong limits of random matrix spectra for sample matrices of independent elements., The Annals of Probability 6(1), 1–18.
• Wang, T., Q. Berthet, and R. J. Samworth (2014). Statistical and computational trade-offs in estimation of sparse principal components., arXiv preprint arXiv:1408.5369.
• Wille, A., P. Zimmermann, E. Vranová, A. Fürholz, O. Laule, S. Bleuler, L. Hennig, A. Prelic, P. von Rohr, L. Thiele, et al. (2004). Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana., Genome Biology 5(11), R92.
• Witten, D., J. Friedman, and N. Simon (2011). New insights and faster computations for the graphical Lasso., Journal of Computational and Graphical Statistics 20(4), 892–900.
• Witten, D., R. Tibshirani, and T. Hastie (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis., Biostatistics 10(3), 515–534.
• Wu, W. B. (2005). Nonlinear system theory: Another look at dependence., Proceedings of the National Academy of Sciences of the United States of America 102(40), 14150–14154.
• Wu, W. B. and M. Pourahmadi (2003). Nonparametric estimation of large covariance matrices of longitudinal data., Biometrika 90(4), 813–844.
• Wu, W. B. and M. Pourahmadi (2009). Banding sample autocovariance matrices of stationary processes., Statistica Sinica 19(4), 1755–1768.
• Xiao, H. and W. B. Wu (2011). Simultaneous inference of covariances., arXiv preprint arXiv:1109.0524.
• Xiao, H. and W. B. Wu (2012). Covariance matrix estimation for stationary time series., The Annals of Statistics 40(1), 466–493.
• Xiao, L. and F. Bunea (2014). On the theoretic and practical merits of the banding estimator for large covariance matrices., arXiv preprint arXiv:1402.0844.
• Yang, Y. and A. Barron (1999). Information-theoretic determination of minimax rates of convergence., The Annals of Statistics 27(5), 1564–1599.
• Ye, F. and C.-H. Zhang (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_q$ loss in $\ell _r$ balls., The Journal of Machine Learning Research 11, 3519–3540.
• Yu, B. (1997). Assouad, Fano, and Le Cam., Festschrift for Lucien Le Cam, 423–435.
• Yuan, M. (2010). Sparse inverse covariance matrix estimation via linear programming., The Journal of Machine Learning Research 11, 2261–2286.
• Yuan, M. and Y. Lin (2007). Model selection and estimation in the Gaussian graphical model., Biometrika 94(1), 19–35.
• Zhang, C.-H. and S. S. Zhang (2014). Confidence intervals for low dimensional parameters in high dimensional linear models., Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(1), 217–242.
• Zhang, C.-H. and T. Zhang (2012). A general theory of concave regularization for high-dimensional sparse estimation problems., Statistical Science 27(4), 576–593.
• Zheng, S., Z. Bai, and J. Yao (2015). Substitution principle for CLT of linear spectral statistics of high-dimensional sample covariance matrices with applications to hypothesis testing., The Annals of Statistics 43(2), 546–591.
• Zhou, W. (2007). Asymptotic distribution of the largest off-diagonal entry of correlation matrices., Transactions of the American Mathematical Society 359(11), 5345–5363.
• Zou, H., T. Hastie, and R. Tibshirani (2006). Sparse principal component analysis., Journal of Computational and Graphical Statistics 15(2), 265–286.