The Annals of Statistics

Optimal rates of convergence for sparse covariance matrix estimation

T. Tony Cai and Harrison H. Zhou

Full-text: Open access

Abstract

This paper considers estimation of sparse covariance matrices and establishes the optimal rate of convergence under a range of matrix operator norm and Bregman divergence losses. A major focus is on the derivation of a rate-sharp minimax lower bound. The problem exhibits new features that are significantly different from those that occur in conventional nonparametric function estimation problems. Standard techniques fail to yield good results, and new tools are thus needed.

We first develop a lower bound technique that is particularly well suited for treating “two-directional” problems such as estimating sparse covariance matrices. The result can be viewed as a generalization of Le Cam’s method in one direction and Assouad’s Lemma in another. This lower bound technique is of independent interest and can be used for other matrix estimation problems.

We then establish a rate-sharp minimax lower bound for estimating sparse covariance matrices under the spectral norm by applying the general lower bound technique. A thresholding estimator is shown to attain the optimal rate of convergence under the spectral norm. The results are then extended to the general matrix $\ell_{w}$ operator norms for $1\le w\le\infty$. In addition, we give a unified result on the minimax rate of convergence for sparse covariance matrix estimation under a class of Bregman divergence losses.
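To make the thresholding idea concrete, here is a minimal sketch of an entrywise hard-thresholding estimator of the kind the abstract describes: compute the sample covariance matrix and zero out small off-diagonal entries. The threshold level uses the familiar $\sqrt{\log p / n}$ scaling; the constant `2.0` and the function name are illustrative assumptions, not the paper's calibrated choices.

```python
import numpy as np

def threshold_covariance(X, lam=None):
    """Hard-thresholding estimator of a sparse covariance matrix.

    X   : (n, p) data matrix whose rows are i.i.d. observations.
    lam : threshold level; if None, use c * sqrt(log(p) / n) with
          c = 2.0 (an illustrative constant, not the paper's value).
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)  # sample covariance
    if lam is None:
        lam = 2.0 * np.sqrt(np.log(p) / n)
    # Zero out small off-diagonal entries; always keep the diagonal.
    T = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

# Toy example: a sparse (tridiagonal) true covariance matrix.
rng = np.random.default_rng(0)
p, n = 50, 400
Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = threshold_covariance(X)
```

Under the sparsity classes studied in the paper, estimators of this form attain the optimal rate under the spectral norm; the sketch above only illustrates the mechanics, not the theoretical calibration.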

Article information

Source
Ann. Statist., Volume 40, Number 5 (2012), 2389-2420.

Dates
First available in Project Euclid: 4 February 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1359987525

Digital Object Identifier
doi:10.1214/12-AOS998

Mathematical Reviews number (MathSciNet)
MR3097607

Zentralblatt MATH identifier
1373.62247

Subjects
Primary: 62H12: Estimation
Secondary: 62F12: Asymptotic properties of estimators; 62G09: Resampling methods

Keywords
Assouad’s lemma; Bregman divergence; covariance matrix estimation; Frobenius norm; Le Cam’s method; minimax lower bound; spectral norm; optimal rate of convergence; thresholding

Citation

Cai, T. Tony; Zhou, Harrison H. Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist. 40 (2012), no. 5, 2389--2420. doi:10.1214/12-AOS998. https://projecteuclid.org/euclid.aos/1359987525


References

  • Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584–653.
  • Assouad, P. (1983). Deux remarques sur l’estimation. C. R. Acad. Sci. Paris Sér. I Math. 296 1021–1024.
  • Bickel, P. J. and Levina, E. (2008a). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • Bickel, P. J. and Levina, E. (2008b). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • Brègman, L. M. (1967). A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7 200–217.
  • Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106 672–684.
  • Cai, T. T., Liu, W. and Zhou, H. H. (2011). Optimal estimation of large sparse precision matrices. Unpublished manuscript.
  • Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
  • Cai, T. T. and Zhou, H. H. (2009). Covariance matrix estimation under the $\ell_1$ norm (with discussion). Statist. Sinica 22 1319–1378.
  • Cai, T. T. and Zhou, H. H. (2012). Supplement to “Optimal rates of convergence for sparse covariance matrix estimation.” DOI:10.1214/12-AOS998SUPP.
  • Censor, Y. and Zenios, S. A. (1997). Parallel Optimization: Theory, Algorithms, and Applications. Oxford Univ. Press, New York.
  • Dhillon, I. S. and Tropp, J. A. (2007). Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29 1120–1146.
  • Donoho, D. L. and Liu, R. C. (1991). Geometrizing rates of convergence. II. Ann. Statist. 19 633–667.
  • El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
  • Kulis, B., Sustik, M. A. and Dhillon, I. S. (2009). Low-rank kernel learning with Bregman matrix divergences. J. Mach. Learn. Res. 10 341–376.
  • Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
  • Le Cam, L. (1973). Convergence of estimates under dimensionality restrictions. Ann. Statist. 1 38–53.
  • Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer, New York.
  • Ravikumar, P., Wainwright, M., Raskutti, G. and Yu, B. (2008). High-dimensional covariance estimation by minimizing $l_1$-penalized log-determinant divergence. Technical Report 797, Dept. Statistics, UC Berkeley.
  • Rothman, A. J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104 177–186.
  • Saulis, L. and Statulevičius, V. A. (1991). Limit Theorems for Large Deviations. Mathematics and Its Applications (Soviet Series) 73. Kluwer Academic, Dordrecht.
  • Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York.
  • van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
  • Whittle, P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theory Probab. Appl. 5 302–305.
  • Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. Yang, eds.) 423–435. Springer, New York.

Supplemental materials

  • Supplementary material: Supplement to “Optimal rates of convergence for sparse covariance matrix estimation”. In this supplement we prove the additional technical lemmas used in the proof of Lemma 6.