Electronic Journal of Statistics

Sparse permutation invariant covariance estimation

Adam J. Rothman, Peter J. Bickel, Elizaveta Levina, and Ji Zhu

Full-text: Open access


The paper proposes a method for constructing a sparse estimator of the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and enforces sparsity through a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both the data dimension p and the sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. In addition, we derive a fast iterative algorithm for computing the estimator, which relies on the popular Cholesky decomposition of the inverse but produces a permutation-invariant estimator. The method is compared to other estimators on simulated data and on a real data example of tumor tissue classification using gene expression data.
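To make the idea of a lasso-penalized normal likelihood concrete, the following is a minimal sketch of one common form of such an objective, tr(ΩS) − log det(Ω) + λ Σ_{j≠k} |ω_jk|, together with a numerical check that its value does not change when the variables are reordered. The function name `penalized_neg_loglik`, the choice to penalize only off-diagonal entries, and the value of `lam` are illustrative assumptions that the abstract does not specify. The sketch evaluates the objective only and does not implement the iterative algorithm mentioned above.

```python
import numpy as np


def penalized_neg_loglik(omega, sample_cov, lam):
    """Evaluate tr(omega @ S) - log det(omega) + lam * sum |off-diagonal of omega|.

    Hypothetical helper for illustration; penalizing only the off-diagonal
    entries is an assumption, not something stated in the abstract.
    """
    sign, logdet = np.linalg.slogdet(omega)
    if sign <= 0:
        # Cheap guard: a non-positive determinant rules out positive
        # definiteness. This is not a full positive-definiteness check.
        return np.inf
    off_diag = omega - np.diag(np.diag(omega))
    return np.trace(omega @ sample_cov) - logdet + lam * np.abs(off_diag).sum()


rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))           # simulated data, n observations of p variables
S = np.cov(X, rowvar=False)               # sample covariance matrix
omega = np.linalg.inv(S + 0.1 * np.eye(p))  # an arbitrary positive definite candidate

# Reorder the variables with a random permutation matrix P.
perm = rng.permutation(p)
P = np.eye(p)[perm]

value = penalized_neg_loglik(omega, S, lam=0.5)
value_permuted = penalized_neg_loglik(P @ omega @ P.T, P @ S @ P.T, lam=0.5)

# The two values agree up to floating-point error.
print(np.isclose(value, value_permuted))
```

The trace, the determinant, and the set of off-diagonal entries are all unchanged by a simultaneous permutation of rows and columns, which is why an objective of this form does not depend on the ordering of the variables.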

Article information

Electron. J. Statist. Volume 2 (2008), 494-515.

First available: 26 June 2008

Primary: 62H20: Measures of association (correlation, canonical correlation, etc.)
Secondary: 62H12: Estimation

Keywords: Covariance matrix; High dimension low sample size; large p small n; Lasso; Sparsity; Cholesky decomposition


Rothman, Adam J.; Bickel, Peter J.; Levina, Elizaveta; Zhu, Ji. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics 2 (2008), 494-515. doi:10.1214/08-EJS176. http://projecteuclid.org/euclid.ejs/1214491853.

