The Annals of Applied Statistics

Sparse estimation of large covariance matrices via a nested Lasso penalty

Elizaveta Levina, Adam Rothman, and Ji Zhu

Full-text: Open access


The paper proposes a new covariance estimator for large covariance matrices when the variables have a natural ordering. Using the Cholesky decomposition of the inverse, we impose a banded structure on the Cholesky factor, and select the bandwidth adaptively for each row of the Cholesky factor, using a novel penalty we call nested Lasso. This structure has more flexibility than regular banding, but, unlike regular Lasso applied to the entries of the Cholesky factor, results in a sparse estimator for the inverse of the covariance matrix. An iterative algorithm for solving the optimization problem is developed. The estimator is compared to a number of other covariance estimators and is shown to do best, both in simulations and on a real data example. Simulations show that the margin by which the estimator outperforms its competitors tends to increase with dimension.
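The Cholesky-based construction the abstract builds on can be made concrete with a minimal NumPy sketch. This is not the paper's nested-Lasso estimator itself, only the underlying modified Cholesky decomposition: each variable is regressed on all of its predecessors in the given ordering, the negated regression coefficients fill the rows of a unit lower-triangular factor T, and the residual variances form a diagonal D, so that T'D⁻¹T reproduces the inverse of the sample covariance. All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                  # center; columns ordered 1..p

# Modified Cholesky: regress each variable on all of its predecessors.
T = np.eye(p)                        # unit lower-triangular factor
d = np.empty(p)                      # innovation (residual) variances
d[0] = X[:, 0] @ X[:, 0] / n
for j in range(1, p):
    coef, *_ = np.linalg.lstsq(X[:, :j], X[:, j], rcond=None)
    T[j, :j] = -coef                 # negated regression coefficients
    resid = X[:, j] - X[:, :j] @ coef
    d[j] = resid @ resid / n

# T' D^{-1} T recovers the inverse of the sample covariance exactly.
Omega_hat = T.T @ np.diag(1.0 / d) @ T
S = X.T @ X / n
print(np.allclose(Omega_hat, np.linalg.inv(S)))  # True
```

Zeroing out entries of row j of T far from the diagonal (regular banding) makes Omega_hat sparse; the paper's contribution is to choose that cutoff adaptively per row via the nested Lasso penalty rather than fixing a single bandwidth.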

Article information

Ann. Appl. Stat. Volume 2, Number 1 (2008), 245-263.

First available in Project Euclid: 24 March 2008

Keywords: Covariance matrix; high dimension low sample size; large p, small n; Lasso; sparsity; Cholesky decomposition


Levina, Elizaveta; Rothman, Adam; Zhu, Ji. Sparse estimation of large covariance matrices via a nested Lasso penalty. Ann. Appl. Stat. 2 (2008), no. 1, 245--263. doi:10.1214/07-AOAS139.



  • Adam, B., Qu, Y., Davis, J., Ward, M., Clements, M., Cazares, L., Semmes, O., Schellhammer, P., Yasui, Y., Feng, Z. and Wright, G. (2002). Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Research 62 3609–3614.
  • Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
  • Banerjee, O., d’Aspremont, A. and El Ghaoui, L. (2006). Sparse covariance selection via robust maximum likelihood estimation. In Proceedings of ICML.
  • Bickel, P. J. and Levina, E. (2004). Some theory for Fisher’s linear discriminant function, “naive Bayes,” and some alternatives when there are many more variables than observations. Bernoulli 10 989–1010.
  • Bickel, P. J. and Levina, E. (2007). Regularized estimation of large covariance matrices. Ann. Statist. To appear.
  • Dempster, A. (1972). Covariance selection. Biometrics 28 157–175.
  • Dey, D. K. and Srinivasan, C. (1985). Estimation of a covariance matrix under Stein’s loss. Ann. Statist. 13 1581–1591.
  • Diggle, P. and Verbyla, A. (1998). Nonparametric estimation of covariance structure in longitudinal data. Biometrics 54 401–415.
  • Djavan, B., Zlotta, A., Kratzik, C., Remzi, M., Seitz, C., Schulman, C. and Marberger, M. (1999). PSA, PSA density, PSA density of transition zone, free/total PSA ratio, and PSA velocity for early detection of prostate cancer in men with serum PSA 2.5 to 4.0 ng/ml. Urology 54 517–522.
  • Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics. To appear.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Friedman, J. (1989). Regularized discriminant analysis. J. Amer. Statist. Assoc. 84 165–175.
  • Friedman, J., Hastie, T., Höfling, H. G. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Statist. 1 302–332.
  • Fu, W. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397–416.
  • Furrer, R. and Bengtsson, T. (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivariate Anal. 98 227–255.
  • Haff, L. R. (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. Ann. Statist. 8 586–597.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer, Berlin.
  • Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55–67.
  • Huang, J., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93 85–98.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • Johnstone, I. M. and Lu, A. Y. (2007). Sparse principal components analysis. J. Amer. Statist. Assoc. To appear.
  • Ledoit, O. and Wolf, M. (2003). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88 365–411.
  • Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, New York.
  • Pannek, J. and Partin, A. (1998). The role of PSA and percent free PSA for staging and prognosis prediction in clinically localized prostate cancer. Semin. Urol. Oncol. 16 100–105.
  • Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika 86 677–690.
  • Smith, M. and Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. J. Amer. Statist. Assoc. 97 1141–1153.
  • Stamey, T., Johnstone, I., McNeal, J., Lu, A. and Yemoto, C. (2002). Preoperative serum prostate specific antigen levels between 2 and 22 ng/ml correlate poorly with post-radical prostatectomy cancer morphology: Prostate specific antigen cure rates appear constant between 2 and 9 ng/ml. J. Urol. 167 103–111.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2003). Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statist. Sci. 18 104–117.
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B 67 91–108.
  • Wong, F., Carter, C. and Kohn, R. (2003). Efficient estimation of covariance selection models. Biometrika 90 809–830.
  • Wu, W. B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90 831–844.
  • Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
  • Zhao, P., Rocha, G. and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties. Technical Report 703, Dept. Statistics, UC Berkeley.