Electronic Journal of Statistics

Adaptive estimation of covariance matrices via Cholesky decomposition

Nicolas Verzelen

Full-text: Open access


This paper studies the estimation of a large covariance matrix. We introduce a novel procedure called ChoSelect based on the Cholesky factor of the inverse covariance. This method uses a dimension reduction strategy by selecting the pattern of zeros of the Cholesky factor. Alternatively, ChoSelect can be interpreted as a graph estimation procedure for directed Gaussian graphical models. Our approach is particularly relevant when the variables under study have a natural ordering (e.g. time series) or more generally when the Cholesky factor is approximately sparse. ChoSelect achieves non-asymptotic oracle inequalities with respect to the Kullback-Leibler entropy. Moreover, it satisfies various adaptive properties from a minimax point of view. We also introduce and study a two-stage procedure that combines ChoSelect with the Lasso. This two-stage method enables practitioners to choose their own trade-off between statistical efficiency and computational complexity. Moreover, it is consistent under weaker assumptions than the Lasso. The practical performance of the different procedures is assessed on numerical examples.
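The link between the Cholesky factor of the inverse covariance and regressions on preceding variables can be illustrated with a minimal sketch. This is not the ChoSelect procedure, whose penalized criterion is defined in the paper. It only shows the standard modified Cholesky construction: each variable is regressed by least squares on its predecessors, and a fixed zero pattern is imposed by banding. The function name and the `band` parameter are assumptions introduced for this illustration.

```python
import numpy as np


def cholesky_precision_by_regression(X, band=None):
    """Estimate a precision matrix via the modified Cholesky decomposition.

    Each column of X (taken in the given order) is regressed on its
    predecessors. `band` is a hypothetical parameter for this sketch: when
    set, only the `band` nearest predecessors are used, which forces zeros
    in the Cholesky factor. Returns (T, d, omega), where T is unit
    lower-triangular, d holds the residual variances, and
    omega = T' diag(1/d) T.
    """
    n, p = X.shape
    X = X - X.mean(axis=0)           # center the observations
    T = np.eye(p)                    # unit lower-triangular factor
    d = np.empty(p)                  # residual (innovation) variances
    for j in range(p):
        start = 0 if band is None else max(0, j - band)
        resid = X[:, j]
        if start < j:                # at least one predecessor is allowed
            Z = X[:, start:j]
            coef, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
            T[j, start:j] = -coef    # row j stores minus the coefficients
            resid = X[:, j] - Z @ coef
        d[j] = resid @ resid / n
    omega = T.T @ np.diag(1.0 / d) @ T
    return T, d, omega


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated AR(1)-like data, so that the variables have a natural ordering.
    n, p = 200, 6
    X = np.zeros((n, p))
    X[:, 0] = rng.standard_normal(n)
    for j in range(1, p):
        X[:, j] = 0.5 * X[:, j - 1] + rng.standard_normal(n)
    T, d, omega = cholesky_precision_by_regression(X, band=1)
    print(np.round(T, 2))
```

With `band=None` and more observations than variables, the returned `omega` coincides with the inverse of the empirical covariance matrix. A data-driven choice of the zero pattern, which is the point of the paper, replaces the fixed band used here.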

Article information

Electron. J. Statist. Volume 4 (2010), 1113-1150.

First available in Project Euclid: 28 October 2010

Primary: 62H12: Estimation
Secondary: 62F35: Robustness and adaptive procedures; 62J05: Linear regression

Keywords: Covariance matrix; banding; Cholesky decomposition; directed graphical models; penalized criterion; minimax rate of estimation


Verzelen, Nicolas. Adaptive estimation of covariance matrices via Cholesky decomposition. Electron. J. Statist. 4 (2010), 1113–1150. doi:10.1214/10-EJS580. http://projecteuclid.org/euclid.ejs/1288271686.



