The Annals of Statistics

High-dimensional graphs and variable selection with the Lasso

Nicolai Meinshausen and Peter Bühlmann

Full-text: Open access


The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
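The neighborhood selection idea in the abstract can be illustrated with a minimal sketch: regress each variable on all the others with the Lasso, read each node's estimated neighborhood off the nonzero coefficients, and combine the neighborhoods into an edge set. Everything specific here is an assumption made for illustration and is not taken from the abstract: the four-variable chain model, the sample size, the penalty value `lam=0.2`, the plain coordinate-descent solver, and the "and"/"or" rules for merging the two regressions that involve a given pair of nodes.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form one-dimensional Lasso solution."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def standardize(rows):
    """Center each column and scale it to unit empirical variance."""
    n, p = len(rows), len(rows[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd for v in col])
    return cols  # p columns, each of length n


def gram_matrix(cols):
    """Empirical correlation matrix S[j][k] = (1/n) * <x_j, x_k>."""
    n, p = len(cols[0]), len(cols)
    return [[sum(a * b for a, b in zip(cols[j], cols[k])) / n
             for k in range(p)] for j in range(p)]


def lasso_neighbors(S, node, lam, sweeps=200, tol=1e-10):
    """Lasso regression of `node` on all other variables by coordinate descent.

    Minimizes (1/2n)||x_node - sum_k b_k x_k||^2 + lam * sum_k |b_k|,
    expressed through the Gram matrix S. Returns the indices with b_k != 0.
    """
    others = [k for k in range(len(S)) if k != node]
    beta = {k: 0.0 for k in others}
    for _ in range(sweeps):
        max_change = 0.0
        for k in others:
            # Correlation of x_k with the partial residual that leaves out x_k.
            rho = S[k][node] - sum(S[k][m] * beta[m] for m in others if m != k)
            new = soft_threshold(rho, lam) / S[k][k]
            max_change = max(max_change, abs(new - beta[k]))
            beta[k] = new
        if max_change < tol:
            break
    return {k for k in others if beta[k] != 0.0}


def neighborhood_selection(rows, lam, rule="and"):
    """Estimate an undirected edge set from node-wise Lasso regressions."""
    S = gram_matrix(standardize(rows))
    p = len(S)
    nbrs = [lasso_neighbors(S, j, lam) for j in range(p)]
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            a, b = j in nbrs[i], i in nbrs[j]
            if (a and b) if rule == "and" else (a or b):
                edges.add((i, j))
    return edges


def simulate_chain(n, seed=0):
    """Hypothetical test model: Gaussian chain 0 - 1 - 2 plus an isolated node 3."""
    rng = random.Random(seed)
    s = math.sqrt(0.75)  # keeps every variable at unit variance
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = 0.5 * x0 + s * rng.gauss(0, 1)
        x2 = 0.5 * x1 + s * rng.gauss(0, 1)
        x3 = rng.gauss(0, 1)
        rows.append([x0, x1, x2, x3])
    return rows


rows = simulate_chain(n=2000)
and_edges = neighborhood_selection(rows, lam=0.2, rule="and")
or_edges = neighborhood_selection(rows, lam=0.2, rule="or")
print("AND rule:", sorted(and_edges))
print("OR rule: ", sorted(or_edges))
```

In this toy model the true conditional independence graph has edges (0, 1) and (1, 2) only, so the printed edge sets can be compared against it. The penalty is fixed by hand here; the abstract's point is that the choice matters, since a penalty tuned for prediction does not give a consistent neighborhood estimate.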

Article information

Ann. Statist. Volume 34, Number 3 (2006), 1436-1462.

First available in Project Euclid: 10 July 2006

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62H20: Measures of association (correlation, canonical correlation, etc.); 62F12: Asymptotic properties of estimators

Keywords: Linear regression; covariance selection; Gaussian graphical models; penalized regression


Meinshausen, Nicolai; Bühlmann, Peter. High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 (2006), no. 3, 1436--1462. doi:10.1214/009053606000000281.

