Annals of Statistics

High-dimensional generalized linear models and the lasso

Sara A. van de Geer



We consider high-dimensional generalized linear models with Lipschitz loss functions, and prove a nonasymptotic oracle inequality for the empirical risk minimizer with Lasso penalty. The penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm. The examples include logistic regression, density estimation and classification with hinge loss. Least squares regression is also discussed.
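To make the estimator concrete, the sketch below implements ℓ1-penalized logistic regression (one of the paper's examples) by proximal gradient descent with soft-thresholding. This is a minimal illustration of the penalized empirical risk minimizer, not the paper's own code; the optimizer (ISTA), the step-size rule, and all parameter choices are assumptions, and the penalty here is unweighted rather than normalized by the empirical norms of the covariates.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the l1 norm: componentwise soft-thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logistic(X, y, lam, n_iter=500):
    """l1-penalized logistic regression via proximal gradient (ISTA).

    Minimizes (1/n) * sum_i log(1 + exp(-y_i * x_i' beta)) + lam * ||beta||_1
    for labels y in {-1, +1}. A hypothetical illustration, not the
    paper's estimator (which normalizes the penalty by empirical norms).
    """
    n, p = X.shape
    # The logistic-loss gradient is Lipschitz with constant <= ||X||_2^2 / (4n);
    # stepping by its inverse guarantees descent of the smooth part.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        margins = y * (X @ beta)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

For a large enough penalty level `lam`, every coefficient is thresholded to zero, while moderate values produce the sparse fits to which the oracle inequality applies.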

Article information

Ann. Statist., Volume 36, Number 2 (2008), 614-645.

First available in Project Euclid: 13 March 2008


Subjects: Primary 62G08: Nonparametric regression

Keywords: Lasso; oracle inequality; sparsity


van de Geer, Sara A. High-dimensional generalized linear models and the lasso. Ann. Statist. 36 (2008), no. 2, 614--645. doi:10.1214/009053607000000929.



  • Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495–500.
  • Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2006). Aggregation and sparsity via ℓ1-penalized least squares. In Proceedings of 19th Annual Conference on Learning Theory, COLT 2006. Lecture Notes in Comput. Sci. 4005 379–391. Springer, Berlin.
  • Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007a). Sparse density estimation with ℓ1 penalties. In Proceedings of 20th Annual Conference on Learning Theory, COLT 2007. Lecture Notes in Comput. Sci. 4539 530–543. Springer, Berlin.
  • Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007b). Sparsity oracle inequalities for the Lasso. Electron. J. Statist. 1 169–194.
  • Dahinden, C., Parmigiani, G., Emerick, M. C. and Bühlmann, P. (2008). Penalized likelihood for sparse contingency tables with an application to full length cDNA libraries. BMC Bioinformatics. To appear.
  • Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41 613–627.
  • Donoho, D. L. (2006a). For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 907–934.
  • Donoho, D. L. (2006b). For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797–829.
  • Greenshtein, E. (2006). Best subset selection, persistence in high dimensional statistical learning and optimization under ℓ1 constraint. Ann. Statist. 34 2367–2386.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Data Mining, Inference and Prediction. Springer, New York.
  • Ledoux, M. (1996). Talagrand deviation inequalities for product measures. ESAIM Probab. Statist. 1 63–87.
  • Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer, Berlin.
  • Loubes, J.-M. and van de Geer, S. (2002). Adaptive estimation in regression, using soft thresholding type penalties. Statist. Neerl. 56 453–478.
  • Massart, P. (2000a). About the constants in Talagrand’s concentration inequalities for empirical processes. Ann. Probab. 28 863–884.
  • Massart, P. (2000b). Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. (6) 9 245–303.
  • Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. Research Report 131, ETH Zürich. J. Roy. Statist. Soc. Ser. B. To appear.
  • Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374–393.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
  • Meinshausen, N. and Yu, B. (2007). Lasso-type recovery of sparse representations for high-dimensional data. Technical Report 720, Dept. Statistics, UC Berkeley.
  • Rockafellar, R. T. (1970). Convex Analysis. Princeton Univ. Press.
  • Tarigan, B. and van de Geer, S. A. (2006). Classifiers of support vector machine type with ℓ1 complexity regularization. Bernoulli 12 1045–1076.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166.
  • van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press.
  • van de Geer, S. (2003). Adaptive quantile regression. In Recent Advances and Trends in Nonparametric Statistics (M. G. Akritas and D. N. Politis, eds.) 235–250. North-Holland, Amsterdam.
  • Zhang, C.-H. and Huang, J. (2006). Model-selection consistency of the Lasso in high-dimensional linear regression. Technical Report 2006-003, Dept. Statistics, Rutgers Univ.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.