Electronic Journal of Statistics

AIC for the Lasso in generalized linear models

Yoshiyuki Ninomiya and Shuichi Kawano

Full-text: Open access


The Lasso is a popular regularization method that performs estimation and model selection simultaneously. It contains a regularization parameter, and several information criteria have been proposed for selecting its value. Although any of them would assure consistency in model selection, there is no appropriate rule for choosing among the criteria. Meanwhile, a finite correction to the AIC has been provided in a Gaussian regression setting. The finite correction is theoretically justified from the viewpoint of minimizing the prediction error rather than of consistency, and it does not suffer from the above-mentioned difficulty. Our aim is to derive such a criterion for the Lasso in generalized linear models. To this end, we derive a criterion from the original definition of the AIC, that is, as an asymptotically unbiased estimator of the Kullback-Leibler divergence. This criterion reduces to the finite correction in the Gaussian regression setting, so it can be regarded as a generalization of that correction. Our criterion is easy to compute and requires less computation than cross-validation, yet simulation studies and real data analyses indicate that its performance is almost the same as or superior to that of cross-validation. Moreover, our criterion is extended to a class of other regularization methods.
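As an illustration of the Gaussian regression setting mentioned in the abstract, the following is a minimal sketch of choosing the Lasso regularization parameter with an AIC-type criterion. It is not the criterion derived in the article for generalized linear models. It assumes a known noise variance and uses the number of nonzero coefficients as the degrees of freedom (the result of Zou, Hastie and Tibshirani, 2007). The function names, the simulated data, and the grid of candidate values are hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)   # squared norm of each column
    r = y.copy()                    # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]     # remove coordinate j from the fit
            rho = X[:, j] @ r
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]     # put the updated coordinate back
    return b

def aic_gaussian_lasso(X, y, b, sigma2):
    """AIC-type score: RSS / sigma^2 + 2 * (number of nonzero coefficients).

    Assumes Gaussian errors with known variance sigma2 (an assumption of
    this sketch, not a statement of the article's GLM criterion).
    """
    rss = np.sum((y - X @ b) ** 2)
    df = np.count_nonzero(b)
    return rss / sigma2 + 2 * df

# Simulated data (hypothetical): 3 active predictors out of 10, unit noise.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ beta_true + rng.standard_normal(n)

# Evaluate the criterion on a grid and keep the minimizer.
lams = np.logspace(-1, 3, 30)
scores = [aic_gaussian_lasso(X, y, lasso_cd(X, y, lam), 1.0) for lam in lams]
best_lam = lams[int(np.argmin(scores))]
best_beta = lasso_cd(X, y, best_lam)
```

Each candidate value requires a single Lasso fit, whereas K-fold cross-validation would need K fits per candidate; this is the sense in which a criterion of this kind is cheaper to evaluate.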

Article information

Electron. J. Statist., Volume 10, Number 2 (2016), 2537-2560.

Received: March 2016
First available in Project Euclid: 9 September 2016

Primary: 62J07: Ridge regression; shrinkage estimators 62J12: Generalized linear models
Secondary: 62E20: Asymptotic distribution theory 62F12: Asymptotic properties of estimators

Keywords: Convexity lemma; information criterion; Kullback-Leibler divergence; statistical asymptotic theory; tuning parameter; variable selection


Ninomiya, Yoshiyuki; Kawano, Shuichi. AIC for the Lasso in generalized linear models. Electron. J. Statist. 10 (2016), no. 2, 2537-2560. doi:10.1214/16-EJS1179. https://projecteuclid.org/euclid.ejs/1473431413

