Annals of Statistics

A weight-relaxed model averaging approach for high-dimensional generalized linear models

Tomohiro Ando and Ker-chau Li

Full-text: Open access


Model averaging has long been proposed as a powerful alternative to model selection in regression analysis. However, how well it performs in high-dimensional regression is still poorly understood. Recently, Ando and Li [J. Amer. Statist. Assoc. 109 (2014) 254–265] introduced a new method of model averaging that allows the number of predictors to increase as the sample size increases. One notable feature of Ando and Li’s method is the relaxation of the constraint on the total model weights, so that weak signals can be efficiently combined from high-dimensional linear models. It is natural to ask whether Ando and Li’s method and results can be extended to nonlinear models. Because all candidate models should be treated as working models, the existence of a theoretical target of the quasi-maximum likelihood estimator under model misspecification needs to be established first. In this paper, we consider generalized linear models as our candidate models. We establish a general result showing the existence of pseudo-true regression parameters under model misspecification. We derive conditions under which leave-one-out cross-validation weight selection achieves asymptotic optimality. Technically, the pseudo-true target parameters of different working models are not linearly linked. To overcome the resulting difficulties, we employ a novel strategy of decomposing and bounding the bias and variance terms in our proof. We conduct simulations to illustrate the merits of our model averaging procedure over several existing methods, including the lasso and group lasso methods, the Akaike and Bayesian information criterion model-averaging methods and some other state-of-the-art regularization methods.
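The weight-selection idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the logistic family, the grouping of predictors into three candidate models, the small ridge term used to stabilise each fit, the box constraint that keeps each weight in [0, 1] without requiring the weights to sum to one, and the projected-gradient solver are all assumptions made for illustration. Each candidate model is refitted with one observation left out, the held-out linear predictors are combined with weights, and the weights are chosen to minimise the leave-one-out negative log-likelihood.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, ridge=0.1, iters=25):
    """Newton iterations for a logistic working model (no intercept).
    The ridge term is an assumption added only for numerical stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def loo_linear_predictors(X, y, groups):
    """eta[i, k]: linear predictor for observation i from candidate model k,
    fitted with observation i left out."""
    n = len(y)
    eta = np.zeros((n, len(groups)))
    for k, cols in enumerate(groups):
        Xk = X[:, cols]
        for i in range(n):
            keep = np.arange(n) != i
            eta[i, k] = Xk[i] @ fit_logistic(Xk[keep], y[keep])
    return eta

def cv_loss(w, eta, y):
    """Leave-one-out negative Bernoulli log-likelihood of the weighted predictor."""
    t = eta @ w
    return float(np.sum(np.logaddexp(0.0, t) - y * t))

def select_weights(eta, y, steps=2000):
    """Projected gradient descent over the box [0, 1]^M.
    The weights are NOT forced to sum to one (assumed form of the relaxation)."""
    w = np.full(eta.shape[1], 1.0 / eta.shape[1])
    step = 4.0 / np.linalg.norm(eta, 2) ** 2   # 1 / Lipschitz bound of the gradient
    for _ in range(steps):
        grad = eta.T @ (sigmoid(eta @ w) - y)
        w = np.clip(w - step * grad, 0.0, 1.0)
    return w

# Synthetic data (hypothetical): six predictors split into three candidate models.
rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 6))
beta_true = np.array([1.0, -1.0, 0.5, 0.5, 0.0, 0.0])
y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
groups = [[0, 1], [2, 3], [4, 5]]

eta_loo = loo_linear_predictors(X, y, groups)
w_hat = select_weights(eta_loo, y)
print("selected weights:", w_hat, "sum:", w_hat.sum())
```

Because the criterion is convex in the weights, a simple projected-gradient loop is enough for this toy problem; a quadratic or general-purpose constrained solver would be a reasonable substitute for larger numbers of candidate models.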

Article information

Ann. Statist., Volume 45, Number 6 (2017), 2654–2679.

Received: December 2015
Revised: September 2016
First available in Project Euclid: 15 December 2017

Primary: 62J12: Generalized linear models
Secondary: 62F99: None of the above, but in this section

Asymptotic optimality; high-dimensional regression models; model averaging; model misspecification


Ando, Tomohiro; Li, Ker-chau. A weight-relaxed model averaging approach for high-dimensional generalized linear models. Ann. Statist. 45 (2017), no. 6, 2654–2679. doi:10.1214/17-AOS1538.


  • Akaike, H. (1978). On the likelihood of a time series model. J. R. Stat. Soc., Ser. D Stat. 27 217–235.
  • Akaike, H. (1979). A Bayesian extension of the minimum AIC procedure of autoregressive model fitting. Biometrika 66 237–242.
  • Ando, T. (2009). Bayesian portfolio selection using multifactor model. Int. J. Forecast. 25 550–566.
  • Ando, T. and Li, K.-C. (2014). A model-averaging approach for high-dimensional regression. J. Amer. Statist. Assoc. 109 254–265.
  • Ando, T. and Li, K.-C. (2017). Supplement to “A weight-relaxed model averaging approach for high-dimensional generalized linear models.” DOI:10.1214/17-AOS1538SUPP.
  • Ando, T. and Tsay, R. (2010). Predictive likelihood for Bayesian model selection and averaging. Int. J. Forecast. 26 744–763.
  • Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232–253.
  • Bühlmann, P., Kalisch, M. and Maathuis, M. K. (2010). Variable selection in high-dimensional linear models: Partially faithful distributions and the PC-simple algorithm. Biometrika 97 261–278.
  • Charkhi, A., Claeskens, G. and Hansen, B. E. (2016). Minimum mean squared error model averaging in likelihood models. Statist. Sinica 26 809–840.
  • Chung, T. S., Rust, R. T. and Wedel, M. (2009). My mobile music: An adaptive personalization system for digital audio players. Marketing Sci. 28 52–68.
  • Claeskens, G. and Hjort, N. L. (2003). The focused information criterion. J. Amer. Statist. Assoc. 98 900–945.
  • Eklund, J. and Karlsson, S. (2007). Forecast combination and model averaging using predictive measures. Econometric Rev. 26 329–363.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Flynn, C. J., Hurvich, C. M. and Simonoff, J. S. (2013). Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. J. Amer. Statist. Assoc. 108 1031–1043.
  • Hansen, B. E. (2007). Least squares model averaging. Econometrica 75 1175–1189.
  • Hansen, B. E. and Racine, J. S. (2012). Jackknife model averaging. J. Econometrics 167 38–46.
  • Hjort, N. L. and Claeskens, G. (2003). Frequentist model average estimators. J. Amer. Statist. Assoc. 98 879–899.
  • Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statist. Sci. 14 382–417.
  • Kass, R. and Raftery, A. (1995). Bayes factors and model uncertainty. J. Amer. Statist. Assoc. 90 773–795.
  • Lee, Y. S. (2014). Management of a periodic-review inventory system using Bayesian model averaging when new marketing efforts are made. Int. J. Production Econ. 158 278–289.
  • Li, K.-C. (1986). Asymptotic optimality of $C_{L}$ and generalized cross-validation in ridge regression with application to spline smoothing. Ann. Statist. 14 1101–1112.
  • Li, K.-C. (1987). Asymptotic optimality for $C_{p}$, $C_{L}$, cross-validation and generalized cross-validation: Discrete index set. Ann. Statist. 15 958–975.
  • Lv, J. and Liu, J. S. (2014). Model selection principles in misspecified models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 141–167.
  • Madigan, D. and Raftery, A. E. (1994). Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Amer. Statist. Assoc. 89 1535–1546.
  • Min, K.-C. and Zellner, A. (1992). Bayesian and non-Bayesian methods for combining models and forecasts with applications to forecasting international growth rates. J. Econometrics 56 89–118.
  • Montgomery, J. M. and Nyhan, B. (2010). Bayesian model averaging: Theoretical developments and practical applications. Polit. Anal. 18 245–270.
  • Moro, S., Laureano, R. and Cortez, P. (2011). Using data mining for bank direct marketing: An application of the CRISP-DM methodology. In Proceedings of the European Simulation and Modelling Conference 117–121.
  • Ouysse, R. and Kohn, R. (2010). Bayesian variable selection and model averaging in the arbitrage pricing theory model. Comput. Statist. Data Anal. 54 3249–3268.
  • Raftery, A. E., Madigan, D. and Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. J. Amer. Statist. Assoc. 92 179–191.
  • Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040–1053.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
  • Wan, A. T. K., Zhang, X. and Zou, G. (2010). Least squares model averaging by Mallows criterion. J. Econometrics 156 277–283.
  • White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1–25.
  • Yeung, K. Y., Bumgarner, R. E. and Raftery, A. E. (2005). Bayesian model averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21 2394–2402.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.
  • Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, Y., Li, R. and Tsai, C.-L. (2010). Regularization parameter selections via generalized information criterion. J. Amer. Statist. Assoc. 105 312–323.

Supplemental materials

  • Supplementary material. Due to space constraints, the proofs of claims (4.8) and (4.9), the proof of Lemma 3, and further simulation studies are relegated to the supplementary document. The supplementary document also contains Theorem 3 and Lemma 4.