The Annals of Statistics

Consistent model selection criteria for quadratically supported risks

Yongdai Kim and Jong-June Jeon

Full-text: Access denied (no subscription detected)

We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber. If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text


In this paper, we study asymptotic properties of model selection criteria for high-dimensional regression models where the number of covariates is much larger than the sample size. In particular, we consider a class of loss functions called the class of quadratically supported risks which is large enough to include the quadratic loss, Huber loss, quantile loss and logistic loss. We provide sufficient conditions for the model selection criteria, which are applicable to the class of quadratically supported risks. Our results extend most previous sufficient conditions for model selection consistency. In addition, sufficient conditions for pathconsistency of the Lasso and nonconvex penalized estimators are presented. Here, pathconsistency means that the probability of the solution path that includes the true model converges to 1. Pathconsistency makes it practically feasible to apply consistent model selection criteria to high-dimensional data. The data-adaptive model selection procedure is proposed which is selection consistent and performs well for finite samples. Results of simulation studies as well as real data analysis are presented to compare the finite sample performances of the proposed data-adaptive model selection criterion with other competitors.

Article information

Ann. Statist., Volume 44, Number 6 (2016), 2467-2496.

Received: April 2015
Revised: November 2015
First available in Project Euclid: 23 November 2016

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Generalized information criteria high dimension model selection quadratically supported risks selection consistency


Kim, Yongdai; Jeon, Jong-June. Consistent model selection criteria for quadratically supported risks. Ann. Statist. 44 (2016), no. 6, 2467--2496. doi:10.1214/15-AOS1413.

Export citation


  • [1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971) 267–281. Akadémiai Kiadó, Budapest.
  • [2] Belloni, A. and Chernozhukov, V. (2011). $\ell_{1}$-penalized quantile regression in high-dimensional sparse models. Ann. Statist. 39 82–130.
  • [3] Broman, K. W. and Speed, T. P. (2002). A model selection approach for the identification of quantitative trait loci in experimental crosses. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 641–656.
  • [4] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Springer, Heidelberg.
  • [5] Casella, G., Girón, F. J., Martínez, M. L. and Moreno, E. (2009). Consistency of Bayesian procedures for variable selection. Ann. Statist. 37 1207–1228.
  • [6] Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95 759–771.
  • [7] Chen, J. and Chen, Z. (2012). Extended BIC for small-$n$-large-$P$ sparse GLM. Statist. Sinica 22 555–574.
  • [8] Craven, P. and Wahba, G. (1978/79). Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31 377–403.
  • [9] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [10] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
  • [11] Fan, Y. and Tang, C. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 531–552.
  • [12] Foster, D. and George, E. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
  • [13] Geer, S. A. (2000). Empirical Processes in M-estimation 6, Cambridge University Press, Cambridge.
  • [14] He, X. and Shi, P. (1994). Convergence rate of $B$-spline estimators of nonparametric conditional quantile functions. J. Nonparametr. Statist. 3 299–308.
  • [15] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statist. Sinica 18 1603–1618.
  • [16] Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665–1673.
  • [17] Kim, Y. and Kwon, S. (2012). Global optimality of nonconvex penalized estimators. Biometrika 99 315–325.
  • [18] Kim, Y., Kwon, S. and Choi, H. (2012). Consistent model selection criteria on high dimensions. J. Mach. Learn. Res. 13 1037–1057.
  • [19] Koenker, R. (2005). Quantile Regression. Econometric Society Monographs 38. Cambridge Univ. Press, Cambridge.
  • [20] Lee, E. R., Noh, H. and Park, B. U. (2014). Model selection via Bayesian information criterion for quantile regression models. J. Amer. Statist. Assoc. 109 216–229.
  • [21] Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist. 17 1009–1052.
  • [22] Portnoy, S. (1985). Asymptotic behavior of $M$ estimators of $p$ regression parameters when $p^{2}/n$ is large. II. Normal approximation. Ann. Statist. 13 1403–1417.
  • [23] Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. Ann. Statist. 35 1012–1030.
  • [24] Scheetz, T. E., Kim, K.-Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dorrance, A. M., DiBona, G. F., Huang, J., Casavant, T. L., Sheffield, V. C. and Stone, E. M. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. USA 103 14429–14434.
  • [25] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • [26] Shao, J. (1997). An asymptotic theory for linear model selection. Statist. Sinica 7 221–264.
  • [27] Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B 36 111–147.
  • [28] Wang, H., Li, B. and Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 671–683.
  • [29] Wang, L., Kim, Y. and Li, R. (2013). Calibrating nonconvex penalized regression in ultra-high dimension. Ann. Statist. 41 2505–2536.
  • [30] Wang, L., Wu, Y. and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Amer. Statist. Assoc. 107 214–222.
  • [31] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • [32] Zhang, Y. and Shen, X. (2010). Model selection procedure for high-dimensional data. Stat. Anal. Data Min. 3 350–358.