## Electronic Journal of Statistics

### Prediction weighted maximum frequency selection

#### Abstract

Shrinkage estimators that can produce sparse solutions have become increasingly important to the analysis of today’s complex datasets. Examples include the LASSO, the Elastic-Net and their adaptive counterparts. Estimation of penalty parameters still presents difficulties, however. While variable selection consistent procedures have been developed, their finite sample performance can often be less than satisfactory. We develop a new strategy for variable selection using the adaptive LASSO and adaptive Elastic-Net estimators with $p_{n}$ diverging. The basic idea first involves using the trace paths of their LARS solutions to bootstrap estimates of maximum frequency (MF) models conditioned on dimension. Conditioning on dimension effectively mitigates overfitting; to deal with underfitting, these MFs are then prediction-weighted. It is shown that not only can consistent model selection be achieved, but attractive convergence rates can as well, leading to excellent finite sample performance. Detailed numerical studies are carried out on both simulated and real datasets. Extensions to the class of generalized linear models are also detailed.
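The bootstrap-then-weight idea in the abstract can be illustrated with a rough sketch. It is not the authors' procedure, and it rests on several assumptions: a greedy forward-selection path stands in for the LARS trace path of the adaptive LASSO, models are refit by ordinary least squares, and the weighting (selection frequency divided by out-of-bag mean squared error) is a hypothetical choice, since the abstract does not state the weighting used. The function names `greedy_path`, `oob_mse` and `pw_mf_select` are invented for this illustration.

```python
import numpy as np
from collections import Counter, defaultdict


def greedy_path(X, y, max_dim):
    """Order in which variables enter a greedy forward fit.

    Assumption: this is a simple stand-in for a LARS trace path.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    active, resid = [], yc.copy()
    for _ in range(max_dim):
        scores = np.abs(Xc.T @ resid)
        if active:
            scores[active] = -np.inf  # never re-select an active variable
        active.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(Xc[:, active], yc, rcond=None)
        resid = yc - Xc[:, active] @ coef
    return active


def oob_mse(X, y, train, test, cols):
    """Refit OLS (with intercept) on bootstrap rows, score on out-of-bag rows."""
    A = np.column_stack([np.ones(len(train)), X[train][:, cols]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    B = np.column_stack([np.ones(len(test)), X[test][:, cols]])
    return float(np.mean((y[test] - B @ coef) ** 2))


def pw_mf_select(X, y, max_dim=6, n_boot=100, seed=0):
    """Pick one model by weighting per-dimension MF models by prediction error."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = [Counter() for _ in range(max_dim)]            # model frequencies per dimension
    errors = [defaultdict(list) for _ in range(max_dim)]    # out-of-bag errors per model
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)                  # bootstrap row indices
        test = np.setdiff1d(np.arange(n), train)            # rows left out of the resample
        if test.size == 0:
            continue
        order = greedy_path(X[train], y[train], max_dim)
        for k in range(1, max_dim + 1):
            model = tuple(sorted(order[:k]))                # model of dimension k on this path
            counts[k - 1][model] += 1
            errors[k - 1][model].append(oob_mse(X, y, train, test, list(model)))
    best_model, best_score = None, -np.inf
    for k in range(max_dim):
        model, hits = counts[k].most_common(1)[0]           # maximum frequency model at dimension k+1
        # Hypothetical weighting: frequent models with low out-of-bag error score highest.
        score = (hits / n_boot) / np.mean(errors[k][model])
        if score > best_score:
            best_model, best_score = model, score
    return best_model


# Simulated example: only the first three of ten predictors are active.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(size=200)
selected = pw_mf_select(X, y)
print(selected)
```

In this toy version, conditioning on dimension means the most frequent model is tabulated separately for each model size, so large models cannot win on frequency alone. The out-of-bag error then penalizes sizes that omit active predictors.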

#### Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 640-681.

Dates
First available in Project Euclid: 3 March 2017

Permanent link
https://projecteuclid.org/euclid.ejs/1488531638

Digital Object Identifier
doi:10.1214/17-EJS1240

Mathematical Reviews number (MathSciNet)
MR3619319

Zentralblatt MATH identifier
1359.62298

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators

#### Citation

Liu, Hongmei; Rao, J. Sunil. Prediction weighted maximum frequency selection. Electron. J. Statist. 11 (2017), no. 1, 640–681. doi:10.1214/17-EJS1240. https://projecteuclid.org/euclid.ejs/1488531638
