Electronic Journal of Statistics

BS-SIM: An effective variable selection method for high-dimensional single index model

Longjie Cheng, Peng Zeng, and Yu Zhu

Full-text: Open access

Abstract

The single index model is an intuitive extension of the linear regression model. It has become increasingly popular due to its flexibility in modeling. As in the linear regression model, the set of predictors for the single index model can contain a large number of irrelevant variables. Therefore, it is important to select the relevant variables when fitting the single index model. However, the problem of variable selection for the high-dimensional single index model has not been well settled in the literature. In this work, we combine the idea of using cubic B-splines to estimate the single index model with the idea of using the family of smooth integration of counting and absolute deviation (SICA) penalty functions for variable selection. We propose a new method that simultaneously performs parameter estimation and model selection for the single index model. This method is referred to as the B-spline and SICA method for the single index model, or BS-SIM for short. We develop a coordinate descent algorithm to implement BS-SIM efficiently. We also show that, under certain conditions, the proposed method can consistently estimate the true index and select the true model. Simulations with various settings and a real data analysis are conducted to demonstrate the estimation accuracy, selection consistency and computational efficiency of BS-SIM.
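The abstract names three computational ingredients: a cubic B-spline representation of the unknown link, the SICA penalty family, and a coordinate descent solver. The following Python sketch illustrates each one under stated assumptions. It is not the authors' BS-SIM implementation. The SICA penalty is written in the form rho_a(t) = (a + 1)|t| / (a + |t|) of Lv and Fan (2009), which tends to the L0 (counting) penalty as a -> 0 and to the L1 (absolute deviation) penalty as a -> infinity. In a single index model the fitted link at observation i would be sum_m gamma_m B_m(beta^T x_i), where the B_m are the cubic B-spline basis functions. The solver shown is a generic cyclic coordinate descent for a linear working model with the link held fixed. It handles the nonconvex penalty through a local linear approximation (LLA, a weighted lasso re-solved with updated weights), and this is an assumption of the sketch rather than the scheme described in the paper. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the ingredients named in the abstract.
# The LLA weighting used below is an assumption of this sketch, not
# necessarily the algorithm used by BS-SIM.


def sica_penalty(t: float, a: float) -> float:
    """SICA penalty rho_a(t) = (a + 1)|t| / (a + |t|) with shape a > 0.

    Small a approaches the L0 (counting) penalty; large a approaches L1.
    """
    if a <= 0:
        raise ValueError("shape parameter a must be positive")
    u = abs(t)
    return (a + 1.0) * u / (a + u)


def sica_derivative(t: float, a: float) -> float:
    """d rho_a / d|t| = a(a + 1) / (a + |t|)^2, used here as an LLA weight."""
    u = abs(t)
    return a * (a + 1.0) / (a + u) ** 2


def soft_threshold(z: float, thr: float) -> float:
    """Shrink z toward zero by thr; values with |z| <= thr become exactly 0."""
    return math.copysign(max(abs(z) - thr, 0.0), z)


def bspline_basis(x: float, knots: Sequence[float], degree: int = 3) -> List[float]:
    """All B-spline basis values at x via the Cox-de Boor recursion (0/0 := 0).

    Returns len(knots) - degree - 1 values; degree=3 gives cubic B-splines.
    """
    # Degree-0 basis: indicator of the half-open knot interval containing x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for k in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - k - 1):
            ld = knots[i + k] - knots[i]
            rd = knots[i + k + 1] - knots[i + 1]
            left = (x - knots[i]) / ld * b[i] if ld > 0 else 0.0
            right = (knots[i + k + 1] - x) / rd * b[i + 1] if rd > 0 else 0.0
            nxt.append(left + right)
        b = nxt
    return b


def sica_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    a: float,
    n_lla: int = 5,
    n_sweeps: int = 100,
    tol: float = 1e-10,
) -> List[float]:
    """Approximately minimise (1/2n)||y - Xb||^2 + lam * sum_j rho_a(|b_j|).

    Each outer LLA step fixes weights w_j = rho_a'(|b_j|) and solves the
    resulting weighted lasso by cyclic coordinate descent.
    Assumes no column of X is identically zero.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_lla):
        w = [sica_derivative(bj, a) for bj in b]  # LLA weights at current b
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        for _ in range(n_sweeps):
            max_change = 0.0
            for j in range(p):
                # Partial-residual correlation for coordinate j.
                rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
                new = soft_threshold(rho, lam * w[j]) / col_sq[j]
                delta = new - b[j]
                if delta != 0.0:
                    for i in range(n):
                        resid[i] -= X[i][j] * delta
                    b[j] = new
                max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
    return b
```

As a usage example, take the noise-free orthogonal design with columns (1, 1, -1, -1), (1, -1, 1, -1) and (1, -1, -1, 1) and response y = 2 x_1 - 3 x_3. Then `sica_coordinate_descent(X, y, lam=0.1, a=1.0)` returns approximately (1.98, 0, -2.99). The irrelevant coefficient is set exactly to zero, while the active ones are only mildly shrunk, because the SICA weight a(a + 1)/(a + |b|)^2 decays as |b| grows.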

Article information

Source
Electron. J. Statist., Volume 11, Number 2 (2017), 3522-3548.

Dates
Received: June 2015
First available in Project Euclid: 6 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1507255613

Digital Object Identifier
doi:10.1214/17-EJS1329

Mathematical Reviews number (MathSciNet)
MR3709862

Zentralblatt MATH identifier
1373.62248

Subjects
Primary: 62H12: Estimation
Secondary: 62G08: Nonparametric regression

Keywords
Single index model; variable selection; regression spline; LASSO; SICA

Rights
Creative Commons Attribution 4.0 International License.

Citation

Cheng, Longjie; Zeng, Peng; Zhu, Yu. BS-SIM: An effective variable selection method for high-dimensional single index model. Electron. J. Statist. 11 (2017), no. 2, 3522–3548. doi:10.1214/17-EJS1329. https://projecteuclid.org/euclid.ejs/1507255613


Supplemental materials

  • Supplementary Material to “BS-SIM: An Effective Variable Selection Method for High-dimensional Single Index Model” (doi:10.1214/17-EJS1329SUPP). The supplementary material contains the technical proofs and additional simulation results.