Electronic Journal of Statistics

Hypothesis testing sure independence screening for nonparametric regression

Adriano Zanin Zambom and Michael G. Akritas

Full-text: Open access

Abstract

In this paper we develop a sure independence screening method based on hypothesis testing (HT-SIS) for a general nonparametric regression model. The ranking utility is a powerful test statistic for the hypothesis of predictive significance of each available covariate. The sure screening property of HT-SIS is established: all active predictors are retained with probability tending to one as the sample size increases. The threshold parameter is chosen in a theoretically justified manner based on the desired false positive selection rate. Simulation results suggest that the proposed method performs competitively with existing screening procedures across several models and outperforms them in some scenarios. A real microarray gene expression dataset is analyzed.
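The screening recipe described in the abstract (test each covariate's marginal predictive significance, then retain those whose test passes a threshold tied to a target false positive rate) can be illustrated with a minimal sketch. This is not the paper's method: it substitutes a Fisher z-test on the marginal correlation for the authors' ANOVA-type nonparametric lack-of-fit statistic, and uses a Benjamini-Hochberg step-up rule as one concrete way to set the threshold from a desired false discovery rate. The names `ht_screen` and `bh_select` are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def bh_select(pvals, q=0.1):
    """Benjamini-Hochberg step-up: keep hypotheses whose sorted p-value
    falls below the line k*q/p, targeting false discovery rate q."""
    p = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, p + 1) / p
    keep = np.zeros(p, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank passing the line
        keep[order[:k + 1]] = True
    return keep

def ht_screen(X, y, q=0.1):
    """Screen covariates by a marginal significance test per column.
    Placeholder utility: Fisher z-test on the sample correlation,
    standing in for a nonparametric lack-of-fit statistic."""
    n, p = X.shape
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    z = sqrt(n - 3) * np.arctanh(r)  # approximately N(0,1) under H0
    pvals = np.array([erfc(abs(v) / sqrt(2)) for v in z])  # two-sided
    return bh_select(pvals, q)

# Toy example: y depends only on the first two of 20 covariates.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
y = X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(300)
active = ht_screen(X, y, q=0.1)
```

With a strong signal the two active covariates are flagged with high probability, while the BH rule keeps the expected proportion of falsely selected covariates near `q`.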

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 767-792.

Dates
Received: January 2017
First available in Project Euclid: 3 March 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1520046228

Digital Object Identifier
doi:10.1214/18-EJS1405

Keywords
ANOVA; false discovery rate; lack-of-fit test; multiple testing; nonparametric regression

Rights
Creative Commons Attribution 4.0 International License.

Citation

Zambom, Adriano Zanin; Akritas, Michael G. Hypothesis testing sure independence screening for nonparametric regression. Electron. J. Statist. 12 (2018), no. 1, 767--792. doi:10.1214/18-EJS1405. https://projecteuclid.org/euclid.ejs/1520046228
