Open Access
Test for high-dimensional regression coefficients using refitted cross-validation variance estimation
Hengjian Cui, Wenwen Guo, Wei Zhong
Ann. Statist. 46(3): 958-988 (June 2018). DOI: 10.1214/17-AOS1573
Abstract

Testing hypotheses about high-dimensional regression coefficients is of fundamental importance in statistical theory and applications. In this paper, we develop a new test for the overall significance of the coefficients in high-dimensional linear regression models, based on an estimated U-statistic of order two. With the aid of the martingale central limit theorem, we prove that the proposed test statistic is asymptotically normal under two different distributional assumptions. Refitted cross-validation (RCV) variance estimation is used to avoid overestimating the variance and to enhance the empirical power. We examine the finite-sample performance of the proposed test via Monte Carlo simulations, which show that the new test based on the RCV estimator achieves higher power, especially in sparse cases. We also demonstrate an application through an empirical analysis of a microarray data set on Yorkshire gilts.
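The general idea behind refitted cross-validation can be illustrated with a minimal Python sketch: split the sample into two halves, select variables on one half, refit ordinary least squares on the other half using only the selected variables, and average the two resulting residual-variance estimates. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `rcv_variance`, the parameter `n_keep`, and the use of simple marginal-correlation screening as the selection step.

```python
import numpy as np


def rcv_variance(X, y, n_keep=5, seed=0):
    """Illustrative refitted cross-validation estimate of the noise variance.

    Assumes each half of the sample has more than n_keep + 1 observations.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    half_a, half_b = perm[: n // 2], perm[n // 2:]

    def screen(Xs, ys):
        # Rank predictors by absolute marginal correlation with the response
        # (up to a constant factor) and keep the n_keep highest-ranked ones.
        Xc = Xs - Xs.mean(axis=0)
        yc = ys - ys.mean()
        norms = np.maximum(np.linalg.norm(Xc, axis=0), 1e-12)
        score = np.abs(Xc.T @ yc) / norms
        return np.argsort(score)[-n_keep:]

    def refit_variance(Xs, ys, cols):
        # Ordinary least squares with an intercept on the selected columns,
        # followed by the usual degrees-of-freedom-corrected residual variance.
        Z = np.column_stack([np.ones(len(ys)), Xs[:, cols]])
        beta, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        resid = ys - Z @ beta
        return float(resid @ resid) / (len(ys) - Z.shape[1])

    # Select on one half, refit on the other, then swap roles and average.
    var_1 = refit_variance(X[half_b], y[half_b], screen(X[half_a], y[half_a]))
    var_2 = refit_variance(X[half_a], y[half_a], screen(X[half_b], y[half_b]))
    return 0.5 * (var_1 + var_2)
```

Because selection and refitting use disjoint halves, predictors that were selected only through spurious correlation in one half have no reason to fit the noise in the other half, which is what guards against understating the residual variance. Calling `rcv_variance(X, y)` on an `n × p` design matrix `X` and a response vector `y` returns a single scalar estimate.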

Copyright © 2018 Institute of Mathematical Statistics
Hengjian Cui, Wenwen Guo, and Wei Zhong "Test for high-dimensional regression coefficients using refitted cross-validation variance estimation," The Annals of Statistics 46(3), 958-988, (June 2018). https://doi.org/10.1214/17-AOS1573
Received: 1 February 2016; Published: June 2018