Bernoulli

Volume 19, Number 4 (2013), 1212–1242.

Statistical significance in high-dimensional linear models

Peter Bühlmann


Abstract

We propose a method for constructing $p$-values for general hypotheses in a high-dimensional linear model. The hypotheses can be local, for testing a single regression parameter, or more global, involving several or even all parameters. Furthermore, when considering many hypotheses, we show how to adjust for multiple testing while taking the dependence among the $p$-values into account. Our technique is based on Ridge estimation with an additional correction term that accounts for the substantial projection bias arising in high dimensions. We prove strong error control for our $p$-values and provide sufficient conditions for detection: for the former, we make no assumption on the size of the true underlying regression coefficients, while for the latter, our procedure might not be optimal in terms of power. We demonstrate the method on simulated examples and a real data application.
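
An illustrative Python sketch of the general recipe follows. It is not the paper's exact procedure: the ridge and lasso penalty values, the simplified form of the bias correction, the variance formula with the noise level treated as known, and the Bonferroni step (standing in for the Westfall–Young permutation adjustment) are all assumptions made here for illustration only.

    # Sketch: ridge-based z-scores for H_{0,j}: beta_j = 0 when p > n,
    # with a lasso plug-in correcting the projection bias of ridge.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 100, 500
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]            # three active coefficients
    sigma = 1.0                            # noise level, treated as known here
    y = X @ beta + sigma * rng.standard_normal(n)

    lam = 1.0 / n                          # small ridge penalty (assumption)
    G = X.T @ X / n
    A = np.linalg.inv(G + lam * np.eye(p))
    beta_ridge = A @ (X.T @ y) / n         # ridge estimator

    # For p > n, ridge targets (approximately) a projection of beta onto
    # the row space of X, not beta itself; A @ G approximates that projection.
    P = A @ G
    beta_init = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_

    # Subtract the off-diagonal projection contribution, plugged in via the
    # initial lasso estimate (a simplified version of a bias correction).
    bias = P @ beta_init - np.diag(P) * beta_init
    beta_corr = beta_ridge - bias

    # Gaussian p-values from the covariance of the (uncorrected) ridge fit.
    cov = sigma**2 * A @ G @ A / n
    z = beta_corr / np.sqrt(np.diag(cov))
    pvals = 2 * stats.norm.sf(np.abs(z))
    pvals_adj = np.minimum(pvals * p, 1.0)  # crude Bonferroni adjustment

    print("smallest adjusted p-values:", np.sort(pvals_adj)[:5])

The point the sketch mirrors is that for $p > n$ the ridge estimator targets a projection of the true coefficient vector rather than the vector itself, so the projection bias must be corrected before Gaussian $p$-values become meaningful; the paper's multiple-testing adjustment additionally exploits the dependence among the $p$-values via the Westfall–Young permutation procedure rather than a Bonferroni bound.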

Article information

Source
Bernoulli, Volume 19, Number 4 (2013), 1212–1242.

Dates
First available in Project Euclid: 27 August 2013

Permanent link to this document
https://projecteuclid.org/euclid.bj/1377612849

Digital Object Identifier
doi:10.3150/12-BEJSP11

Mathematical Reviews number (MathSciNet)
MR3102549

Zentralblatt MATH identifier
1273.62173

Keywords
global testing; lasso; multiple testing; ridge regression; variable selection; Westfall–Young permutation procedure

Citation

Bühlmann, Peter. Statistical significance in high-dimensional linear models. Bernoulli 19 (2013), no. 4, 1212–1242. doi:10.3150/12-BEJSP11. https://projecteuclid.org/euclid.bj/1377612849



References

  • Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • Bickel, P.J., Klaassen, C.A.J., Ritov, Y. and Wellner, J.A. (1998). Efficient and Adaptive Estimation for Semiparametric Models. New York: Springer.
  • Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559–583.
  • Bühlmann, P., Kalisch, M. and Maathuis, M.H. (2010). Variable selection in high-dimensional linear models: Partially faithful distributions and the PC-simple algorithm. Biometrika 97 261–278.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Heidelberg: Springer.
  • Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
  • Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
  • Dettling, M. (2004). BagBoosting for tumor classification with gene expression data. Bioinformatics 20 3583–3593.
  • El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36 2757–2790.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849–911.
  • Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica 20 101–148.
  • Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer Series in Statistics. New York: Springer.
  • Huang, J., Ma, S. and Zhang, C.H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statist. Sinica 18 1603–1618.
  • Ingster, Y.I., Tsybakov, A.B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476–1526.
  • Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
  • Koltchinskii, V. (2009a). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 799–828.
  • Koltchinskii, V. (2009b). Sparsity in penalized empirical risk minimization. Ann. Inst. Henri Poincaré Probab. Stat. 45 7–57.
  • Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374–393.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417–473.
  • Meinshausen, N., Maathuis, M. and Bühlmann, P. (2011). Asymptotic optimality of the Westfall–Young permutation procedure for multiple testing under dependence. Ann. Statist. 39 3369–3391.
  • Meinshausen, N., Meier, L. and Bühlmann, P. (2009). $p$-values for high-dimensional regression. J. Amer. Statist. Assoc. 104 1671–1681.
  • Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • Raskutti, G., Wainwright, M.J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
  • Shao, J. and Deng, X. (2012). Estimation in high-dimensional linear models with deterministic design matrices. Ann. Statist. 40 812–831.
  • Sun, T. and Zhang, C.H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tropp, J.A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50 2231–2242.
  • van de Geer, S. (2007). The deterministic Lasso. In JSM Proceedings, 2007, 140. American Statistical Association.
  • van de Geer, S.A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
  • van de Geer, S.A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
  • van de Geer, S., Bühlmann, P. and Zhou, S. (2011). The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron. J. Stat. 5 688–749.
  • Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing (Y. Eldar and G. Kutyniok, eds.) 210–268. Cambridge: Cambridge Univ. Press.
  • Wainwright, M.J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
  • Wang, H. (2009). Forward regression for ultra-high dimensional variable screening. J. Amer. Statist. Assoc. 104 1512–1524.
  • Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
  • Westfall, P. and Young, S. (1993). Resampling-based Multiple Testing: Examples and Methods for $P$-value Adjustment. New York: John Wiley & Sons.
  • Zhang, C.H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, C.H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • Zhang, C.H. and Zhang, S. (2011). Confidence intervals for low-dimensional parameters with high-dimensional data. Available at arXiv:1110.2563v1.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.