The Annals of Statistics

Exact post-selection inference, with application to the lasso

Jason D. Lee, Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor

Abstract

We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form valid confidence intervals for the selected coefficients and test whether all relevant variables have been included in the model.
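The abstract does not spell out the conditioning result. As a rough illustration only, the sketch below assumes the selection event can be written as a set of linear inequalities {A y <= b} on a Gaussian response y ~ N(mu, sigma^2 I) with known sigma. Under that assumption, a linear contrast eta'y, conditioned on the event and on the part of y orthogonal to eta, is a Gaussian truncated to an interval, so its truncated CDF gives a pivot. The function names, the toy event "y1 >= |y2|", and the numbers are hypothetical and are not taken from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncation_limits(A, b, y, eta):
    """Return (eta.y, lo, hi): the interval for eta.y implied by {A y <= b}.

    Assumes y ~ N(mu, sigma^2 I), so the residual z below is independent of eta.y.
    """
    eta_sq = sum(e * e for e in eta)
    eta_y = sum(e * v for e, v in zip(eta, y))
    c = [e / eta_sq for e in eta]                # scaled so that eta.c == 1
    z = [v - ci * eta_y for v, ci in zip(y, c)]  # y with its eta-component removed
    lo, hi = -math.inf, math.inf
    for row, bj in zip(A, b):
        ac = sum(r * ci for r, ci in zip(row, c))
        az = sum(r * zi for r, zi in zip(row, z))
        if ac < 0:                               # constraint bounds eta.y from below
            lo = max(lo, (bj - az) / ac)
        elif ac > 0:                             # constraint bounds eta.y from above
            hi = min(hi, (bj - az) / ac)
    return eta_y, lo, hi

def truncated_gaussian_pivot(x, lo, hi, mean, sd):
    """CDF at x of N(mean, sd^2) truncated to [lo, hi]."""
    def cdf(t):
        return norm_cdf((t - mean) / sd)
    return (cdf(x) - cdf(lo)) / (cdf(hi) - cdf(lo))

# Toy event (hypothetical): coordinate 1 is "selected" because y1 >= |y2|,
# i.e. -y1 + y2 <= 0 and -y1 - y2 <= 0.
A = [[-1.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0]
y = [2.0, 0.5]
eta = [1.0, 0.0]                                 # contrast of interest: mu_1

eta_y, lo, hi = truncation_limits(A, b, y, eta)
sd = 1.0 * math.sqrt(sum(e * e for e in eta))    # sigma = 1 assumed known
pivot = truncated_gaussian_pivot(eta_y, lo, hi, mean=0.0, sd=sd)
p_value = 1.0 - pivot                            # one-sided test of mu_1 = 0
print(lo, hi, round(p_value, 3))
```

In this toy case the observed y1 = 2 is compared against a standard normal truncated to [0.5, inf) rather than an untruncated one, which makes the selection-adjusted p-value larger than the naive one-sided p-value.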

Article information

Source
Ann. Statist. Volume 44, Number 3 (2016), 907-927.

Dates
Received: January 2015
Revised: September 2015
First available in Project Euclid: 11 April 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1460381681

Digital Object Identifier
doi:10.1214/15-AOS1371

Mathematical Reviews number (MathSciNet)
MR3485948

Zentralblatt MATH identifier
1341.62061

Subjects
Primary: 62F03: Hypothesis testing; 62J07: Ridge regression; shrinkage estimators
Secondary: 62E15: Exact distribution theory

Keywords
Lasso; confidence interval; hypothesis test; model selection

Citation

Lee, Jason D.; Sun, Dennis L.; Sun, Yuekai; Taylor, Jonathan E. Exact post-selection inference, with application to the lasso. Ann. Statist. 44 (2016), no. 3, 907–927. doi:10.1214/15-AOS1371. https://projecteuclid.org/euclid.aos/1460381681
