The Annals of Statistics

Valid confidence intervals for post-model-selection predictors

François Bachoc, Hannes Leeb, and Benedikt M. Pötscher

Abstract

We consider inference post-model-selection in linear regression. In this setting, Berk et al. [Ann. Statist. 41 (2013a) 802–837] recently introduced a class of confidence sets, the so-called PoSI intervals, that cover a certain nonstandard quantity of interest with a user-specified minimal coverage probability, irrespective of the model selection procedure that is being used. In this paper, we generalize the PoSI intervals to confidence intervals for post-model-selection predictors.

Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1475–1504.

Dates
Received: January 2016
Revised: May 2018
First available in Project Euclid: 13 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1550026846

Digital Object Identifier
doi:10.1214/18-AOS1721

Mathematical Reviews number (MathSciNet)
MR3911119

Zentralblatt MATH identifier
07053515

Subjects
Primary: 62F25: Tolerance and confidence regions
Secondary: 62J05: Linear regression

Keywords
Inference post-model-selection; confidence intervals; optimal post-model-selection predictors; nonstandard targets; linear regression

Citation

Bachoc, François; Leeb, Hannes; Pötscher, Benedikt M. Valid confidence intervals for post-model-selection predictors. Ann. Statist. 47 (2019), no. 3, 1475–1504. doi:10.1214/18-AOS1721. https://projecteuclid.org/euclid.aos/1550026846


References

  • Andrews, D. W. K. and Guggenberger, P. (2009). Hybrid and size-corrected subsampling methods. Econometrica 77 721–762.
  • Bachoc, F., Leeb, H. and Pötscher, B. M. (2019). Supplement to “Valid confidence intervals for post-model-selection predictors.” DOI:10.1214/18-AOS1721SUPP.
  • Belloni, A., Chernozhukov, V. and Hansen, C. (2011). Inference for high-dimensional sparse econometric models. In Advances in Economics and Econometrics. 10th World Congress of the Econometric Society, Vol. III 245–295.
  • Belloni, A., Chernozhukov, V. and Hansen, C. (2014). Inference on treatment effects after selection among high-dimensional controls. Rev. Econ. Stud. 81 608–650.
  • Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013a). Valid post-selection inference. Ann. Statist. 41 802–837.
  • Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013b). Valid post-selection inference. Unpublished version. Available at http://www-stat.wharton.upenn.edu/~lzhao/papers/MyPublication/24PoSI-submit.pdf.
  • Castera, L., Chan, H. L. Y., Arrese, M., Afdhal, N., Bedossa, P., Friedrich-Rust, M., Han, K.-H. and Pinzani, M. (2015). EASL-ALEH clinical practice guidelines: Non-invasive tests for evaluation of liver disease severity and prognosis. J. Hepatol. 63 237–264.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fithian, W., Sun, D. and Taylor, J. (2015). Optimal inference after model selection. Available at arXiv:1410.2597.
  • Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • Jaupi, L. (2014). Variable selection methods for multivariate process monitoring. In Proceedings of the World Congress of Engineering 2014, Vol. II (S. I. Ao, L. Gelman, D. Hukins, A. Hunter and A. M. Korsunsky, eds.) 1116–1120.
  • Kabaila, P. and Leeb, H. (2006). On the large-sample minimal coverage probability of confidence intervals after model selection. J. Amer. Statist. Assoc. 101 619–629.
  • Lee, J. D. and Taylor, J. (2014). Exact post model selection inference for marginal screening. In Advances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence and K. Q. Weinberger, eds.) 136–144. Curran Associates, Red Hook, NY.
  • Lee, J. D., Sun, D. L., Sun, Y. and Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. Ann. Statist. 44 907–927.
  • Leeb, H. (2009). Conditional predictive inference post model selection. Ann. Statist. 37 2838–2876.
  • Leeb, H. and Pötscher, B. M. (2003). The finite-sample distribution of post-model-selection estimators and uniform versus nonuniform approximations. Econometric Theory 19 100–142.
  • Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory 21 21–59.
  • Leeb, H. and Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators? Ann. Statist. 34 2554–2591.
  • Leeb, H. and Pötscher, B. M. (2017). Testing in the presence of nuisance parameters: Some comments on tests post-model-selection and random critical values. In Big and Complex Data Analysis (S. E. Ahmed, ed.) 69–82. Springer, Cham.
  • Leeb, H., Pötscher, B. M. and Ewald, K. (2015). On various confidence intervals post-model-selection. Statist. Sci. 30 216–227.
  • Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the lasso. Ann. Statist. 42 413–468.
  • Pötscher, B. M. (2009). Confidence sets based on sparse estimators are necessarily large. Sankhyā 71 1–18.
  • Pötscher, B. M. and Schneider, U. (2010). Confidence sets based on penalized maximum likelihood estimators in Gaussian regression. Electron. J. Stat. 4 334–360.
  • Rawlings, J. O., Pantula, S. G. and Dickey, D. A. (1998). Applied Regression Analysis: A Research Tool, 2nd ed. Springer, New York.
  • Scheffé, H. (1959). The Analysis of Variance. Wiley, New York.
  • Schneider, U. (2016). Confidence sets based on thresholding estimators in high-dimensional Gaussian regression models. Econometric Rev. 35 1412–1455.
  • Souders, T. M. and Stenbakken, G. N. (1991). Cutting the high cost of testing. IEEE Spectrum 28 48–51.
  • Tian, X. and Taylor, J. (2015). Asymptotics of selective inference. Available at arXiv:1501.03588.
  • Tibshirani, R. J., Rinaldo, A., Tibshirani, R. and Wasserman, L. (2015). Uniform asymptotic inference and the bootstrap after model selection. Available at arXiv:1506.06266.
  • Tibshirani, R. J., Taylor, J., Lockhart, R. and Tibshirani, R. (2016). Exact post-selection inference for sequential regression procedures. J. Amer. Statist. Assoc. 111 600–620.
  • van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166–1202.
  • Wasserman, L. (2014). Discussion: “A significance test for the lasso” [MR3210970]. Ann. Statist. 42 501–508.
  • Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
  • Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, K. (2013). Rank-extreme association of Gaussian vectors and low-rank detection. Available at arXiv:1306.0623.
  • Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217–242.

Supplemental materials

  • Appendix: Proofs, algorithms, comments, details and extensions. The Appendix contains the following material: comments on the assumptions made on the error variance; proofs of the results given in Sections 2 and 3; additional material for Sections 2 and 3; descriptions of the algorithms for computing the PoSI confidence intervals; details concerning the numerical calculations for Section 4; additional simulation results.