Statistical Science

On Various Confidence Intervals Post-Model-Selection

Hannes Leeb, Benedikt M. Pötscher, and Karl Ewald

Abstract

We compare several confidence intervals after model selection in the setting recently studied by Berk et al. [Ann. Statist. 41 (2013) 802–837], where the goal is to cover not the true parameter but a certain nonstandard quantity of interest that depends on the selected model. In particular, we compare the PoSI-intervals that are proposed in that reference with the “naive” confidence interval, which is constructed as if the selected model were correct and fixed a priori (thus ignoring the presence of model selection). Overall, we find that the actual coverage probabilities of all these intervals deviate only moderately from the desired nominal coverage probability. This finding is in stark contrast to several papers in the existing literature, where the goal is to cover the true parameter.
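
To make the notion of a "naive" interval with a model-dependent coverage target concrete, the following is a minimal Monte Carlo sketch and not the authors' procedure or experimental setup. Everything in it is an assumption chosen for illustration: the function name `naive_coverage`, the two-regressor Gaussian design, the known error variance, and the AIC-like cutoff `sqrt(2)` on the t-statistic of the second regressor. The target is the coefficient of the first regressor in whichever model is selected, and the interval is built as if that model had been fixed in advance.

```python
import math
import random


def naive_coverage(beta=(1.0, 0.3), n=30, rho=0.7, sigma=1.0,
                   cutoff=math.sqrt(2.0), z=1.959964, reps=20000, seed=0):
    """Estimate coverage of the naive 95% interval for the coefficient of x1
    in the selected model (toy setup; all settings are assumptions)."""
    rng = random.Random(seed)
    # Fixed design with two correlated regressors (hypothetical).
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    det = s11 * s22 - s12 ** 2
    mu = [beta[0] * a + beta[1] * b for a, b in zip(x1, x2)]

    # Model-dependent targets: projection coefficient of mu on x1 alone,
    # and the full-model coefficient of x1.
    target_small = sum(a * m for a, m in zip(x1, mu)) / s11
    target_full = beta[0]

    # Standard errors under known sigma, from the inverse Gram matrix.
    se_small = sigma / math.sqrt(s11)
    se_full_1 = sigma * math.sqrt(s22 / det)
    se_full_2 = sigma * math.sqrt(s11 / det)

    hits = 0
    for _ in range(reps):
        y = [m + sigma * rng.gauss(0, 1) for m in mu]
        x1y = sum(a * v for a, v in zip(x1, y))
        x2y = sum(b * v for b, v in zip(x2, y))
        b1_full = (s22 * x1y - s12 * x2y) / det
        b2_full = (s11 * x2y - s12 * x1y) / det
        # Keep x2 only if its t-statistic exceeds the cutoff.
        if abs(b2_full) / se_full_2 > cutoff:
            est, se, target = b1_full, se_full_1, target_full
        else:
            est, se, target = x1y / s11, se_small, target_small
        # Naive interval: est +/- z * se, ignoring the selection step.
        hits += abs(est - target) <= z * se
    return hits / reps


if __name__ == "__main__":
    print(f"estimated coverage of naive interval: {naive_coverage():.3f}")
```

Varying `beta`, `rho`, or `cutoff` shows how the estimated coverage moves relative to the nominal 0.95 in this toy setting; the sketch does not implement the PoSI intervals.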

Article information

Source
Statist. Sci. Volume 30, Number 2 (2015), 216–227.

Dates
First available in Project Euclid: 3 June 2015

Permanent link to this document
https://projecteuclid.org/euclid.ss/1433341479

Digital Object Identifier
doi:10.1214/14-STS507

Mathematical Reviews number (MathSciNet)
MR3353104

Zentralblatt MATH identifier
1332.62154

Keywords
Confidence intervals; model selection; nonstandard coverage target; AIC; BIC; Lasso

Citation

Leeb, Hannes; Pötscher, Benedikt M.; Ewald, Karl. On Various Confidence Intervals Post-Model-Selection. Statist. Sci. 30 (2015), no. 2, 216–227. doi:10.1214/14-STS507. https://projecteuclid.org/euclid.ss/1433341479.


References

  • Andrews, D. W. K. and Guggenberger, P. (2009). Hybrid and size-corrected subsampling methods. Econometrica 77 721–762.
  • Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802–837.
  • Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, Oakland, CA.
  • Brown, L. (1967). The conditional level of Student’s $t$ test. Ann. Math. Stat. 38 1068–1071.
  • Buehler, R. J. and Feddersen, A. P. (1963). Note on a conditional property of Student’s $t$. Ann. Math. Stat. 34 1098–1100.
  • Craven, P. and Wahba, G. (1978/79). Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31 377–403.
  • Dijkstra, T. K. and Veldkamp, J. H. (1988). Data-driven selection of regressors and the bootstrap. In Lecture Notes in Econom. and Math. Systems 307 17–38. Springer, New York.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
  • Ewald, K. (2012). On the influence of model selection on confidence regions for marginal associations in the linear model. Master’s thesis, Univ. Vienna.
  • Kabaila, P. (1998). Valid confidence intervals in regression after variable selection. Econometric Theory 14 463–482.
  • Kabaila, P. (2009). The coverage properties of confidence regions after model selection. Int. Stat. Rev. 77 405–414.
  • Kabaila, P. and Leeb, H. (2006). On the large-sample minimal coverage probability of confidence intervals after model selection. J. Amer. Statist. Assoc. 101 619–629.
  • Leeb, H. (2006). The distribution of a linear predictor after model selection: Unconditional finite-sample distributions and asymptotic approximations. In Optimality. Institute of Mathematical Statistics Lecture Notes—Monograph Series 49 291–311. IMS, Beachwood, OH.
  • Leeb, H. (2008). Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process. Bernoulli 14 661–690.
  • Leeb, H. and Pötscher, B. M. (2003). The finite-sample distribution of post-model-selection estimators and uniform versus nonuniform approximations. Econometric Theory 19 100–142.
  • Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory 21 21–59.
  • Leeb, H. and Pötscher, B. M. (2006a). Can one estimate the conditional distribution of post-model-selection estimators? Ann. Statist. 34 2554–2591.
  • Leeb, H. and Pötscher, B. M. (2006b). Performance limits for estimators of the risk or distribution of shrinkage-type estimators, and some general lower risk-bound results. Econometric Theory 22 69–97.
  • Leeb, H. and Pötscher, B. M. (2008a). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24 338–376.
  • Leeb, H. and Pötscher, B. M. (2008b). Model selection. In Handbook of Financial Time Series (T. G. Andersen, R. A. Davis, J.-P. Kreiß and Th. Mikosch, eds.) 785–821. Springer, New York.
  • Olshen, R. A. (1973). The conditional level of the $F$-test. J. Amer. Statist. Assoc. 68 692–698.
  • Pötscher, B. M. (1991). Effects of model selection on inference. Econometric Theory 7 163–185.
  • Pötscher, B. M. (2006). The distribution of model averaging estimators and an impossibility result regarding its estimation. In Time Series and Related Topics. Institute of Mathematical Statistics Lecture Notes—Monograph Series 52 113–129. IMS, Beachwood, OH.
  • Pötscher, B. M. (2009). Confidence sets based on sparse estimators are necessarily large. Sankhyā 71 1–18.
  • Pötscher, B. M. and Leeb, H. (2009). On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding. J. Multivariate Anal. 100 2065–2082.
  • Pötscher, B. M. and Schneider, U. (2009). On the distribution of the adaptive LASSO estimator. J. Statist. Plann. Inference 139 2775–2790.
  • Pötscher, B. M. and Schneider, U. (2010). Confidence sets based on penalized maximum likelihood estimators in Gaussian regression. Electron. J. Stat. 4 334–360.
  • Pötscher, B. M. and Schneider, U. (2011). Distributional results for thresholding estimators in high-dimensional Gaussian regression models. Electron. J. Stat. 5 1876–1934.
  • Rawlings, J. O., Pantula, S. G. and Dickey, D. A. (1998). Applied Regression Analysis: A Research Tool, 2nd ed. Springer, New York.
  • Sen, P. K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specification. Ann. Statist. 7 1019–1033.
  • Sen, P. K. and Saleh, A. K. M. E. (1987). On preliminary test and shrinkage $M$-estimation in linear models. Ann. Statist. 15 1580–1592.
  • Tukey, J. W. (1967). Discussion of “Topics in the investigation of linear relations fitted by the method of least squares” by F. J. Anscombe. J. Roy. Statist. Soc. Ser. B 29 47–48.