On Various Confidence Intervals Post-Model-Selection

Hannes Leeb; Benedikt M. Pötscher; Karl Ewald

doi:10.1214/14-STS507

Statist. Sci. 30(2): 216-227 (May 2015). DOI: 10.1214/14-STS507

Abstract

We compare several confidence intervals after model selection in the setting recently studied by Berk et al. [Ann. Statist. 41 (2013) 802–837], where the goal is to cover not the true parameter but a certain nonstandard quantity of interest that depends on the selected model. In particular, we compare the PoSI-intervals that are proposed in that reference with the “naive” confidence interval, which is constructed as if the selected model were correct and fixed a priori (thus ignoring the presence of model selection). Overall, we find that the actual coverage probabilities of all these intervals deviate only moderately from the desired nominal coverage probability. This finding is in stark contrast to several papers in the existing literature, where the goal is to cover the true parameter.
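To make the notion of a model-dependent coverage target concrete, the following is a minimal Monte Carlo sketch, not the procedure used in the paper. It assumes a Gaussian linear model with two regressors, known error variance 1, a regressor x1 that is always retained, and a selection rule that keeps x2 when the absolute value of its t-statistic exceeds sqrt(2). The function name `naive_coverage`, the design, and all default settings are hypothetical choices made for illustration. The target is the coefficient of x1 in whichever model is selected: the true coefficient in the full model, and the projection coefficient in the restricted model.

```python
import math
import random
from statistics import NormalDist


def naive_coverage(beta1, beta2, n=30, rho=0.7, alpha=0.05,
                   cutoff=math.sqrt(2.0), reps=20000, seed=0):
    """Monte Carlo coverage of the naive interval for the coefficient of x1.

    The interval is judged against the coefficient of x1 in the selected
    model, not against the true beta1. Illustrative assumptions only:
    two regressors, sigma = 1 known, selection by a t-test on x2.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    # Fixed design: two regressors with population correlation rho.
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for u in x1]
    a = sum(u * u for u in x1)                 # x1'x1
    b = sum(u * v for u, v in zip(x1, x2))     # x1'x2
    d = sum(v * v for v in x2)                 # x2'x2
    det = a * d - b * b

    mu = [beta1 * u + beta2 * v for u, v in zip(x1, x2)]
    target_full = beta1                        # coefficient of x1 in {x1, x2}
    target_restricted = beta1 + beta2 * b / a  # projection coefficient in {x1}
    se1_full = math.sqrt(d / det)
    se2_full = math.sqrt(a / det)
    se1_restricted = 1.0 / math.sqrt(a)

    hits = 0
    for _ in range(reps):
        y = [m + rng.gauss(0.0, 1.0) for m in mu]
        s1 = sum(u * w for u, w in zip(x1, y))  # x1'y
        s2 = sum(v * w for v, w in zip(x2, y))  # x2'y
        b2_full = (a * s2 - b * s1) / det       # OLS estimate of beta2
        if abs(b2_full / se2_full) > cutoff:    # keep x2: full model selected
            est, se, target = (d * s1 - b * s2) / det, se1_full, target_full
        else:                                   # drop x2: restricted model
            est, se, target = s1 / a, se1_restricted, target_restricted
        # Naive interval: treats the selected model as fixed a priori.
        hits += abs(est - target) <= z * se
    return hits / reps


if __name__ == "__main__":
    for beta2 in (0.0, 0.2, 0.5, 1.0, 5.0):
        print(beta2, naive_coverage(beta1=1.0, beta2=beta2))
```

Running the script prints the estimated coverage for several values of the coefficient of x2, which can then be compared with the nominal level 1 - alpha. Only the standard library is used so that the sketch runs without extra dependencies.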

References

1.

Andrews, D. W. K. and Guggenberger, P. (2009). Hybrid and size-corrected subsampling methods. Econometrica 77 721–762. MR2531360 10.3982/ECTA7015

2.

Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802–837. MR3099122 10.1214/12-AOS1077 euclid.aos/1369836961

3.

Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, Oakland, CA. MR443141

4.

Brown, L. (1967). The conditional level of Student’s $t$ test. Ann. Math. Stat. 38 1068–1071. MR214210 10.1214/aoms/1177698776 euclid.aoms/1177698776

5.

Buehler, R. J. and Feddersen, A. P. (1963). Note on a conditional property of Student’s $t$. Ann. Math. Stat. 34 1098–1100. MR150864 10.1214/aoms/1177704034 euclid.aoms/1177704034

6.

Craven, P. and Wahba, G. (1978/79). Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31 377–403. MR516581 10.1007/BF01404567

7.

Dijkstra, T. K. and Veldkamp, J. H. (1988). Data-driven selection of regressors and the bootstrap. In Lecture Notes in Econom. and Math. Systems 307 17–38. Springer, New York.

8.

Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499. MR2060166 10.1214/009053604000000067 euclid.aos/1083178935

9.

Ewald, K. (2012). On the influence of model selection on confidence regions for marginal associations in the linear model. Master’s thesis, Univ. Vienna.

10.

Kabaila, P. (1998). Valid confidence intervals in regression after variable selection. Econometric Theory 14 463–482. MR1650037 10.1017/S0266466698144031

11.

Kabaila, P. (2009). The coverage properties of confidence regions after model selection. Int. Stat. Rev. 77 405–414.

12.

Kabaila, P. and Leeb, H. (2006). On the large-sample minimal coverage probability of confidence intervals after model selection. J. Amer. Statist. Assoc. 101 619–629. MR2256178 10.1198/016214505000001140

13.

Leeb, H. (2006). The distribution of a linear predictor after model selection: Unconditional finite-sample distributions and asymptotic approximations. In Optimality. Institute of Mathematical Statistics Lecture Notes—Monograph Series 49 291–311. IMS, Beachwood, OH. MR2338549 10.1214/074921706000000518

14.

Leeb, H. (2008). Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process. Bernoulli 14 661–690. MR2537807 10.3150/08-BEJ127 euclid.bj/1219669625

15.

Leeb, H. and Pötscher, B. M. (2003). The finite-sample distribution of post-model-selection estimators and uniform versus nonuniform approximations. Econometric Theory 19 100–142. MR1965844 10.1017/S0266466603191050

16.

Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory 21 21–59. MR2153856 10.1017/S0266466605050036

17.

Leeb, H. and Pötscher, B. M. (2006a). Can one estimate the conditional distribution of post-model-selection estimators? Ann. Statist. 34 2554–2591. MR2291510 10.1214/009053606000000821 euclid.aos/1169571807

18.

Leeb, H. and Pötscher, B. M. (2006b). Performance limits for estimators of the risk or distribution of shrinkage-type estimators, and some general lower risk-bound results. Econometric Theory 22 69–97. MR2212693 10.1017/S0266466606060038

19.

Leeb, H. and Pötscher, B. M. (2008a). Can one estimate the unconditional distribution of post-model-selection estimators? Econometric Theory 24 338–376. MR2422862

20.

Leeb, H. and Pötscher, B. M. (2008b). Model selection. In Handbook of Financial Time Series (T. G. Andersen, R. A. Davis, J.-P. Kreiß and Th. Mikosch, eds.) 785–821. Springer, New York.

21.

Olshen, R. A. (1973). The conditional level of the $F$-test. J. Amer. Statist. Assoc. 68 692–698. MR359198

22.

Pötscher, B. M. (1991). Effects of model selection on inference. Econometric Theory 7 163–185. MR1128410 10.1017/S0266466600004382

23.

Pötscher, B. M. (2006). The distribution of model averaging estimators and an impossibility result regarding its estimation. In Time Series and Related Topics. Institute of Mathematical Statistics Lecture Notes—Monograph Series 52 113–129. IMS, Beachwood, OH. MR2427842

24.

Pötscher, B. M. (2009). Confidence sets based on sparse estimators are necessarily large. Sankhyā 71 1–18. MR2579644

25.

Pötscher, B. M. and Leeb, H. (2009). On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding. J. Multivariate Anal. 100 2065–2082. MR2543087 10.1016/j.jmva.2009.06.010

26.

Pötscher, B. M. and Schneider, U. (2009). On the distribution of the adaptive LASSO estimator. J. Statist. Plann. Inference 139 2775–2790. MR2523666 10.1016/j.jspi.2009.01.003

27.

Pötscher, B. M. and Schneider, U. (2010). Confidence sets based on penalized maximum likelihood estimators in Gaussian regression. Electron. J. Stat. 4 334–360. MR2645488 10.1214/09-EJS523 euclid.ejs/1268655653

28.

Pötscher, B. M. and Schneider, U. (2011). Distributional results for thresholding estimators in high-dimensional Gaussian regression models. Electron. J. Stat. 5 1876–1934. MR2970179 10.1214/11-EJS659 euclid.ejs/1325264852

29.

Rawlings, J. O., Pantula, S. G. and Dickey, D. A. (1998). Applied Regression Analysis: A Research Tool, 2nd ed. Springer, New York. MR1631919

30.

Sen, P. K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specification. Ann. Statist. 7 1019–1033. MR536504 10.1214/aos/1176344785 euclid.aos/1176344785

31.

Sen, P. K. and Saleh, A. K. M. E. (1987). On preliminary test and shrinkage $M$-estimation in linear models. Ann. Statist. 15 1580–1592. MR913575 10.1214/aos/1176350611 euclid.aos/1176350611

32.

Tukey, J. W. (1967). Discussion of “Topics in the investigation of linear relations fitted by the method of least squares” by F. J. Anscombe. J. Roy. Statist. Soc. Ser. B 29 47–48. MR212941

Hannes Leeb, Benedikt M. Pötscher, and Karl Ewald "On Various Confidence Intervals Post-Model-Selection," Statistical Science 30(2), 216-227, (May 2015). https://doi.org/10.1214/14-STS507

Published: May 2015

JOURNAL ARTICLE
12 PAGES
