The Annals of Statistics

Can one estimate the conditional distribution of post-model-selection estimators?

Hannes Leeb and Benedikt M. Pötscher

Abstract

We consider the problem of estimating the conditional distribution of a post-model-selection estimator where the conditioning is on the selected model. The notion of a post-model-selection estimator here refers to the combined procedure resulting from first selecting a model (e.g., by a model selection criterion such as AIC or by a hypothesis testing procedure) and then estimating the parameters in the selected model (e.g., by least-squares or maximum likelihood), all based on the same data set. We show that it is impossible to estimate this distribution with reasonable accuracy even asymptotically. In particular, we show that no estimator for this distribution can be uniformly consistent (not even locally). This follows as a corollary to (local) minimax lower bounds on the performance of estimators for this distribution. Similar impossibility results are also obtained for the conditional distribution of linear functions (e.g., predictors) of the post-model-selection estimator.
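The two-step procedure described in the abstract (select a model, then estimate within the selected model, using the same data) can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes a hypothetical two-regressor Gaussian linear model with known error variance 1, a pre-test at critical value 1.96 on the second coefficient, and illustrative function names (`post_selection_estimate`, `simulate`). It only collects the estimates of the first coefficient separately for each selected model, which is the kind of conditional distribution the paper studies.

```python
import math
import random


def post_selection_estimate(x, z, y, crit=1.96):
    """Return (selected_model, estimate of the coefficient on x).

    Step 1 (selection): test whether the coefficient on z is zero in the
    full model, assuming the error variance is known and equal to 1.
    Step 2 (estimation): least squares in whichever model was selected.
    """
    sxx = sum(a * a for a in x)
    szz = sum(b * b for b in z)
    sxz = sum(a * b for a, b in zip(x, z))
    sxy = sum(a * c for a, c in zip(x, y))
    szy = sum(b * c for b, c in zip(z, y))
    det = sxx * szz - sxz * sxz
    # Full-model least squares from the 2x2 normal equations.
    alpha_full = (szz * sxy - sxz * szy) / det
    beta_full = (sxx * szy - sxz * sxy) / det
    se_beta = math.sqrt(sxx / det)  # standard error of beta_full when sigma = 1
    if abs(beta_full / se_beta) > crit:
        return "full", alpha_full
    # Restricted model: regress y on x alone.
    return "restricted", sxy / sxx


def simulate(alpha, beta, n=50, reps=2000, seed=0):
    """Collect post-selection estimates of alpha, grouped by selected model."""
    rng = random.Random(seed)
    # Fixed design with correlated regressors (an arbitrary assumption).
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.7 * a + rng.gauss(0, 1) for a in x]
    results = {"full": [], "restricted": []}
    for _ in range(reps):
        y = [alpha * a + beta * b + rng.gauss(0, 1) for a, b in zip(x, z)]
        model, est = post_selection_estimate(x, z, y)
        results[model].append(est)
    return results


if __name__ == "__main__":
    for beta in (0.0, 0.2, 5.0):
        res = simulate(alpha=1.0, beta=beta)
        for model, ests in res.items():
            if ests:
                mean = sum(ests) / len(ests)
                print(f"beta={beta}: {model} selected {len(ests)} times, "
                      f"mean estimate {mean:.3f}")
```

Running the script for several values of `beta` shows how the selection frequencies and the per-model estimates depend on the unknown parameter. The simulation can do this only because it knows the true `beta`, which an estimator of the conditional distribution would not.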

Article information

Source
Ann. Statist. Volume 34, Number 5 (2006), 2554-2591.

Dates
First available in Project Euclid: 23 January 2007

Permanent link to this document
http://projecteuclid.org/euclid.aos/1169571807

Digital Object Identifier
doi:10.1214/009053606000000821

Mathematical Reviews number (MathSciNet)
MR2291510

Zentralblatt MATH identifier
1106.62029

Subjects
Primary:
62F10: Point estimation
62F12: Asymptotic properties of estimators
62J05: Linear regression
62J07: Ridge regression; shrinkage estimators
62C05: General considerations

Keywords
Inference after model selection; post-model-selection estimator; pre-test estimator; selection of regressors; Akaike’s information criterion AIC; thresholding; model uncertainty; consistency; uniform consistency; lower risk bound

Citation

Leeb, Hannes; Pötscher, Benedikt M. Can one estimate the conditional distribution of post-model-selection estimators? Ann. Statist. 34 (2006), no. 5, 2554--2591. doi:10.1214/009053606000000821. http://projecteuclid.org/euclid.aos/1169571807.

