The Annals of Statistics

Prediction when fitting simple models to high-dimensional data

Lukas Steinberger and Hannes Leeb



We study linear subset regression in the context of a high-dimensional linear model. Consider $y=\vartheta + \theta'z + \epsilon$ with univariate response $y$ and a $d$-vector of random regressors $z$, and a submodel where $y$ is regressed on a set of $p$ explanatory variables that are given by $x=M'z$, for some $d\times p$ matrix $M$. Here, “high-dimensional” means that the number $d$ of available explanatory variables in the overall model is much larger than the number $p$ of variables in the submodel. In this paper, we present Pinsker-type results for prediction of $y$ given $x$. In particular, we show that the mean squared prediction error of the best linear predictor of $y$ given $x$ is close to the mean squared prediction error of the corresponding Bayes predictor $\mathbb{E}[y|x]$, provided only that $p/\log d$ is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from $n$ independent observations of $(y,x)$ is close to that of the Bayes predictor, provided only that both $p/\log d$ and $p/n$ are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables $z$.
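The setting in the abstract can be made concrete with a small simulation sketch. The code below is not from the paper; the dimensions, the Gaussian design for $z$, the draw of $\theta$, and the choice of $M$ as a coordinate-selection matrix are all illustrative assumptions. It fits the least-squares predictor of $y$ on $x = M'z$ from $n$ observations and estimates its out-of-sample mean squared prediction error, which here is roughly the noise variance plus the variance of the omitted signal $\theta'z - $ (part explained by $x$).

```python
import numpy as np

rng = np.random.default_rng(0)

d, p, n = 1000, 5, 200                    # overall dimension, submodel size, sample size (assumed)
theta = rng.normal(size=d) / np.sqrt(d)   # hypothetical regression parameter, scaled so theta'z has O(1) variance
vartheta = 0.5                            # intercept

# M selects the first p coordinates of z, so x = M'z keeps only p of the d regressors.
M = np.zeros((d, p))
M[:p, :p] = np.eye(p)

# Training data from the overall model y = vartheta + theta'z + eps.
z = rng.normal(size=(n, d))
y = vartheta + z @ theta + rng.normal(size=n)
x = z @ M

# Feasible least-squares predictor of y given x (intercept plus p slopes).
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Out-of-sample mean squared prediction error on fresh draws.
m = 10_000
z_new = rng.normal(size=(m, d))
y_new = vartheta + z_new @ theta + rng.normal(size=m)
X_new = np.column_stack([np.ones(m), z_new @ M])
mspe = np.mean((y_new - X_new @ beta_hat) ** 2)
print(f"out-of-sample MSPE of the submodel least-squares predictor: {mspe:.3f}")
```

With these choices the error variance is 1 and the omitted coordinates of $\theta$ contribute roughly another unit of variance, so the printed MSPE is on the order of 2; the paper's results concern how close such a feasible predictor comes to the Bayes predictor $\mathbb{E}[y|x]$, uniformly over parameters and designs, when $p/\log d$ and $p/n$ are small.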

Article information

Ann. Statist., Volume 47, Number 3 (2019), 1408-1442.

Received: May 2016
Revised: April 2017
First available in Project Euclid: 13 February 2019


Primary: 62H99
Secondary: 62F99; 62G99

Keywords: Pinsker theorem; best linear predictor; Bayes predictor; linear subset regression; non-Gaussian data; high-dimensional models; small sample size


Steinberger, Lukas; Leeb, Hannes. Prediction when fitting simple models to high-dimensional data. Ann. Statist. 47 (2019), no. 3, 1408--1442. doi:10.1214/18-AOS1719.

References


  • Abadie, A., Imbens, G. W. and Zheng, F. (2014). Inference for misspecified models with fixed regressors. J. Amer. Statist. Assoc. 109 1601–1614.
  • Bachoc, F., Leeb, H. and Pötscher, B. M. (2015). Valid confidence intervals for post-model-selection prediction. arXiv preprint. Available at arXiv:1412.4605.
  • Beran, R. and Dümbgen, L. (1998). Modulation of estimators and confidence sets. Ann. Statist. 26 1826–1856.
  • Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802–837.
  • Brannath, W. and Scharpenberg, M. (2014). Interpretation of linear regression coefficients under mean model miss-specification. arXiv preprint. Available at arXiv:1409.8544.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Springer, Heidelberg.
  • Buja, A. R., Brown, L. D., George, E., Pitkin, E., Traskin, M., Zhang, K. and Zhao, L. (2014). A conspiracy of random predictors and model violations against classical inference in regression. arXiv preprint. Available at arXiv:1404.1578.
  • Diaconis, P. and Freedman, D. (1984). Asymptotics of graphical projection pursuit. Ann. Statist. 12 793–815.
  • Dümbgen, L. and Del Conte-Zerial, P. (2013). On low-dimensional projections of high-dimensional distributions. In From Probability to Statistics and Back: High-Dimensional Models and Processes. Inst. Math. Stat. (IMS) Collect. 9 91–104. IMS, Beachwood, OH.
  • Eaton, M. L. (1986). A characterization of spherical distributions. J. Multivariate Anal. 20 272–276.
  • El Karoui, N. (2010). The spectrum of kernel random matrices. Ann. Statist. 38 1–50.
  • Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • Hall, P. and Li, K.-C. (1993). On almost linearity of low-dimensional projections from high-dimensional data. Ann. Statist. 21 867–889.
  • Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics 221–233. Univ. California Press, Berkeley, CA.
  • Lee, J. D., Sun, D. L., Sun, Y. and Taylor, J. E. (2016). Exact post-selection inference, with application to the lasso. Ann. Statist. 44 907–927.
  • Leeb, H. (2008). Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process. Bernoulli 14 661–690.
  • Leeb, H. (2009). Conditional predictive inference post model selection. Ann. Statist. 37 2838–2876.
  • Leeb, H. (2013). On the conditional distributions of low-dimensional projections from high-dimensional data. Ann. Statist. 41 464–483.
  • Leeb, H., Pötscher, B. M. and Ewald, K. (2015). On various confidence intervals post-model-selection. Statist. Sci. 30 216–227.
  • Pinsker, M. S. (1980). Optimal filtration of square-integrable signals in Gaussian noise. Probl. Inf. Transm. 16 120–133.
  • Rosenthal, H. P. (1970). On the subspaces of $L^{p}$ ($p>2$) spanned by sequences of independent random variables. Israel J. Math. 8 273–303.
  • Srivastava, N. and Vershynin, R. (2013). Covariance estimation for distributions with $2+\varepsilon$ moments. Ann. Probab. 41 3081–3111.
  • Steinberger, L. (2015). Statistical inference in high-dimensional linear regression based on simple working models. Ph.D. thesis, Univ. Vienna.
  • Steinberger, L. and Leeb, H. (2018). On conditional moments of high-dimensional random vectors given lower-dimensional projections. Bernoulli 24 565–591.
  • Taylor, J., Lockhart, R., Tibshirani, R. J. and Tibshirani, R. (2014). Exact post-selection inference for forward stepwise least angle regression. arXiv preprint. Available at arXiv:1401.3889.