Open Access
June 2019
Prediction when fitting simple models to high-dimensional data
Lukas Steinberger, Hannes Leeb
Ann. Statist. 47(3): 1408-1442 (June 2019). DOI: 10.1214/18-AOS1719


We study linear subset regression in the context of a high-dimensional linear model. Consider $y=\vartheta+\theta'z+\epsilon$ with univariate response $y$ and a $d$-vector of random regressors $z$, and a submodel where $y$ is regressed on a set of $p$ explanatory variables that are given by $x=M'z$, for some $d\times p$ matrix $M$. Here, “high-dimensional” means that the number $d$ of available explanatory variables in the overall model is much larger than the number $p$ of variables in the submodel. In this paper, we present Pinsker-type results for prediction of $y$ given $x$. In particular, we show that the mean squared prediction error of the best linear predictor of $y$ given $x$ is close to the mean squared prediction error of the corresponding Bayes predictor $\mathbb{E}[y\mid x]$, provided only that $p/\log d$ is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from $n$ independent observations of $(y,x)$ is close to that of the Bayes predictor, provided only that both $p/\log d$ and $p/n$ are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables $z$.
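The setup described in the abstract can be illustrated with a small simulation. The following is only a rough sketch under assumptions that are not taken from the paper: the components of $z$ are i.i.d. centered uniform with unit variance, $\theta$ and $M$ are drawn at random once and then held fixed, the noise is standard Gaussian, and the values of $d$, $p$, $n$ are arbitrary. It compares the out-of-sample mean squared prediction error of the least-squares predictor with that of the population best linear predictor of $y$ given $x$. It does not compute the Bayes predictor $\mathbb{E}[y\mid x]$ and does not reproduce the paper's results; it only shows the role of $p/n$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
d, p, n, n_test = 500, 5, 400, 5000

# Hypothetical parameters of the full model y = vartheta + theta'z + eps.
vartheta = 1.0
theta = rng.standard_normal(d) / np.sqrt(d)
M = rng.standard_normal((d, p)) / np.sqrt(d)  # d x p matrix defining x = M'z
sigma = 1.0                                   # standard deviation of eps


def draw(m):
    """Draw m independent observations of (y, x)."""
    # Assumption: i.i.d. non-Gaussian components with mean 0 and variance 1.
    z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(m, d))
    y = vartheta + z @ theta + sigma * rng.standard_normal(m)
    return y, z @ M


# Population best linear predictor of y given x. Since Cov(z) = I here,
# Var(x) = M'M and Cov(x, y) = M'theta; the intercept is vartheta as E[z] = 0.
beta_blp = np.linalg.solve(M.T @ M, M.T @ theta)
mspe_blp_theory = sigma**2 + theta @ theta - (M.T @ theta) @ beta_blp

# Feasible least-squares predictor from n training observations of (y, x).
y_tr, x_tr = draw(n)
design = np.column_stack([np.ones(n), x_tr])
coef, *_ = np.linalg.lstsq(design, y_tr, rcond=None)

# Monte Carlo estimates of both prediction errors on fresh data.
y_te, x_te = draw(n_test)
mspe_ols = np.mean((y_te - coef[0] - x_te @ coef[1:]) ** 2)
mspe_blp = np.mean((y_te - vartheta - x_te @ beta_blp) ** 2)

print(f"best linear predictor (theory): {mspe_blp_theory:.3f}")
print(f"best linear predictor (test):   {mspe_blp:.3f}")
print(f"least squares, n={n} (test):    {mspe_ols:.3f}")
```

With $p/n$ small, as in this configuration, the two test errors should be close; increasing `p` relative to `n` in the sketch should widen the gap.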



Lukas Steinberger, Hannes Leeb. "Prediction when fitting simple models to high-dimensional data." Ann. Statist. 47 (3) 1408-1442, June 2019.


Received: 1 May 2016; Revised: 1 April 2017; Published: June 2019
First available in Project Euclid: 13 February 2019

zbMATH: 07053513
MathSciNet: MR3911117
Digital Object Identifier: 10.1214/18-AOS1719

Primary: 62H99
Secondary: 62F99, 62G99

Keywords: Bayes predictor, best linear predictor, high-dimensional models, linear subset regression, non-Gaussian data, Pinsker theorem, small sample size

Rights: Copyright © 2019 Institute of Mathematical Statistics


Vol. 47 • No. 3 • June 2019