Abstract
In a linear model, consider the class of estimators that are equivariant with respect to linear transformations of the predictor basis. Each of these estimators determines an equivariant linear prediction rule. Equivariant prediction rules may be appropriate in settings where sparsity assumptions (like those common in high-dimensional data analysis) are untenable or little is known about the relevance of the given predictor basis, insofar as it relates to the outcome. In this paper, we study the out-of-sample prediction error associated with equivariant estimators in high-dimensional linear models with Gaussian predictors and errors. We show that non-trivial equivariant prediction is impossible when the number of predictors $d$ is greater than the number of observations $n$. For $d/n\to \rho \in[0,1)$, we show that a James-Stein estimator (a scalar multiple of the ordinary least squares estimator) is asymptotically optimal for equivariant out-of-sample prediction, and derive a closed-form expression for its asymptotic predictive risk. Finally, we undertake a detailed comparative analysis involving the proposed James-Stein estimator and other well-known estimators for non-sparse settings, including the ordinary least squares estimator, ridge regression, and other James-Stein estimators for the linear model. Among other things, this comparative analysis sheds light on the role of the population-level predictor covariance matrix and reveals that other previously studied James-Stein estimators for the linear model are sub-optimal in terms of out-of-sample prediction error.
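The estimator studied here is a scalar multiple of the ordinary least squares estimator. The sketch below illustrates that idea numerically on simulated dense (non-sparse) Gaussian data with $d/n < 1$, comparing OLS with a shrunken OLS fit on fresh out-of-sample data. The shrinkage factor `c_hat` is an illustrative generic James-Stein-type plug-in under an identity predictor covariance; it is an assumption for exposition, not the paper's asymptotically optimal factor or its closed-form risk expression.

```python
# Illustrative sketch only: compares out-of-sample error of OLS and a
# James-Stein-style estimator (a scalar multiple of OLS) in a dense
# Gaussian linear model with d/n < 1.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 50, 1.0                      # d/n = 0.25 < 1
beta = rng.normal(size=d) / np.sqrt(d)          # dense (non-sparse) coefficients
X = rng.normal(size=(n, d))                     # Gaussian predictors (identity covariance here)
y = X @ beta + sigma * rng.normal(size=n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Hypothetical shrinkage factor: one minus the estimated noise share of the
# fitted values.  The paper derives the optimal factor analytically; this
# plug-in is only for illustration.
sigma2_hat = np.sum((y - X @ beta_ols) ** 2) / (n - d)
fit_norm2 = np.sum((X @ beta_ols) ** 2)
c_hat = max(0.0, 1.0 - d * sigma2_hat / fit_norm2)
beta_js = c_hat * beta_ols                      # scalar multiple of OLS

# Out-of-sample (predictive) error on fresh Gaussian data
X_new = rng.normal(size=(10_000, d))
y_new = X_new @ beta + sigma * rng.normal(size=10_000)
for name, b in [("OLS", beta_ols), ("James-Stein (shrunken OLS)", beta_js)]:
    print(name, np.mean((y_new - X_new @ b) ** 2))
```

In this dense, moderately noisy regime the shrunken fit typically has slightly smaller out-of-sample error than OLS; the gap grows as $d/n$ increases toward 1, consistent with the asymptotic comparison described above.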
Citation
Lee H. Dicker. "Optimal equivariant prediction for high-dimensional linear models with arbitrary predictor covariance." Electron. J. Statist. 7 (2013): 1806–1834. https://doi.org/10.1214/13-EJS826