## Electronic Journal of Statistics

### Optimal equivariant prediction for high-dimensional linear models with arbitrary predictor covariance

Lee H. Dicker

#### Abstract

In a linear model, consider the class of estimators that are equivariant with respect to linear transformations of the predictor basis. Each of these estimators determines an equivariant linear prediction rule. Equivariant prediction rules may be appropriate in settings where sparsity assumptions (like those common in high-dimensional data analysis) are untenable or little is known about the relevance of the given predictor basis, insofar as it relates to the outcome. In this paper, we study the out-of-sample prediction error associated with equivariant estimators in high-dimensional linear models with Gaussian predictors and errors. We show that non-trivial equivariant prediction is impossible when the number of predictors $d$ is greater than the number of observations $n$. For $d/n\to \rho \in[0,1)$, we show that a James-Stein estimator (a scalar multiple of the ordinary least squares estimator) is asymptotically optimal for equivariant out-of-sample prediction, and derive a closed-form expression for its asymptotic predictive risk. Finally, we undertake a detailed comparative analysis involving the proposed James-Stein estimator and other well-known estimators for non-sparse settings, including the ordinary least squares estimator, ridge regression, and other James-Stein estimators for the linear model. Among other things, this comparative analysis sheds light on the role of the population-level predictor covariance matrix and reveals that other previously studied James-Stein estimators for the linear model are sub-optimal in terms of out-of-sample prediction error.
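The equivariance property in the abstract, and the fact that a scalar multiple of OLS inherits it, can be checked numerically. The Python/NumPy sketch below is illustrative only. The function names are our own. `scaled_ols` uses a hypothetical plug-in shrinkage factor and is not the James–Stein estimator derived in the paper. The sketch also checks by simulation a standard fact: for mean-zero Gaussian predictors with covariance Σ and n > d + 1, the out-of-sample excess risk of OLS equals σ² d/(n − d − 1), whatever Σ is. This quantity tends to σ²ρ/(1 − ρ) when d/n → ρ.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares; assumes X has full column rank (d < n)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat


def scaled_ols(X, y):
    """A hypothetical scalar multiple of OLS, for illustration only.

    The positive-part factor is a simple plug-in choice for this sketch, not the
    estimator derived in the paper. It depends on the data only through the
    fitted values and residuals, which do not change when X is replaced by X @ A.
    """
    n, d = X.shape
    beta_hat = ols(X, y)
    fitted = X @ beta_hat
    resid = y - fitted
    sigma2_hat = resid @ resid / (n - d)   # unbiased estimate of the noise variance
    shrink = max(0.0, 1.0 - d * sigma2_hat / (fitted @ fitted))
    return shrink * beta_hat


def prediction_gap(estimator, n=200, d=50, seed=0):
    """Change in the prediction at x_new when the predictor basis is changed by A."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = X @ (rng.standard_normal(d) / np.sqrt(d)) + rng.standard_normal(n)
    A = rng.standard_normal((d, d))         # random change of basis, invertible a.s.
    x_new = rng.standard_normal(d)
    pred_original = x_new @ estimator(X, y)
    # In the new basis, both the design rows and the new predictor vector are multiplied by A.
    pred_transformed = (x_new @ A) @ estimator(X @ A, y)
    return abs(pred_original - pred_transformed)


def ols_excess_risk_mc(n, d, Sigma, sigma=1.0, reps=2000, seed=1):
    """Monte Carlo estimate of E[(beta_hat - beta)^T Sigma (beta_hat - beta)] for OLS."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    beta = rng.standard_normal(d)
    losses = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, d)) @ L.T   # rows are i.i.d. N(0, Sigma)
        y = X @ beta + sigma * rng.standard_normal(n)
        err = ols(X, y) - beta
        # Conditional excess risk E[(x_new^T (beta_hat - beta))^2 | data], x_new ~ N(0, Sigma).
        losses[r] = err @ Sigma @ err
    return losses.mean()


if __name__ == "__main__":
    print("OLS prediction gap:       ", prediction_gap(ols))
    print("scaled OLS prediction gap:", prediction_gap(scaled_ols))
    n, d = 100, 20
    idx = np.arange(d)
    ar1 = 0.8 ** np.abs(idx[:, None] - idx[None, :])   # AR(1) covariance
    print("theory:", d / (n - d - 1))
    print("MC, identity Sigma:", ols_excess_risk_mc(n, d, np.eye(d)))
    print("MC, AR(1) Sigma:  ", ols_excess_risk_mc(n, d, ar1))
```

For both estimators, the prediction gaps should be at the level of floating-point round-off. The two Monte Carlo estimates should agree with d/(n − d − 1) ≈ 0.253 up to simulation error, even though the two predictor covariances differ.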

#### Article information

Source: Electron. J. Statist. Volume 7 (2013), 1806–1834.

First available in Project Euclid: 10 July 2013

Permanent link: https://projecteuclid.org/euclid.ejs/1373461822

Digital Object Identifier: doi:10.1214/13-EJS826

Mathematical Reviews number (MathSciNet): MR3084672

Zentralblatt MATH identifier: 1293.62154

#### Citation

Dicker, Lee H. Optimal equivariant prediction for high-dimensional linear models with arbitrary predictor covariance. Electron. J. Statist. 7 (2013), 1806–1834. doi:10.1214/13-EJS826. https://projecteuclid.org/euclid.ejs/1373461822
