Abstract
Consider $p \geq 2$ random variables, and let $A_1, \ldots, A_p$ denote the hyperplanes corresponding to the linear regression of each variable onto the other $(p - 1)$ variables. Let $A_0$ denote the hyperplane which passes through the centroid of the distribution and is spanned by the direction vectors defining the first $(p - 1)$ principal components. A new optimality property of $A_0$ is established: $A_0$ is the best single approximation to $A_1, \ldots, A_p$ when each regression hyperplane is given a certain weighting inversely proportional to the variability associated with its orientation and its prediction rescaling. When $p > 2$ and $k = 1, \ldots, p - 2$, certain $k$-dimensional linear subspaces of $A_0$ are also shown to have regression optimality properties.
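The hyperplanes named in the abstract can be written out explicitly. The following is a sketch in standard notation, not taken from the paper: $\mu$ and $\Sigma$ (assumed positive definite) denote the mean vector and covariance matrix of the $p$ variables, $e_i$ the $i$-th unit vector, and $v_p$ a unit eigenvector of $\Sigma$ for its smallest eigenvalue $\lambda_p$ (assumed simple, so that $A_0$ is unique). The weighting under which $A_0$ is optimal is not reproduced here.

```latex
% Regression hyperplane of variable i on the other p-1 variables:
% its normal vector is the i-th column of the precision matrix \Sigma^{-1}.
A_i = \{\, x \in \mathbb{R}^p : e_i^{\top} \Sigma^{-1} (x - \mu) = 0 \,\},
\qquad i = 1, \ldots, p,

% Hyperplane through the centroid spanned by the first p-1 principal directions:
% its normal vector is the last principal direction v_p.
A_0 = \{\, x \in \mathbb{R}^p : v_p^{\top} (x - \mu) = 0 \,\},
\qquad \Sigma v_p = \lambda_p v_p .
```

For $p = 2$ with nonzero correlation, $A_1$ and $A_2$ are the two ordinary regression lines and $A_0$ is the major principal axis, which lies between them.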
Citation
R. L. Obenchain. "Regression Optimality of Principal Components." Ann. Math. Statist. 43 (4): 1317-1319, August 1972. https://doi.org/10.1214/aoms/1177692482
Information