This is a study of principal component analysis performed on a statistical sample. We assume that this sample consists of independent copies of a random variable taking values in a separable real Hilbert space. This setting covers data in function spaces as well as data represented in reproducing kernel Hilbert spaces. Based on new inequalities for the perturbation of nonnegative self-adjoint operators, we provide new bounds on the statistical fluctuations of the principal component representation induced by the random draw of the sample.
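For a concrete finite-dimensional picture of these objects, here is a minimal sketch, assuming ℝ^d stands in for the separable Hilbert space and the data are centered. Under that assumption the uncentered second-moment matrix (1/n) Σ x_i x_iᵀ plays the role of the empirical covariance operator. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def empirical_second_moment(X):
    """(1/n) * sum_i x_i x_i^T for the rows x_i of X.

    Assumption: data are centered, so this equals the empirical covariance.
    """
    n = X.shape[0]
    return X.T @ X / n

def pca_projection(G, k):
    """Orthogonal projection onto the span of the top-k eigenvectors of G."""
    eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    top = eigvecs[:, -k:]                 # eigenvectors of the k largest eigenvalues
    return top @ top.T

# Usage: 500 samples in R^5 with decreasing variances along the coordinate axes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
P = pca_projection(empirical_second_moment(X), 2)
```

Redrawing `X` changes `P`; the fluctuations studied here are those of this random projection.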
We suggest two improvements that reduce these fluctuations. The first is to use a robust estimate of the covariance operator, for which non-asymptotic bounds on the estimation error are available under weak polynomial moment assumptions. The second is to modify the projection on the principal components using functional calculus applied to the covariance operator. With this modified projection, we obtain bounds that depend not on the spectral gap but on a more favorable factor.
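A minimal NumPy sketch can illustrate the two ideas of the previous paragraph under stated assumptions. The robust step uses simple norm truncation of each sample. This is a swapped-in technique, not the estimator analyzed in this paper, and its threshold `tau` is a hypothetical tuning parameter. The functional-calculus step computes f(G) = U diag(f(λ)) Uᵀ. The indicator f = 1{λ ≥ t} recovers the usual projection on the principal components. The ramp function is a hypothetical smoothed choice and is not necessarily the modification used in the paper.

```python
import numpy as np

def truncated_second_moment(X, tau):
    """Robust stand-in for the second-moment operator (hypothetical, swapped-in):
    each sample is shrunk to norm at most tau before averaging."""
    norms = np.linalg.norm(X, axis=1)
    # Zero rows get scale 0, which leaves them unchanged (they are zero anyway).
    scale = np.minimum(1.0, tau / np.where(norms > 0, norms, np.inf))
    Y = X * scale[:, None]
    return Y.T @ Y / X.shape[0]

def apply_function(G, f):
    """Functional calculus for a symmetric matrix: f(G) = U diag(f(lambda)) U^T."""
    eigvals, eigvecs = np.linalg.eigh(G)
    return (eigvecs * f(eigvals)) @ eigvecs.T  # scales column j of U by f(lambda_j)

def hard_projection(t):
    """Indicator 1{lambda >= t}: f(G) projects onto eigenvectors with eigenvalue >= t."""
    return lambda lam: (lam >= t).astype(float)

def ramp(a, b):
    """Hypothetical smoothed replacement: 0 below a, 1 above b, linear in between.
    Its Lipschitz constant is 1 / (b - a)."""
    return lambda lam: np.clip((lam - a) / (b - a), 0.0, 1.0)

# Usage: heavy-tailed data, robust estimate, then hard and smoothed "projections".
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(1000, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
G = truncated_second_moment(X, tau=20.0)
P_hard = apply_function(G, hard_projection(2.0))
P_soft = apply_function(G, ramp(1.5, 3.0))
```

The design point is continuity. The indicator jumps when an eigenvalue crosses t, so the projection can change abruptly under small perturbations unless the spectral gap around t is large. For an L-Lipschitz f and symmetric A and B, writing both matrices in their eigenbases gives ‖f(A) − f(B)‖_HS ≤ L ‖A − B‖_HS. The ramp's sensitivity is therefore governed by 1/(b − a) rather than by the gap.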
In an appendix, we provide a new approach to the analysis of the relative position of two orthogonal projections. It is used in our proofs and is of independent interest.
"Robust PCA and pairs of projections in a Hilbert space." Electron. J. Statist. 11 (2) 3903 - 3926, 2017. https://doi.org/10.1214/17-EJS1343