This paper presents R2PCA, a random consensus method for robust principal component analysis. R2PCA takes RANSAC’s principle of using as little data as possible one step further: it iteratively selects small subsets of the data to identify pieces of the principal components and then stitches them together. We show that if the principal components are in general position and the errors are sufficiently sparse, R2PCA exactly recovers the principal components with probability $1$, without assumptions on coherence or on the distribution of the sparse errors, and even in adversarial settings. R2PCA has several advantages: it works well under noise, its computational complexity scales linearly in the ambient dimension, it is easily parallelizable, and, owing to its low sample complexity, it can be used in settings where the data are too large to be stored in memory. We complement our theoretical findings with synthetic and real data experiments showing that R2PCA outperforms state-of-the-art methods in a broad range of settings.
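As an illustration of the select-small-subsets-then-stitch idea, the following is a minimal noiseless sketch in Python with NumPy. It is not the authors' implementation. It rests on several assumptions that the abstract does not state: a small $(r+1)\times(r+1)$ block is treated as uncorrupted when it is numerically rank $r$; each such block contributes one vector orthogonal to the principal subspace; and the pieces are stitched by taking the null space of all collected vectors. The function names, tolerances, and number of trials are hypothetical, and the sketch estimates only the subspace, using far more blocks than strictly necessary for simplicity.

```python
import numpy as np


def make_problem(d=20, n=50, r=2, p=0.05, seed=0):
    """Hypothetical test data: rank-r matrix plus sparse, large errors."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, r))          # basis of the true subspace
    V = rng.standard_normal((n, r))          # coefficients
    mask = rng.random((d, n)) < p            # roughly a fraction p of entries corrupted
    S = mask * 10.0 * rng.standard_normal((d, n))
    return U @ V.T + S, U


def r2pca_subspace_sketch(M, r, n_trials=2000, seed=0,
                          zero_tol=1e-9, cond_tol=1e-3):
    """Estimate an orthonormal basis of the r-dimensional principal subspace.

    Simplified, noiseless illustration; tolerances are assumptions.
    """
    rng = np.random.default_rng(seed)
    d, n = M.shape
    normals = []
    for _ in range(n_trials):
        # Select a small random subset: r+1 rows and r+1 columns.
        rows = rng.choice(d, size=r + 1, replace=False)
        cols = rng.choice(n, size=r + 1, replace=False)
        block = M[np.ix_(rows, cols)]
        U_b, s, _ = np.linalg.svd(block)
        if s[0] == 0.0:
            continue
        # Consensus test (assumed): keep the block only if it is
        # numerically rank r and not ill-conditioned within that rank.
        if s[r] > zero_tol * s[0] or s[r - 1] < cond_tol * s[0]:
            continue
        # The left null vector of the block is orthogonal to the
        # subspace restricted to the selected rows; embed it in R^d.
        a = np.zeros(d)
        a[rows] = U_b[:, -1]
        normals.append(a)
    if len(normals) < d:
        raise RuntimeError("too few consistent blocks; increase n_trials")
    # Stitch: the subspace is the set of vectors orthogonal to every
    # collected normal, i.e. the left singular vectors of A whose
    # singular values are (numerically) zero.
    A = np.column_stack(normals)
    U_a, s_a, _ = np.linalg.svd(A, full_matrices=False)
    if s_a[d - r - 1] <= 1e-6 * s_a[0]:
        raise RuntimeError("collected normals do not span d - r dimensions")
    return U_a[:, d - r:]
```

A call such as `U_est = r2pca_subspace_sketch(M, r=2)` on data from `make_problem()` returns a $d \times r$ orthonormal basis that can be compared with the true subspace by projecting the true basis onto it and inspecting the residual. Noise would require replacing the exact rank test with a thresholded one, which this sketch does not attempt.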
"Random consensus robust PCA." Electron. J. Statist. 11 (2) 5232 - 5253, 2017. https://doi.org/10.1214/17-EJS1377SI