Abstract
Principal component analysis (PCA) is a widely used statistical method across a broad range of applications. However, it does not work well when the number of features is larger than the sample size. Motivated by the analysis of single-cell RNA sequencing data, we consider the estimation of the sparse principal subspace in the high-dimensional setting with missing data. We propose a two-step estimation procedure and establish the rates of convergence for estimating the principal subspace. Simulated examples with various missing-data mechanisms show that the procedure performs competitively with existing sparse PCA methods. We apply the method to single-cell data and show that it distinguishes cell types better than other PCA methods.
Citation
Seyoung Park, Hongyu Zhao. "Sparse principal component analysis with missing observations." Ann. Appl. Stat. 13 (2) 1016-1042, June 2019. https://doi.org/10.1214/18-AOAS1220
Information