Open Access
June 2019
Sparse principal component analysis with missing observations
Seyoung Park, Hongyu Zhao
Ann. Appl. Stat. 13(2): 1016-1042 (June 2019). DOI: 10.1214/18-AOAS1220

Abstract

Principal component analysis (PCA) is a commonly used statistical method in a wide range of applications. However, it does not work well when the number of features is larger than the sample size. Motivated by the analysis of single-cell RNA sequence data, we consider the estimation of the sparse principal subspace in the high-dimensional setting with missing data. We propose a two-step estimation procedure and establish the rates of convergence for estimating the principal subspace. Simulated examples with various missing-data mechanisms show that the procedure performs competitively with existing sparse PCA methods. We apply the method to single-cell data and show that it distinguishes cell types better than other PCA methods.

Citation

Seyoung Park, Hongyu Zhao. "Sparse principal component analysis with missing observations." Ann. Appl. Stat. 13(2): 1016-1042, June 2019. https://doi.org/10.1214/18-AOAS1220

Information

Received: 1 June 2017; Revised: 1 September 2018; Published: June 2019
First available in Project Euclid: 17 June 2019

zbMATH: 1423.62057
MathSciNet: MR3963561
Digital Object Identifier: 10.1214/18-AOAS1220

Keywords: high dimensional, missing data, PCA, single-cell data

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol. 13 • No. 2 • June 2019