Open Access
$e$PCA: High dimensional exponential family PCA
Lydia T. Liu, Edgar Dobriban, Amit Singer
Ann. Appl. Stat. 12(4): 2121-2150 (December 2018). DOI: 10.1214/18-AOAS1146


Many applications involve large datasets with entries from exponential family distributions. Our main motivating application is photon-limited imaging, where we observe images with Poisson distributed pixels. We focus on X-ray Free Electron Lasers (XFEL), a quickly developing technology whose goal is to reconstruct molecular structure. In XFEL, estimating the principal components of the noiseless distribution is needed for denoising and for structure determination. However, the standard method, Principal Component Analysis (PCA), can be inefficient in non-Gaussian noise.

Motivated by this application, we develop $e$PCA (exponential family PCA), a new methodology for PCA on exponential families. $e$PCA is a fast method that can be used very generally for dimension reduction and denoising of large data matrices with exponential family entries.
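As a rough illustration of why Poisson noise matters for PCA, the sketch below simulates counts whose noiseless intensities follow a hypothetical rank-one model. For Poisson data the variance equals the mean, so by the law of total covariance the sample covariance of the counts exceeds the covariance of the noiseless intensities by a diagonal term equal to the mean. Subtracting that diagonal is one simple correction for this case. This is an assumption-laden sketch of that single idea, not the full $e$PCA procedure; the function name `poisson_debiased_covariance` and the simulated model are made up for illustration.

```python
import numpy as np


def poisson_debiased_covariance(Y):
    """Sample covariance of count data minus diag(sample mean).

    Hypothetical helper: assumes each entry of Y is Poisson given its
    latent intensity, so the noise adds diag(mean) to the covariance.
    Y has shape (n_samples, n_features).
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    S = np.cov(Y, rowvar=False)  # ordinary sample covariance
    return S - np.diag(mean)     # remove the Poisson noise variance


# Simulated data (illustrative assumption): rank-one latent intensities.
rng = np.random.default_rng(0)
n = 20000
u = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
scale = rng.uniform(0.5, 1.5, size=n)   # variance of Uniform(0.5, 1.5) is 1/12
X = np.outer(scale, u)                  # noiseless intensities, shape (n, 5)
Y = rng.poisson(X)                      # observed Poisson counts

true_cov = np.outer(u, u) / 12.0        # covariance of the noiseless intensities
raw_cov = np.cov(Y.astype(float), rowvar=False)
debiased_cov = poisson_debiased_covariance(Y)

# Frobenius-norm errors against the covariance of the noiseless signal.
err_raw = np.linalg.norm(raw_cov - true_cov)
err_debiased = np.linalg.norm(debiased_cov - true_cov)
print(f"raw error: {err_raw:.3f}, debiased error: {err_debiased:.3f}")
```

In this simulation the uncorrected sample covariance carries an extra diagonal of roughly `u`, which the subtraction removes. Note that the corrected matrix need not be positive semidefinite in finite samples.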

We conduct a substantive XFEL data analysis using $e$PCA. We show that $e$PCA estimates the PCs of the distribution of images more accurately than PCA and alternatives. Importantly, it also leads to better denoising. We also provide theoretical justification for our estimator, including the convergence rate and the Marchenko–Pastur law in high dimensions. An open-source implementation is available.



Lydia T. Liu, Edgar Dobriban, Amit Singer. "$e$PCA: High dimensional exponential family PCA." Ann. Appl. Stat. 12(4): 2121-2150, December 2018.


Received: 1 September 2017; Revised: 1 November 2017; Published: December 2018
First available in Project Euclid: 13 November 2018

zbMATH: 07029449
MathSciNet: MR3875695
Digital Object Identifier: 10.1214/18-AOAS1146

Keywords: Denoising, Random matrix theory, shrinkage, XFEL imaging

Rights: Copyright © 2018 Institute of Mathematical Statistics
