Open Access
April 2024
Inference for heteroskedastic PCA with missing data
Yuling Yan, Yuxin Chen, Jianqing Fan
Ann. Statist. 52(2): 729-756 (April 2024). DOI: 10.1214/24-AOS2366


This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly underexplored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general difficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a novel approach to performing valid inference on the principal subspace, on the basis of an estimator called HeteroPCA (Ann. Statist. 50 (2022b) 53–80). We develop nonasymptotic distributional guarantees for HeteroPCA, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Our inference procedures are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels.

Funding Statement

Y. Chen is supported in part by the Alfred P. Sloan Research Fellowship, the Google Research Scholar Award, the AFOSR YIP award FA9550-19-1-0030, by the ONR grant N00014-19-1-2120, by the ARO grant W911NF-20-1-0097 and by the NSF grants CCF-1907661, IIS-2218713, IIS-2218773, DMS-2014279, CCF-2221009.
J. Fan is supported in part by the ONR grants N00014-19-1-2120, N00014-22-1-2340, by the NSF grants DMS-1662139, DMS-1712591, DMS-2052926, DMS-2053832, DMS-2210833 and by the NIH grant 2R01-GM072611-15.
Y. Yan is supported in part by the Charlotte Elizabeth Procter Honorific Fellowship from Princeton University and the Norbert Wiener Postdoctoral Fellowship from MIT.


Yuxin Chen is the corresponding author.


Citation

Yuling Yan, Yuxin Chen, Jianqing Fan. "Inference for heteroskedastic PCA with missing data." Ann. Statist. 52 (2) 729-756, April 2024.


Received: 1 May 2022; Revised: 1 January 2024; Published: April 2024
First available in Project Euclid: 9 May 2024

Digital Object Identifier: 10.1214/24-AOS2366

Primary: 62H25
Secondary: 62E17

Keywords: confidence regions, heteroskedastic data, missing data, principal component analysis, subspace estimation, uncertainty quantification

Rights: Copyright © 2024 Institute of Mathematical Statistics
