The Annals of Statistics

Normal approximation and concentration of spectral projectors of sample covariance

Vladimir Koltchinskii and Karim Lounici

Abstract

Let $X,X_{1},\dots,X_{n}$ be i.i.d. Gaussian random variables in a separable Hilbert space $\mathbb{H}$ with zero mean and covariance operator $\Sigma=\mathbb{E}(X\otimes X)$, and let $\hat{\Sigma}:=n^{-1}\sum_{j=1}^{n}(X_{j}\otimes X_{j})$ be the sample (empirical) covariance operator based on $(X_{1},\dots,X_{n})$. Denote by $P_{r}$ the spectral projector of $\Sigma$ corresponding to its $r$th eigenvalue $\mu_{r}$ and by $\hat{P}_{r}$ the empirical counterpart of $P_{r}$. The main goal of the paper is to obtain tight bounds on
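Below is a minimal finite-dimensional sketch of these objects. It assumes $\mathbb{H}=\mathbb{R}^d$, a known zero mean, and a diagonal $\Sigma$ chosen purely for illustration. The helper names (`sample_covariance`, `spectral_projector`, `hs_error_sq`) are hypothetical. The sketch also assumes that the multiplicity of $\mu_r$ is known, so that $\hat{P}_r$ is taken as the projector onto the corresponding block of empirical eigenvectors.

```python
import numpy as np

def sample_covariance(X):
    """Empirical covariance n^{-1} sum_j X_j X_j^T for mean-zero data X of shape (n, d)."""
    n = X.shape[0]
    return X.T @ X / n

def spectral_projector(S, start, m):
    """Orthogonal projector onto the span of eigenvectors start, ..., start+m-1 of the
    symmetric matrix S, with eigenvalues sorted in decreasing order (0-based).

    For the r-th eigenvalue mu_r of multiplicity m_r, `start` is the total multiplicity
    of mu_1, ..., mu_{r-1} and `m` is m_r (assumed known in this sketch).
    """
    _, vecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
    vecs = vecs[:, ::-1]          # reorder columns to match descending eigenvalues
    U = vecs[:, start:start + m]
    return U @ U.T

def hs_error_sq(P_hat, P):
    """Squared Hilbert-Schmidt (Frobenius) norm ||P_hat - P||_2^2."""
    return float(np.sum((P_hat - P) ** 2))

# Hypothetical example: H = R^d with a diagonal covariance Sigma.
rng = np.random.default_rng(0)
d, n = 50, 2000
eigenvalues = 1.0 / np.arange(1, d + 1)                  # mu_1 > mu_2 > ... (all simple)
Sigma = np.diag(eigenvalues)
X = rng.standard_normal((n, d)) * np.sqrt(eigenvalues)   # rows X_j ~ N(0, Sigma)

r_start, m_r = 0, 1                                      # projector for the top eigenvalue
P = spectral_projector(Sigma, r_start, m_r)
P_hat = spectral_projector(sample_covariance(X), r_start, m_r)
err = hs_error_sq(P_hat, P)
print(f"||P_hat - P||_2^2 = {err:.4f}")
```

Since both operators are orthogonal projectors of rank $m_r$, the printed value also equals $2m_r-2\operatorname{tr}(\hat{P}_rP_r)$, which is a convenient sanity check.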

\[\sup_{x\in\mathbb{R}}\left\vert\mathbb{P}\left\{\frac{\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}-\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}}{\operatorname{Var}^{1/2}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})}\leq x\right\}-\Phi (x)\right\vert ,\] where $\Vert \cdot \Vert_{2}$ denotes the Hilbert–Schmidt norm and $\Phi$ is the standard normal distribution function. The accuracy of this normal approximation for the squared Hilbert–Schmidt error is characterized in terms of the so-called effective rank of $\Sigma$, defined as ${\mathbf{r}}(\Sigma)=\frac{\operatorname{tr}(\Sigma)}{\Vert \Sigma \Vert_{\infty}}$, where $\operatorname{tr}(\Sigma)$ is the trace of $\Sigma$ and $\Vert \Sigma \Vert_{\infty}$ is its operator norm, as well as another parameter characterizing the size of $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$. Other results include nonasymptotic bounds and asymptotic representations for the mean squared Hilbert–Schmidt error $\mathbb{E}\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ and for the variance $\operatorname{Var}(\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2})$, together with concentration inequalities for $\Vert \hat{P}_{r}-P_{r}\Vert_{2}^{2}$ around its expectation.
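The quantity in the display can be estimated by simulation. The sketch below again assumes $\mathbb{H}=\mathbb{R}^d$ and an illustrative diagonal $\Sigma$. It computes the effective rank $\mathbf{r}(\Sigma)$ and a Monte Carlo estimate of the Kolmogorov distance, with $\mathbb{E}$ and $\operatorname{Var}^{1/2}$ replaced by their empirical counterparts over the replications. This is an empirical illustration only, not the bound derived in the paper. All function names and parameter values are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def effective_rank(Sigma):
    """r(Sigma) = tr(Sigma) / ||Sigma||_inf, with ||.||_inf the operator (spectral) norm."""
    return float(np.trace(Sigma) / np.linalg.norm(Sigma, 2))

def spectral_projector(S, start, m):
    """Projector onto eigenvectors start..start+m-1 of symmetric S (eigenvalues in decreasing order)."""
    _, vecs = np.linalg.eigh(S)                 # ascending eigenvalues
    U = vecs[:, ::-1][:, start:start + m]       # reorder to descending, then select the block
    return U @ U.T

def simulate_hs_errors(Sigma, n, start, m, reps, rng):
    """Draw `reps` independent samples of size n from N(0, Sigma); return ||P_hat - P||_2^2 for each."""
    d = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)               # Sigma = L L^T
    P = spectral_projector(Sigma, start, m)
    errs = np.empty(reps)
    for k in range(reps):
        X = rng.standard_normal((n, d)) @ L.T   # rows are i.i.d. N(0, Sigma)
        S = X.T @ X / n                         # mean-zero sample covariance
        errs[k] = np.sum((spectral_projector(S, start, m) - P) ** 2)
    return errs

def kolmogorov_distance_to_normal(samples):
    """sup_x |F_N(x) - Phi(x)| for the empirically standardized samples, F_N being their ECDF."""
    z = np.sort((samples - samples.mean()) / samples.std())
    N = len(z)
    phi = np.array([0.5 * (1.0 + erf(t / sqrt(2.0))) for t in z])
    above = np.arange(1, N + 1) / N - phi       # ECDF value just after each jump
    below = phi - np.arange(0, N) / N           # ECDF value just before each jump
    return float(max(above.max(), below.max()))

# Hypothetical setting: simple eigenvalues mu_k = 1/k, projector for the top eigenvalue.
rng = np.random.default_rng(1)
Sigma = np.diag(1.0 / np.arange(1, 31))
errs = simulate_hs_errors(Sigma, n=200, start=0, m=1, reps=500, rng=rng)
print(f"effective rank r(Sigma) = {effective_rank(Sigma):.3f}")
print(f"estimated Kolmogorov distance = {kolmogorov_distance_to_normal(errs):.3f}")
```

The Monte Carlo error of such an estimate is of order $1/\sqrt{\text{reps}}$. When the true distance is small, the number of replications has to grow accordingly before the estimate says anything about the normal approximation itself.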

Article information

Source
Ann. Statist., Volume 45, Number 1 (2017), 121-157.

Dates
Received: September 2015
Revised: January 2016
First available in Project Euclid: 21 February 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1487667619

Digital Object Identifier
doi:10.1214/16-AOS1437

Mathematical Reviews number (MathSciNet)
MR3611488

Zentralblatt MATH identifier
1367.62175

Subjects
Primary: 62H12: Estimation

Keywords
Sample covariance; spectral projectors; effective rank; principal component analysis; concentration inequalities; normal approximation; perturbation theory

Citation

Koltchinskii, Vladimir; Lounici, Karim. Normal approximation and concentration of spectral projectors of sample covariance. Ann. Statist. 45 (2017), no. 1, 121–157. doi:10.1214/16-AOS1437. https://projecteuclid.org/euclid.aos/1487667619

