Annals of Statistics

Testing for principal component directions under weak identifiability

Abstract

We consider the problem of testing, on the basis of a $p$-variate Gaussian random sample, the null hypothesis $\mathcal{H}_{0}:\boldsymbol{\theta}_{1}=\boldsymbol{\theta}_{1}^{0}$ against the alternative $\mathcal{H}_{1}:\boldsymbol{\theta}_{1}\neq \boldsymbol{\theta}_{1}^{0}$, where $\boldsymbol{\theta}_{1}$ is the “first” eigenvector of the underlying covariance matrix and $\boldsymbol{\theta}_{1}^{0}$ is a fixed unit $p$-vector. In the classical setup where the eigenvalues $\lambda_{1}>\lambda_{2}\geq \cdots \geq \lambda_{p}$ are fixed, the Anderson (Ann. Math. Stat. 34 (1963) 122–148) likelihood ratio test (LRT) and the Hallin, Paindaveine and Verdebout (Ann. Statist. 38 (2010) 3245–3299) Le Cam optimal test for this problem are asymptotically equivalent under the null hypothesis, hence also under sequences of contiguous alternatives. We show that this equivalence does not survive asymptotic scenarios where $\lambda_{n1}/\lambda_{n2}=1+O(r_{n})$ with $r_{n}=O(1/\sqrt{n})$. For such scenarios, the Le Cam optimal test still asymptotically meets the nominal level constraint, whereas the LRT severely overrejects the null hypothesis. Consequently, the former test should be favored over the latter whenever the two largest sample eigenvalues are close to each other. Relying on Le Cam’s asymptotic theory of statistical experiments, we study the non-null and optimality properties of the Le Cam optimal test in the aforementioned asymptotic scenarios and show that the null robustness of this test is not obtained at the expense of power. Our asymptotic investigation is extensive in the sense that it allows $r_{n}$ to converge to zero at an arbitrary rate. While we restrict to single-spiked spectra of the form $\lambda_{n1}>\lambda_{n2}=\cdots =\lambda_{np}$ to make our results as striking as possible, we also extend them to the more general elliptical case. Finally, we present an illustrative real data example.
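For readers who want to experiment with the testing problem described above, the following is a minimal sketch of the Anderson (1963) LRT-type statistic for $\mathcal{H}_{0}:\boldsymbol{\theta}_{1}=\boldsymbol{\theta}_{1}^{0}$ in the classical fixed-spectrum setup. The function name, the NumPy implementation, and the simulated single-spiked Gaussian example are ours, not the paper's; the statistic is the standard one, asymptotically $\chi^2_{p-1}$ under the null only when $\lambda_1$ is well separated from $\lambda_2$ — which is exactly the assumption that fails in the weak-identifiability scenarios the paper studies.

```python
import numpy as np

def anderson_lrt_stat(X, theta0):
    """Anderson (1963) statistic for H0: first eigenvector equals theta0.

    T = n * (l1 * theta0' S^{-1} theta0 + theta0' S theta0 / l1 - 2),
    where S is the sample covariance matrix and l1 its largest eigenvalue.
    Under fixed-spectrum asymptotics with lambda_1 > lambda_2, T is
    asymptotically chi-squared with p - 1 degrees of freedom under H0.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)          # p x p sample covariance matrix
    l1 = np.linalg.eigvalsh(S)[-1]       # largest sample eigenvalue
    S_inv = np.linalg.inv(S)
    T = n * (l1 * (theta0 @ S_inv @ theta0)
             + (theta0 @ S @ theta0) / l1
             - 2.0)
    return T

# Illustration on simulated data with a well-separated spike
# (single-spiked spectrum: lambda_1 = 4, lambda_2 = ... = lambda_p = 1).
rng = np.random.default_rng(0)
p, n = 5, 500
Sigma = np.diag([4.0] + [1.0] * (p - 1))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
theta0 = np.eye(p)[0]                    # hypothesized (here, true) direction
T = anderson_lrt_stat(X, theta0)
print(T)                                 # compare to a chi2(p-1) critical value
```

Note that $T\geq 0$ always holds (by the arithmetic–geometric mean and Cauchy–Schwarz inequalities), and the paper's point is precisely that when $\lambda_{n1}/\lambda_{n2}=1+O(1/\sqrt{n})$ the $\chi^2_{p-1}$ calibration of this statistic breaks down, while the Le Cam optimal test keeps its level.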

Article information

Source
Ann. Statist., Volume 48, Number 1 (2020), 324–345.

Dates
Revised: July 2018
First available in Project Euclid: 17 February 2020

https://projecteuclid.org/euclid.aos/1581930137

Digital Object Identifier
doi:10.1214/18-AOS1805

Mathematical Reviews number (MathSciNet)
MR4065164

Zentralblatt MATH identifier
07196541

Citation

Paindaveine, Davy; Remy, Julien; Verdebout, Thomas. Testing for principal component directions under weak identifiability. Ann. Statist. 48 (2020), no. 1, 324--345. doi:10.1214/18-AOS1805. https://projecteuclid.org/euclid.aos/1581930137

References

• Anderson, T. W. (1963). Asymptotic theory for principal component analysis. Ann. Math. Stat. 34 122–148.
• Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ.
• Atkinson, A. C., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search. Springer Series in Statistics. Springer, New York.
• Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33 1643–1697.
• Bali, J. L., Boente, G., Tyler, D. E. and Wang, J.-L. (2011). Robust functional principal components: A projection-pursuit approach. Ann. Statist. 39 2852–2882.
• Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
• Boente, G. and Fraiman, R. (2000). Kernel-based functional principal components. Statist. Probab. Lett. 48 335–345.
• Burman, P. and Polonik, W. (2009). Multivariate mode hunting: Data analytic tools with measures of significance. J. Multivariate Anal. 100 1198–1218.
• Croux, C. and Haesbroeck, G. (2000). Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. Biometrika 87 603–618.
• Cuevas, A. (2014). A partial overview of the theory of statistics with functional data. J. Statist. Plann. Inference 147 1–23.
• Dufour, J.-M. (1997). Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65 1365–1387.
• Dufour, J.-M. (2006). Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics. J. Econometrics 133 443–477.
• Flury, B. (1988). Common Principal Components and Related Multivariate Models. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, New York.
• Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. CRC Press, London.
• Forchini, G. and Hillier, G. (2003). Conditional inference for possibly unidentified structural equations. Econometric Theory 19 707–743.
• Fritz, H., García-Escudero, L. A. and Mayo-Iscar, A. (2012). tclust: An R package for a trimming approach to cluster analysis. J. Stat. Softw. 47.
• Girolami, M. (1999). Self-Organizing Neural Networks: Independent Component Analysis and Blind Source Separation. Springer, London.
• Hallin, M. and Paindaveine, D. (2006). Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann. Statist. 34 2707–2756.
• Hallin, M., Paindaveine, D. and Verdebout, T. (2010). Optimal rank-based testing for principal components. Ann. Statist. 38 3245–3299.
• Hallin, M., Paindaveine, D. and Verdebout, T. (2014). Efficient R-estimation of principal and common principal components. J. Amer. Statist. Assoc. 109 1071–1083.
• Han, F. and Liu, H. (2014). Scale-invariant sparse PCA on high-dimensional meta-elliptical data. J. Amer. Statist. Assoc. 109 275–287.
• Härdle, W. and Simar, L. (2007). Applied Multivariate Statistical Analysis, 2nd ed. Springer, Berlin.
• He, R., Hu, B.-G., Zheng, W.-S. and Kong, X.-W. (2011). Robust principal component analysis based on maximum correntropy criterion. IEEE Trans. Image Process. 20 1485–1494.
• Hubert, M., Rousseeuw, P. J. and Vanden Branden, K. (2005). ROBPCA: A new approach to robust principal component analysis. Technometrics 47 64–79.
• Jackson, J. E. (2005). A User’s Guide to Principal Components. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ.
• Jeganathan, P. (1995). Some aspects of asymptotic theory with applications to time series models. Econometric Theory 11 818–887.
• Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
• Jolicoeur, P. (1984). Principal components, factor analysis, and multivariate allometry: A small-sample direction test. Biometrics 40 685–690.
• Koch, I. (2014). Analysis of Multivariate and High-Dimensional Data. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge Univ. Press, New York.
• Magnus, J. R. and Neudecker, H. (2007). Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd ed. Wiley, Chichester.
• Murray, P. M., Browne, R. P. and McNicholas, P. D. (2016). uskewFactors: Model-based clustering via mixtures of unrestricted skew-t factor analyzer models. R package. Available at https://cran.r-project.org/web/packages/uskewFactors/index.html.
• Paindaveine, D., Remy, J. and Verdebout, T. (2019). Supplement to “Testing for principal component directions under weak identifiability.” https://doi.org/10.1214/18-AOS1805SUPP.
• Paindaveine, D. and Verdebout, T. (2017). Inference on the mode of weak directional signals: A Le Cam perspective on hypothesis testing near singularities. Ann. Statist. 45 800–832.
• Pötscher, B. M. (2002). Lower risk bounds and properties of confidence sets for ill-posed estimation problems with applications to spectral density and persistence estimation, unit roots, and estimation of long memory parameters. Econometrica 70 1035–1065.
• Roussas, G. G. and Bhattacharya, D. (2011). Revisiting local asymptotic normality (LAN) and passing on to local asymptotic mixed normality (LAMN) and local asymptotic quadratic (LAQ) experiments. In Advances in Directional and Linear Statistics (M. T. Wells and A. Sengupta, eds.) 253–280. Physica-Verlag/Springer, Heidelberg.
• Salibián-Barrera, M., Van Aelst, S. and Willems, G. (2006). Principal components analysis based on multivariate MM estimators with fast and robust bootstrap. J. Amer. Statist. Assoc. 101 1198–1211.
• Schwartzman, A., Mascarenhas, W. F. and Taylor, J. E. (2008). Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices. Ann. Statist. 36 2886–2919.
• Shinmura, S. (2016). New Theory of Discriminant Analysis After R. Fisher. Springer, Singapore.
• Sylvester, A. D., Kramer, P. A. and Jungers, W. L. (2008). Modern humans are not (quite) isometric. Amer. J. Phys. Anthropol. 137 371–383.
• Tyler, D. E. (1981). Asymptotic inference for eigenvectors. Ann. Statist. 9 725–736.
• Tyler, D. E. (1983). A class of asymptotic tests for principal component vectors. Ann. Statist. 11 1243–1250.
• van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.

Supplemental materials

• Supplement to “Testing for principal component directions under weak identifiability”. In this supplement, we prove all theoretical results of the present paper.