## The Annals of Statistics

### Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics

#### Abstract

Perturbation bounds for singular spaces, in particular Wedin’s $\mathop{\mathrm{sin}}\nolimits \Theta$ theorem, are a fundamental tool in many fields including high-dimensional statistics, machine learning and applied mathematics. In this paper, we establish separate perturbation bounds, measured in both spectral and Frobenius $\mathop{\mathrm{sin}}\nolimits \Theta$ distances, for the left and right singular subspaces. Lower bounds, which show that the individual perturbation bounds are rate-optimal, are also given.

The new perturbation bounds are applicable to a wide range of problems. In this paper, we consider in detail applications to low-rank matrix denoising and singular space estimation, high-dimensional clustering and canonical correlation analysis (CCA). In particular, separate matching upper and lower bounds are obtained for estimating the left and right singular spaces. To the best of our knowledge, this is the first result that gives different optimal rates for the left and right singular spaces under the same perturbation.

#### Article information

Source
Ann. Statist., Volume 46, Number 1 (2018), 60-89.

Dates
Revised: November 2016
First available in Project Euclid: 22 February 2018

https://projecteuclid.org/euclid.aos/1519268424

Digital Object Identifier
doi:10.1214/17-AOS1541

Mathematical Reviews number (MathSciNet)
MR3766946

Zentralblatt MATH identifier
06865105

#### Citation

Cai, T. Tony; Zhang, Anru. Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. Ann. Statist. 46 (2018), no. 1, 60--89. doi:10.1214/17-AOS1541. https://projecteuclid.org/euclid.aos/1519268424

#### References

• Alzahrani, T. and Horadam, K. J. (2016). Community detection in bipartite networks: Algorithms and case studies. In Complex Systems and Networks. Underst. Complex Syst. 25–50. Springer, Heidelberg.
• Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
• Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning. Mach. Learn. 73 243–272.
• Azizyan, M., Singh, A. and Wasserman, L. (2013). Minimax theory for high-dimensional Gaussian mixtures with sparse mean separation. In Advances in Neural Information Processing Systems 2139–2147.
• Balakrishnan, S., Xu, M., Krishnamurthy, A. and Singh, A. (2011). Noise thresholds for spectral clustering. In Advances in Neural Information Processing Systems 954–962.
• Benaych-Georges, F. and Nadakuditi, R. R. (2012). The singular values and vectors of low rank perturbations of large rectangular random matrices. J. Multivariate Anal. 111 120–135.
• Borg, I. and Groenen, P. J. (2005). Modern Multidimensional Scaling: Theory and Applications. Springer, Berlin.
• Boutsidis, C., Zouzias, A., Mahoney, M. W. and Drineas, P. (2015). Randomized dimensionality reduction for $k$-means clustering. IEEE Trans. Inform. Theory 61 1045–1062.
• Bura, E. and Pfeiffer, R. (2008). On the distribution of the left singular vectors of a random matrix and its applications. Statist. Probab. Lett. 78 2275–2280.
• Cai, T. T., Li, X. and Ma, Z. (2016). Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. Ann. Statist. 44 2221–2251.
• Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074–3110.
• Cai, T. T., Ma, Z. and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields 161 781–815.
• Cai, T. T and Zhang, A. (2017). Supplement to “Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics.” DOI:10.1214/17-AOS1541SUPP.
• Cai, T., Cai, T. T., Liao, K. and Liu, W. (2015). Large-scale simultaneous testing of cross-covariance matrix with applications to PheWAS. Technical report.
• Candes, E. J. and Plan, Y. (2010). Matrix completion with noise. Proc. IEEE 98 925–936.
• Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717–772.
• Candès, E. J., Sing-Long, C. A. and Trzasko, J. D. (2013). Unbiased risk estimates for singular value thresholding and spectral estimators. IEEE Trans. Signal Process. 61 4643–4657.
• Candès, E. J. and Tao, T. (2010). The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory 56 2053–2080.
• Capitaine, M., Donati-Martin, C. and Féral, D. (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. Ann. Probab. 37 1–47.
• Chatterjee, S. (2014). Matrix estimation by universal singular value thresholding. Ann. Statist. 43 177–214.
• Chen, M., Gao, C., Ren, Z. and Zhou, H. H. (2013). Sparse CCA via precision adjusted iterative thresholding. Preprint. Available at arXiv:1311.6186.
• Cho, J., Kim, D. and Rohe, K. (2015). Asymptotic theory for estimating the singular vectors and values of a partially-observed low rank matrix with noise. Preprint. Available at arXiv:1508.05431.
• Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7 1–46.
• Donoho, D. and Gavish, M. (2014). Minimax risk of matrix denoising by singular value thresholding. Ann. Statist. 42 2413–2440.
• Dopico, F. M. (2000). A note on sin $\Theta$ theorems for singular subspace variations. BIT Numerical Mathematics 40 395–403.
• Fan, J., Wang, W. and Zhong, Y. (2016). An $\ell _{\infty}$ eigenvector perturbation bound and its application to robust covariance estimation. Preprint. Available at arXiv:1603.03516.
• Feldman, D., Schmidt, M. and Sohler, C. (2013). Turning big data into tiny data: Constant-size coresets for $k$-means, PCA and projective clustering. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms 1434–1453. SIAM, Philadelphia, PA.
• Gao, C., Ma, Z. and Zhou, H. H. (2014). Sparse CCA: Adaptive estimation and computational barriers. Preprint. Available at arXiv:1409.8565.
• Gao, C., Ma, Z., Ren, Z. and Zhou, H. H. (2015). Minimax estimation in sparse canonical correlation analysis. Ann. Statist. 43 2168–2197.
• Gavish, M. and Donoho, D. L. (2014). The optimal hard threshold for singular values is $4/\sqrt{3}$. IEEE Trans. Inform. Theory 60 5040–5053.
• Goldberg, D., Nichols, D., Oki, B. M. and Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Commun. ACM 35 61–70.
• Gross, D. (2011). Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory 57 1548–1566.
• Hardoon, D. R., Szedmak, S. and Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Comput. 16 2639–2664.
• Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer Series in Statistics. Springer, New York.
• Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28 321–377.
• Jin, J., Ke, Z. T. and Wang, W. (2015). Phase transitions for high dimensional clustering and related problems. Preprint. Available at arXiv:1502.06952.
• Jin, J. and Wang, W. (2016). Influential features PCA for high dimensional clustering (with discussion). Ann. Statist. 44 2323–2359.
• Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
• Keshavan, R. H., Montanari, A. and Oh, S. (2010). Matrix completion from noisy entries. J. Mach. Learn. Res. 11 2057–2078.
• Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 215–237.
• Liu, Z. and Vandenberghe, L. (2009). Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. Appl. 31 1235–1256.
• Ma, Z. and Li, X. (2016). Subspace perspective on canonical correlation analysis: Dimension reduction and minimax rates. Preprint. Available at arXiv:1605.03662.
• Melamed, D. (2014). Community structures in bipartite networks: A dual-projection approach. PLoS ONE 9 e97823.
• O’Rourke, S., Vu, V. and Wang, K. (2013). Random perturbation of low rank matrices: Improving classical bounds. Preprint. Available at arXiv:1311.2657.
• Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.
• Shabalin, A. A. and Nobel, A. B. (2013). Reconstruction of a low-rank matrix in the presence of Gaussian noise. J. Multivariate Anal. 118 67–76.
• Singer, A. and Cucuringu, M. (2010). Uniqueness of low-rank matrix completion by rigidity theory. SIAM J. Matrix Anal. Appl. 31 1621–1641.
• Stewart, G. W. (1991). Perturbation theory for the singular value decomposition. In SVD and Signal Processing II: Algorithms, Analysis and Applications 99–109. Elsevier, Amsterdam.
• Stewart, M. (2006). Perturbation of the SVD in the presence of small singular values. Linear Algebra Appl. 419 53–77.
• Sun, R. and Luo, Z.-Q. (2015). Guaranteed matrix completion via nonconvex factorization. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science—FOCS 2015 270–289. IEEE Computer Soc., Los Alamitos, CA.
• Tao, T. (2012). Topics in Random Matrix Theory. Graduate Studies in Mathematics 132. Amer. Math. Soc., Providence, RI.
• Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
• von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555–586.
• Vu, V. (2011). Singular vectors under random perturbation. Random Structures Algorithms 39 526–538.
• Wang, R. (2015). Singular vector perturbation under Gaussian noise. SIAM J. Matrix Anal. Appl. 36 158–177.
• Wedin, P. (1972). Perturbation bounds in connection with singular value decomposition. BIT 12 99–111.
• Weyl, H. (1912). Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Math. Ann. 71 441–479.
• Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 515–534.
• Yang, D., Ma, Z. and Buja, A. (2016). Rate optimal denoising of simultaneously sparse and low rank matrices. J. Mach. Learn. Res. 17 Paper No. 92, 27.
• Yu, Y., Wang, T. and Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika 102 315–323.

#### Supplemental materials

• Supplement to “Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics”. The supplementary material includes the proofs for Theorem 2, Corollary 1, matrix denoising, high-dimensional clustering, canonical correlation analysis and all the technical lemmas.