## The Annals of Statistics

### Matrix estimation by Universal Singular Value Thresholding

Sourav Chatterjee

#### Abstract

Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candès and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has “a little bit of structure.” Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to solve problems related to low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, graphon estimation and generalized Bradley–Terry models for pairwise comparison.

#### Article information

Source
Ann. Statist., Volume 43, Number 1 (2015), 177–214.

Dates
First available in Project Euclid: 9 December 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1418135619

Digital Object Identifier
doi:10.1214/14-AOS1272

Mathematical Reviews number (MathSciNet)
MR3285604

Zentralblatt MATH identifier
1308.62038

#### Citation

Chatterjee, Sourav. Matrix estimation by Universal Singular Value Thresholding. Ann. Statist. 43 (2015), no. 1, 177–214. doi:10.1214/14-AOS1272. https://projecteuclid.org/euclid.aos/1418135619
