The Annals of Statistics

Estimation of high-dimensional low-rank matrices

Angelika Rohde and Alexandre B. Tsybakov


Abstract

Suppose that we observe entries or, more generally, linear combinations of entries of an unknown m × T matrix A corrupted by noise. We are particularly interested in the high-dimensional setting where the number mT of unknown entries can be much larger than the sample size N. Motivated by several applications, we consider estimation of the matrix A under the assumption that it has small rank. This can be viewed as a dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-p quasi-norm penalty term, p ≤ 1. We study these estimators under two possible assumptions: a modified version of the restricted isometry condition, and a uniform bound on the ratio "empirical norm induced by the sampling operator/Frobenius norm." The main results are stated as nonasymptotic upper bounds on the prediction risk and on the Schatten-q risk of the estimators, where q ∈ [p, 2]. The rates that we obtain for the prediction risk are of the form rm/N (for m = T), up to logarithmic factors, where r is the rank of A. The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the kth entropy numbers of the quasi-convex Schatten class embeddings S_p^M ↪ S_2^M, p < 1, which are of independent interest.
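To make the penalized criterion concrete in its simplest special case: when every entry is observed exactly once with noise (Y = A + noise), the Schatten-1 (nuclear norm) penalized least squares estimator, argmin over X of ||Y − X||_F^2 / 2 + λ ||X||_{S_1}, has a closed form obtained by soft-thresholding the singular values of Y. The Python sketch below illustrates only this special case; the sampling scheme, the function name and the choice of λ are illustrative assumptions, and the paper's estimator covers general linear sampling operators and all p ≤ 1.

    import numpy as np

    def soft_threshold_svd(Y, lam):
        # Closed-form minimizer of 0.5 * ||Y - X||_F^2 + lam * ||X||_S1,
        # where ||.||_S1 is the Schatten-1 (nuclear) norm: soft-threshold
        # the singular values of Y and reconstruct.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return (U * np.maximum(s - lam, 0.0)) @ Vt

    # Toy example (assumed setup): every entry of a rank-2 matrix A is
    # observed once with Gaussian noise, i.e. Y = A + sigma * Z.
    rng = np.random.default_rng(0)
    m, T, r, sigma = 50, 40, 2, 0.5
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, T))
    Y = A + sigma * rng.standard_normal((m, T))

    # lam ~ sigma * (sqrt(m) + sqrt(T)) dominates the noise spectrum;
    # this scaling is a heuristic choice for the sketch, not the paper's.
    A_hat = soft_threshold_svd(Y, lam=sigma * (np.sqrt(m) + np.sqrt(T)))
    print(np.linalg.matrix_rank(A_hat))  # typically recovers rank r = 2

Because the thresholded singular values below λ are exactly zero, the estimate is genuinely low rank, which is the shrinkage-toward-low-rank behavior the abstract describes.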

Article information

Source
Ann. Statist., Volume 39, Number 2 (2011), 887-930.

Dates
First available in Project Euclid: 9 March 2011

Permanent link to this document
https://projecteuclid.org/euclid.aos/1299680958

Digital Object Identifier
doi:10.1214/10-AOS860

Mathematical Reviews number (MathSciNet)
MR2816342

Zentralblatt MATH identifier
1215.62056

Subjects
Primary: 62G05 (Estimation); 62F10 (Point estimation)

Keywords
High-dimensional low-rank matrices; empirical process; sparse recovery; Schatten norm; penalized least-squares estimator; quasi-convex Schatten class embeddings

Citation

Rohde, Angelika; Tsybakov, Alexandre B. Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 (2011), no. 2, 887–930. doi:10.1214/10-AOS860. https://projecteuclid.org/euclid.aos/1299680958

