Annals of Statistics

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions

Alekh Agarwal, Sahand Negahban, and Martin J. Wainwright

Full-text: Open access


We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation $\mathfrak{X}$ of the sum of an (approximately) low rank matrix $\Theta^{\star}$ with a second matrix $\Gamma^{\star}$ endowed with a complementary form of low-dimensional structure; this set-up includes many statistical models of interest, including factor analysis, multi-task regression and robust covariance estimation. We derive a general theorem that bounds the Frobenius norm error for an estimate of the pair $(\Theta^{\star},\Gamma^{\star})$ obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer. Our results use a “spikiness” condition that is related to, but milder than, singular vector incoherence. We specialize our general result to two cases that have been studied in past work: low rank plus an entrywise sparse matrix, and low rank plus a columnwise sparse matrix. For both models, our theory yields nonasymptotic Frobenius error bounds for both deterministic and stochastic noise matrices, and applies to matrices $\Theta^{\star}$ that can be exactly or approximately low rank, and matrices $\Gamma^{\star}$ that can be exactly or approximately sparse. Moreover, for the case of stochastic noise matrices and the identity observation operator, we establish matching lower bounds on the minimax error. The sharpness of our nonasymptotic predictions is confirmed by numerical simulations.

Article information

Ann. Statist., Volume 40, Number 2 (2012), 1171-1197.

First available in Project Euclid: 18 July 2012

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 62F30: Inference under constraints 62F30: Inference under constraints
Secondary: 62H12: Estimation

High-dimensional inference nuclear norm composite regularizers


Agarwal, Alekh; Negahban, Sahand; Wainwright, Martin J. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. Ann. Statist. 40 (2012), no. 2, 1171--1197. doi:10.1214/12-AOS1000.

Export citation


  • [1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Supplement to “Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions.” DOI:10.1214/12-AOS1000SUPP.
  • [2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
  • [3] Ando, R. K. and Zhang, T. (2005). A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6 1817–1853.
  • [4] Blitzer, J., Foster, D. P. and Kakade, S. M. (2009). Zero-shot domain adaptation: A multi-view approach. Technical report, Toyota Technological Institute at Chicago.
  • [5] Blitzer, J., Mcdonald, R. and Pereira, F. (2006). Domain adaptation with structural correspondence learning. In EMNLP Conference, Sydney, Australia.
  • [6] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge.
  • [7] Candès, E. J., Li, X., Ma, Y. and Wright, J. (2011). Robust principal component analysis? J. ACM 58 Art. 11, 37.
  • [8] Chandrasekaran, V., Parillo, P. A. and Willsky, A. S. (2010). Latent variable graphical model selection via convex optimization. Technical report, Massachusetts Institute of Technology.
  • [9] Chandrasekaran, V., Sanghavi, S., Parrilo, P. A. and Willsky, A. S. (2011). Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21 572–596.
  • [10] Fan, J., Liao, Y. and Mincheva, M. (2012). Large covariance estimation by thresholding principal orthogonal components. Technical report, Princeton Univ. Available at arXiv:1201.0175v1.
  • [11] Hsu, D., Kakade, S. M. and Zhang, T. (2011). Robust matrix decomposition with sparse corruptions. IEEE Trans. Inform. Theory 57 7221–7234.
  • [12] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • [13] McCoy, M. and Tropp, J. A. (2011). Two proposals for robust PCA using semidefinite programming. Electron. J. Stat. 5 1123–1160.
  • [14] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2009). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In NIPS Conference, Vancouver, Canada, December 2009. Full length version available at arXiv:1010.2731v1. Statist. Sci. To appear.
  • [15] Negahban, S. and Wainwright, M. J. (2011). Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Statist. 39 1069–1097.
  • [16] Negahban, S. and Wainwright, M. J. (2012). Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. J. Mach. Learn. Res. 13 1665–1697.
  • [17] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls. IEEE Trans. Inform. Theory 57 6976–6994.
  • [18] Recht, B., Fazel, M. and Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471–501.
  • [19] Rockafellar, R. T. (1970). Convex Analysis. Princeton Mathematical Series 28. Princeton Univ. Press, Princeton, NJ.
  • [20] Rohde, A. and Tsybakov, A. B. (2011). Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 887–930.
  • [21] Xu, H., Caramanis, C. and Sanghavi, S. (2010). Robust PCA via outlier pursuit. Technical report, Univ. Texas, Austin. Available at arXiv:1010.4237.
  • [22] Yuan, M., Ekici, A., Lu, Z. and Monteiro, R. (2007). Dimension reduction and coefficient estimation in multivariate linear regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 329–346.

Supplemental materials

  • Supplementary material: Simulations and proofs. This supplementary material contains numerical simulations that demonstrate excellent agreement between the theoretical predictions and the practical behavior of our estimators. We also provide proofs for our upper and lower bounds, including slightly sharpened versions of Corollaries 2 and 6.