We study an instance of high-dimensional inference in which the goal is to estimate a matrix Θ∗ ∈ ℝ^{m1×m2} on the basis of N noisy observations. The unknown matrix Θ∗ is assumed to be either exactly low-rank, or “near” low-rank, meaning that it can be well approximated by a matrix of low rank. We consider a standard M-estimator based on regularization by the nuclear or trace norm over matrices, and analyze its performance under high-dimensional scaling. We define the notion of restricted strong convexity (RSC) for the loss function, and use it to derive nonasymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models and apply to both exactly and approximately low-rank matrices. We then illustrate consequences of this general theory for a number of specific matrix models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections. These results rely on nonasymptotic random matrix theory, both to establish that the RSC condition holds and to determine an appropriate choice of regularization parameter. Simulation results show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
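As a concrete illustration of the estimator analyzed here, the following is a minimal sketch (not the authors' implementation) of nuclear-norm regularized least squares for multivariate regression, solved by proximal gradient descent. The proximal operator of the nuclear norm is singular value soft-thresholding. The function names `svt` and `nuclear_norm_estimate`, and all parameter choices below, are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def svt(M, tau):
    # Singular value soft-thresholding: the proximal operator of
    # tau * (nuclear norm), applied to the matrix M.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def nuclear_norm_estimate(X, Y, lam, n_iter=500):
    # Proximal gradient descent on
    #   (1/2N) * ||Y - X @ Theta||_F^2 + lam * ||Theta||_nuclear
    # with a fixed step size 1/L, where L is the Lipschitz constant
    # of the smooth part's gradient.
    N, m1 = X.shape
    m2 = Y.shape[1]
    L = np.linalg.norm(X, 2) ** 2 / N  # largest eigenvalue of X^T X / N
    step = 1.0 / L
    Theta = np.zeros((m1, m2))
    for _ in range(n_iter):
        grad = X.T @ (X @ Theta - Y) / N
        Theta = svt(Theta - step * grad, step * lam)
    return Theta

# Toy instance: rank-2 target matrix, Gaussian design, small noise.
rng = np.random.default_rng(0)
N, m1, m2, r = 200, 20, 15, 2
Theta_star = rng.standard_normal((m1, r)) @ rng.standard_normal((r, m2))
X = rng.standard_normal((N, m1))
Y = X @ Theta_star + 0.1 * rng.standard_normal((N, m2))

Theta_hat = nuclear_norm_estimate(X, Y, lam=0.05)
err = np.linalg.norm(Theta_hat - Theta_star) / np.linalg.norm(Theta_star)
```

The choice lam=0.05 here roughly tracks the theory's recommended scaling for the regularization parameter, of order σ(√m1 + √m2)/√N for noise level σ; with σ = 0.1 this gives about 0.06 for the dimensions above.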