The Annals of Statistics

Estimation of (near) low-rank matrices with noise and high-dimensional scaling

Sahand Negahban and Martin J. Wainwright


Abstract

We study an instance of high-dimensional inference in which the goal is to estimate a matrix Θ ∈ ℝ^(m1×m2) on the basis of N noisy observations. The unknown matrix Θ is assumed to be either exactly low rank, or “near” low-rank, meaning that it can be well approximated by a matrix with low rank. We consider a standard M-estimator based on regularization by the nuclear or trace norm over matrices, and analyze its performance under high-dimensional scaling. We define the notion of restricted strong convexity (RSC) for the loss function, and use it to derive nonasymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models, and apply to both exactly low-rank and approximately low-rank matrices. We then illustrate consequences of this general theory for a number of specific matrix models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections. These results involve nonasymptotic random matrix theory to establish that the RSC condition holds, and to determine an appropriate choice of regularization parameter. Simulation results show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
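
To make the estimator concrete: for the multivariate regression instance mentioned in the abstract, with observations Y = XΘ + noise, the M-estimator minimizes a squared loss plus a nuclear-norm penalty λ‖Θ‖_nuc. The following is a minimal Python sketch (not the authors' implementation) that solves this convex program by proximal gradient descent, whose proximal step is singular-value soft-thresholding; the function names, step-size choice, regularization level, and toy data are illustrative assumptions.

    # Minimal sketch of a nuclear-norm-regularized M-estimator for
    # multivariate regression Y = X @ Theta + noise, solved by proximal
    # gradient descent with singular-value soft-thresholding.
    import numpy as np

    def svt(M, tau):
        """Singular-value soft-thresholding: prox of tau * nuclear norm."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def nuclear_norm_regression(X, Y, lam, n_iter=500):
        """Minimize (1/2n)||Y - X Theta||_F^2 + lam * ||Theta||_nuc."""
        n, m1 = X.shape
        m2 = Y.shape[1]
        Theta = np.zeros((m1, m2))
        # Step size = 1 / Lipschitz constant of the gradient of the squared loss.
        step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)
        for _ in range(n_iter):
            grad = X.T @ (X @ Theta - Y) / n
            Theta = svt(Theta - step * grad, step * lam)
        return Theta

    # Toy usage: a rank-2 target recovered from noisy linear observations.
    rng = np.random.default_rng(0)
    n, m1, m2, r = 200, 30, 20, 2
    Theta_star = rng.standard_normal((m1, r)) @ rng.standard_normal((r, m2))
    X = rng.standard_normal((n, m1))
    Y = X @ Theta_star + 0.5 * rng.standard_normal((n, m2))
    Theta_hat = nuclear_norm_regression(X, Y, lam=0.5)
    print("Frobenius error:", np.linalg.norm(Theta_hat - Theta_star))

The choice of λ governs the trade-off analyzed in the paper: the theory prescribes how the regularization parameter should scale with the noise and the matrix dimensions for the Frobenius-norm error bounds to hold.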

Article information

Source
Ann. Statist. Volume 39, Number 2 (2011), 1069-1097.

Dates
First available in Project Euclid: 9 May 2011

Permanent link to this document
http://projecteuclid.org/euclid.aos/1304947044

Digital Object Identifier
doi:10.1214/10-AOS850

Zentralblatt MATH identifier
05914741

Mathematical Reviews number (MathSciNet)
MR2816348

Subjects
Primary: 62F30: Inference under constraints
Secondary: 62H12: Estimation

Keywords
High-dimensional inference, rank constraints, nuclear norm, trace norm, M-estimators, random matrix theory

Citation

Negahban, Sahand; Wainwright, Martin J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. The Annals of Statistics 39 (2011), no. 2, 1069--1097. doi:10.1214/10-AOS850. http://projecteuclid.org/euclid.aos/1304947044.



References

  • [1] Abernethy, J., Bach, F., Evgeniou, T. and Stein, J. (2006). Low-rank matrix factorization with attributes. Technical Report N-24/06/MM, Ecole des mines de Paris, France.
  • [2] Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
  • [3] Anderson, C. W., Stolz, E. A. and Shamsunder, S. (1998). Multivariate autoregressive models for classification of spontaneous electroencephalogram during mental tasks. IEEE Trans. Bio-Med. Eng. 45 277.
  • [4] Anderson, T. W. (1971). The Statistical Analysis of Time Series. Wiley, New York.
  • [5] Argyriou, A., Evgeniou, T. and Pontil, M. (2006). Multi-task feature learning. In Neural Information Processing Systems (NIPS) 41–48. Vancouver, Canada.
  • [6] Bach, F. (2008). Consistency of trace norm minimization. J. Mach. Learn. Res. 9 1019–1048.
  • [7] Bickel, P. and Levina, E. (2008). Covariance estimation by thresholding. Ann. Statist. 36 2577–2604.
  • [8] Bickel, P. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • [9] Bickel, P. and Li, B. (2006). Regularization in statistics. TEST 15 271–344.
  • [10] Bickel, P., Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [11] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge.
  • [12] Brown, E. N., Kass, R. E. and Mitra, P. P. (2004). Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience 7 456–466.
  • [13] Candès, E. and Plan, Y. (2010). Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. Technical report, Stanford Univ. Available at arXiv:1001.0339v1.
  • [14] Candès, E. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
  • [15] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717–772.
  • [16] Chen, S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61.
  • [17] Cohen, A., Dahmen, W. and DeVore, R. (2009). Compressed sensing and best k-term approximation. J. Amer. Math. Soc. 22 211–231.
  • [18] Donoho, D. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306.
  • [19] El-Karoui, N. (2008). Operator norm consistent estimation of large dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
  • [20] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [21] Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica 20 101–148.
  • [22] Fazel, M. (2002). Matrix Rank Minimization with Applications. Ph.D. thesis, Stanford Univ. Available at http://faculty.washington.edu/mfazel/thesis-final.pdf.
  • [23] Fisher, J. and Black, M. J. (2005). Motor cortical decoding using an autoregressive moving average model. 27th Annual International Conference of the Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005 2130–2133.
  • [24] Friedman, J., Hastie, T. and Tibshirani, R. (2007). Sparse inverse covariance estimation with the graphical Lasso. Biostatistics 9 432–441.
  • [25] Harrison, L., Penny, W. D. and Friston, K. (2003). Multivariate autoregressive modeling of fMRI time series. NeuroImage 19 1477–1491.
  • [26] Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge Univ. Press, Cambridge.
  • [27] Horn, R. A. and Johnson, C. R. (1991). Topics in Matrix Analysis. Cambridge Univ. Press, Cambridge.
  • [28] Huang, J. and Zhang, T. (2009). The benefit of group sparsity. Technical report, Rutgers Univ. Available at arXiv:0901.2962.
  • [29] Ji, S. and Ye, J. (2009). An accelerated gradient method for trace norm minimization. In International Conference on Machine Learning (ICML) 457–464. ACM, New York.
  • [30] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • [31] Keshavan, R. H., Montanari, A. and Oh, S. (2009). Matrix completion from noisy entries. Technical report, Stanford Univ. Available at http://arxiv.org/abs/0906.2027v1.
  • [32] Lee, K. and Bresler, Y. (2009). Guaranteed minimum rank approximation from linear observations by nuclear norm minimization with an ellipsoidal constraint. Technical report. UIUC. Available at arXiv:0903.4742.
  • [33] Liu, Z. and Vandenberghe, L. (2009). Interior-point method for nuclear norm optimization with application to system identification. SIAM J. Matrix Anal. Appl. 31 1235–1256.
  • [34] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. (2009). Taking advantage of sparsity in multi-task learning. Technical report, ETH Zurich. Available at arXiv:0903.1468.
  • [35] Lütkepohl, H. (2006). New Introduction to Multiple Time Series Analysis. Springer, New York.
  • [36] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
  • [37] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2009). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Proceedings of the NIPS Conference 1348–1356. Vancouver, Canada.
  • [38] Negahban, S. and Wainwright, M. J. (2010). Restricted strong convexity and (weighted) matrix completion: Near-optimal bounds with noise. Technical report, Univ. California, Berkeley.
  • [39] Negahban, S. and Wainwright, M. J. (2010). Supplement to “Estimation of (near) low-rank matrices with noise and high-dimensional scaling.” DOI: 10.1214/10-AOS850SUPP.
  • [40] Nesterov, Y. (2007). Gradient methods for minimizing composite objective function. Technical Report 2007/76, CORE, Univ. Catholique de Louvain.
  • [41] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2011). Union support recovery in high-dimensional multivariate regression. Ann. Statist. 39 1–47.
  • [42] Paul, D. and Johnstone, I. (2008). Augmented sparse principal component analysis for high-dimensional data. Technical report, Univ. California, Davis.
  • [43] Raskutti, G., Wainwright, M. J. and Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. Technical report, Dept. Statistics, Univ. California, Berkeley. Available at arXiv:0910.2042.
  • [44] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2008). High-dimensional covariance estimation: Convergence rates of ℓ1-regularized log-determinant divergence. Technical report, Dept. Statistics, Univ. California, Berkeley.
  • [45] Recht, B., Fazel, M. and Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471–501.
  • [46] Recht, B., Xu, W. and Hassibi, B. (2009). Null space conditions and thresholds for rank minimization. Technical report, Univ. Wisconsin–Madison. Available at http://pages.cs.wisc.edu/~brecht/papers/10.RecXuHas.Thresholds.pdf.
  • [47] Rohde, A. and Tsybakov, A. (2011). Estimation of high-dimensional low-rank matrices. Ann. Statist. 39 887–930.
  • [48] Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic J. Statist. 2 494–515.
  • [49] Srebro, N., Rennie, J. and Jaakkola, T. (2005). Maximum-margin matrix factorization. In Proceedings of the NIPS Conference 1329–1336. Vancouver, Canada.
  • [50] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • [51] Vandenberghe, L. and Boyd, S. (1996). Semidefinite programming. SIAM Rev. 38 49–95.
  • [52] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
  • [53] Yuan, M., Ekici, A., Lu, Z. and Monteiro, R. (2007). Dimension reduction and coefficient estimation in multivariate linear regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 329–346.
  • [54] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
  • [55] Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.

Supplemental materials

  • Supplementary material: Supplement to “Estimation of (Near) Low-Rank Matrices with Noise and High-Dimensional Scaling”. Owing to space constraints, we have moved many of the technical proofs and details to the Appendix, which is contained in the supplementary document [39].