The Annals of Statistics

Estimation of (near) low-rank matrices with noise and high-dimensional scaling

Sahand Negahban and Martin J. Wainwright
Source: Ann. Statist. Volume 39, Number 2 (2011), 1069-1097.

Abstract

We study an instance of high-dimensional inference in which the goal is to estimate a matrix $\Theta \in \mathbb{R}^{m_1 \times m_2}$ on the basis of $N$ noisy observations. The unknown matrix $\Theta$ is assumed to be either exactly low rank or "near" low rank, meaning that it can be well approximated by a matrix of low rank. We consider a standard M-estimator based on regularization by the nuclear or trace norm over matrices, and analyze its performance under high-dimensional scaling. We define the notion of restricted strong convexity (RSC) for the loss function, and use it to derive nonasymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models and apply to both exactly low-rank and approximately low-rank matrices. We then illustrate consequences of this general theory for a number of specific matrix models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections. These results rely on nonasymptotic random matrix theory, both to establish that the RSC condition holds and to determine an appropriate choice of regularization parameter. Simulation results show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
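The nuclear-norm regularized M-estimator described above can be illustrated on the last of the listed models, recovery from random projections. This is a minimal sketch, assuming observations $y_i = \langle\langle X_i, \Theta^*\rangle\rangle + w_i$ with Gaussian matrices $X_i$, the trace inner product $\langle\langle A, B\rangle\rangle = \operatorname{trace}(A^T B)$, and the least-squares objective $\frac{1}{2N}\sum_{i=1}^N (y_i - \langle\langle X_i, \Theta\rangle\rangle)^2 + \lambda_N \|\Theta\|_*$, where $\|\Theta\|_*$ is the nuclear norm (sum of singular values). The solver is plain proximal gradient descent with singular value soft-thresholding, one standard method for this convex program and not necessarily the one used by the authors; the helper names `svt` and `nuclear_norm_ls`, the problem sizes, and the constant in $\lambda_N$ are hypothetical choices made for illustration.

```python
import numpy as np


def svt(A, tau):
    """Singular value soft-thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Shrink every singular value by tau and clip at zero, then rebuild the matrix.
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def nuclear_norm_ls(X, y, lam, n_iter=1000):
    """Proximal gradient (ISTA) sketch for
        min_Theta  1/(2N) * sum_i (y_i - <<X_i, Theta>>)^2 + lam * ||Theta||_*,
    where X has shape (N, m1, m2) and <<A, B>> = trace(A^T B).
    """
    N, m1, m2 = X.shape
    # Row i is vec(X_i), so Xf @ vec(Theta) gives all N inner products <<X_i, Theta>>.
    Xf = X.reshape(N, m1 * m2)
    # Lipschitz constant of the gradient of the squared loss; step size 1/L.
    L = np.linalg.norm(Xf, 2) ** 2 / N
    step = 1.0 / L
    theta = np.zeros((m1, m2))
    for _ in range(n_iter):
        resid = Xf @ theta.ravel() - y
        grad = (Xf.T @ resid / N).reshape(m1, m2)
        # Gradient step on the loss, then the nuclear-norm proximal step.
        theta = svt(theta - step * grad, step * lam)
    return theta


# Usage: a rank-2, 30 x 30 matrix observed through N = 700 < 900 noisy random projections.
rng = np.random.default_rng(0)
m1, m2, r, N, sigma = 30, 30, 2, 700, 1.0
theta_star = rng.standard_normal((m1, r)) @ rng.standard_normal((r, m2))
X = rng.standard_normal((N, m1, m2))  # Gaussian observation matrices X_i
y = np.einsum("nij,ij->n", X, theta_star) + sigma * rng.standard_normal(N)

# Hypothetical choice: roughly twice the expected operator norm of (1/N) * sum_i w_i X_i.
lam = 2 * sigma * (np.sqrt(m1) + np.sqrt(m2)) / np.sqrt(N)
theta_hat = nuclear_norm_ls(X, y, lam)

rel_err = np.linalg.norm(theta_hat - theta_star) / np.linalg.norm(theta_star)
est_rank = int(np.sum(np.linalg.svd(theta_hat, compute_uv=False) > 1e-6))
print(f"relative Frobenius error: {rel_err:.3f}, estimated rank: {est_rank}")
```

Each iteration takes a gradient step on the squared loss and then shrinks every singular value by $\text{step} \cdot \lambda_N$, setting the small ones exactly to zero; this shrinkage is what keeps the iterates low rank. Here $N = 700$ is smaller than the $m_1 m_2 = 900$ unknowns, so the unregularized least-squares problem would be underdetermined.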

Primary Subjects: 62F30
Secondary Subjects: 62H12
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1304947044
Digital Object Identifier: doi:10.1214/10-AOS850
Zentralblatt MATH identifier: 05914741
Mathematical Reviews number (MathSciNet): MR2816348

2013 © Institute of Mathematical Statistics