## The Annals of Statistics

### Joint variable and rank selection for parsimonious estimation of high-dimensional matrices

#### Abstract

We propose dimension reduction methods for sparse, high-dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower-dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by considering them jointly. We motivate a new class of sparse multivariate regression models, in which the coefficient matrix has low rank and zero rows or can be well approximated by such a matrix. Next, we introduce estimators that are based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators indeed adapt to the unknown matrix sparsity and have fast rates of convergence. We support our theoretical results with an extensive simulation study and two data analyses.
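To make the model class concrete, the following is one way to write it down. The notation ($Y$, $X$, $A$, $E$, $m$, $p$, $n$, $r$, $J$, $\operatorname{pen}$) is assumed here for illustration and is not taken from the abstract; the penalized criterion is shown only schematically, and the specific penalties are those defined in the article itself.

$$
Y = XA + E, \qquad Y \in \mathbb{R}^{m \times n},\quad X \in \mathbb{R}^{m \times p},\quad A \in \mathbb{R}^{p \times n},
$$

where $m$ is the sample size, $p$ the number of predictors and $n$ the number of responses. Suppose $A$ has rank $r$ and only $J$ nonzero rows, with $r \le \min(J, n)$. A generic penalized least squares estimator then takes the form

$$
\widehat{A} \in \arg\min_{B \in \mathbb{R}^{p \times n}} \bigl\{ \lVert Y - XB \rVert_F^2 + \operatorname{pen}(B) \bigr\},
$$

with $\operatorname{pen}(B)$ increasing in both the rank of $B$ and its number of nonzero rows. Counting free parameters shows why combining the two restrictions can pay off:

$$
\underbrace{pn}_{\text{unrestricted}} \;\ge\; \underbrace{r(p + n - r)}_{\text{rank } r \text{ only}},
\qquad
\underbrace{Jn}_{J \text{ nonzero rows only}} \;\ge\; \underbrace{r(J + n - r)}_{\text{both}} .
$$

For example, with $p = 100$, $n = 20$, $J = 10$ and $r = 2$, the four counts are $2000$, $236$, $200$ and $56$, respectively.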

#### Article information

Source
Ann. Statist., Volume 40, Number 5 (2012), 2359–2388.

Dates
First available in Project Euclid: 4 February 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1359987524

Digital Object Identifier
doi:10.1214/12-AOS1039

Mathematical Reviews number (MathSciNet)
MR3097606

Zentralblatt MATH identifier
1373.62246

#### Citation

Bunea, Florentina; She, Yiyuan; Wegkamp, Marten H. Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. Ann. Statist. 40 (2012), no. 5, 2359–2388. doi:10.1214/12-AOS1039. https://projecteuclid.org/euclid.aos/1359987524
