## The Annals of Statistics

### Convex regularization for high-dimensional multiresponse tensor regression

#### Abstract

In this paper, we present a general convex optimization approach for solving high-dimensional multiresponse tensor regression problems under low-dimensional structural assumptions. We consider convex and weakly decomposable regularizers, assuming that the underlying tensor lies in an unknown low-dimensional subspace. Within our framework, we derive general risk bounds for the resulting estimators under fairly general dependence structures among the covariates. These bounds are stated in terms of two simple quantities: the Gaussian width of a convex set in tensor space and the intrinsic dimension of the low-dimensional tensor subspace. To the best of our knowledge, this is the first general framework that applies to multiple response problems. Our general bounds yield useful rates of convergence for a number of fundamental statistical models of interest, including multiresponse regression, vector autoregressive models, low-rank tensor models and pairwise interaction models. Moreover, in many of these settings we prove that the resulting estimates are minimax optimal. We also provide a numerical study that both validates our theoretical guarantees and demonstrates the breadth of our framework.
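To make the setup concrete, the simplest instance of the framework is multiresponse linear regression with a nuclear norm penalty, which encourages a low-rank coefficient matrix. The sketch below is illustrative only and is not the paper's algorithm: it solves min over B of (1/2n)||Y - XB||_F^2 + lam*||B||_* by proximal gradient descent, where the proximal step is singular-value soft-thresholding; all function names and parameter choices are our own assumptions.

```python
import numpy as np

def svd_soft_threshold(B, tau):
    # Proximal operator of the nuclear norm: shrink singular values by tau.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def multiresponse_nuclear_norm(X, Y, lam, n_iter=500):
    """Proximal gradient (ISTA) for (1/2n)||Y - XB||_F^2 + lam * ||B||_*.

    Hypothetical helper for illustration; lam trades off fit against rank.
    """
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q))
    # Step size 1/L with L the Lipschitz constant of the gradient,
    # L = sigma_max(X)^2 / n.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = svd_soft_threshold(B - step * grad, step * lam)
    return B

# Simulate a rank-2 coefficient matrix and recover it from noisy responses.
rng = np.random.default_rng(0)
n, p, q, r = 200, 30, 20, 2
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))
B_hat = multiresponse_nuclear_norm(X, Y, lam=0.1)
err = np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true)
```

Higher-order tensor responses are handled in the paper by replacing the nuclear norm with other weakly decomposable regularizers suited to the assumed low-dimensional structure; the proximal machinery above is unchanged in spirit.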

#### Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1554-1584.

Dates
Revised: May 2018
First available in Project Euclid: 13 February 2019

https://projecteuclid.org/euclid.aos/1550026849

Digital Object Identifier
doi:10.1214/18-AOS1725

Mathematical Reviews number (MathSciNet)
MR3911122

Zentralblatt MATH identifier
07053518

#### Citation

Raskutti, Garvesh; Yuan, Ming; Chen, Han. Convex regularization for high-dimensional multiresponse tensor regression. Ann. Statist. 47 (2019), no. 3, 1554--1584. doi:10.1214/18-AOS1725. https://projecteuclid.org/euclid.aos/1550026849

#### References

• [1] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley, New York.
• [2] Basu, S. and Michailidis, G. (2015). Regularized estimation in sparse high-dimensional time series models. Ann. Statist. 43 1535–1567.
• [3] Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169. Springer, New York.
• [4] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• [5] Chen, S., Lyu, M. R., King, I. and Xu, Z. (2013). Exact and stable recovery of pairwise interaction tensors. In Advances in Neural Information Processing Systems.
• [6] Cohen, S. B. and Collins, M. (2012). Tensor decomposition for fast parsing with latent-variable PCFGs. In Advances in Neural Information Processing Systems.
• [7] Gandy, S., Recht, B. and Yamada, I. (2011). Tensor completion and low-$n$-rank tensor recovery via convex optimization. Inverse Probl. 27 025010, 19 pp.
• [8] Gordon, Y. (1988). On Milman’s inequality and random subspaces which escape through a mesh in ${\mathbf{R}}^{n}$. In Geometric Aspects of Functional Analysis (1986/87). Lecture Notes in Math. 1317 84–106. Springer, Berlin.
• [9] Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Monographs on Statistics and Applied Probability 143. CRC Press, Boca Raton, FL.
• [10] Hoff, P. D. (2015). Multilinear tensor regression for longitudinal relational data. Ann. Appl. Stat. 9 1169–1193.
• [11] Kolda, T. G. and Bader, B. W. (2009). Tensor decompositions and applications. SIAM Rev. 51 455–500.
• [12] Li, N. and Li, B. (2010). Tensor completion for on-board compression of hyperspectral images. In 17th IEEE International Conference on Image Processing (ICIP) 517–520.
• [13] Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer, Berlin.
• [14] Mendelson, S. (2016). Upper bounds on product and multiplier empirical processes. Stochastic Process. Appl. 126 3652–3680.
• [15] Mesgarani, N., Slaney, M. and Shamma, S. (2006). Content-based audio classification based on multiscale spectro-temporal features. IEEE Trans. Speech Audio Process. 14 920–930.
• [16] Mu, C., Huang, B., Wright, J. and Goldfarb, D. (2014). Square deal: Lower bounds and improved relaxations for tensor recovery. In International Conference on Machine Learning.
• [17] Negahban, S. and Wainwright, M. J. (2012). Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. J. Mach. Learn. Res. 13 1665–1697.
• [18] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
• [19] Nion, D. and Sidiropoulos, N. D. (2010). Tensor algebra and multidimensional harmonic retrieval in signal processing for MIMO radar. IEEE Trans. Signal Process. 58 5693–5705.
• [20] Pisier, G. (1989). The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Mathematics 94. Cambridge Univ. Press, Cambridge.
• [21] Qin, Z., Scheinberg, K. and Goldfarb, D. (2013). Efficient block-coordinate descent algorithms for the group Lasso. Math. Program. Comput. 5 143–169.
• [22] Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
• [23] Raskutti, G., Yuan, M. and Chen, H. (2019). Supplement to “Convex regularization for high-dimensional multiresponse tensor regression.” DOI:10.1214/18-AOS1725SUPP.
• [24] Rendle, S., Marinho, L. B., Nanopoulos, A. and Schmidt-Thieme, L. (2009). Learning optimal ranking with tensor factorization for tag recommendation. In SIGKDD.
• [25] Rendle, S. and Schmidt-Thieme, L. (2010). Pairwise interaction tensor factorization for personalized tag recommendation. In ICDM.
• [26] Rockafellar, R. T. (1970). Convex Analysis. Princeton Mathematical Series 28. Princeton Univ. Press, Princeton, NJ.
• [27] Semerci, O., Hao, N., Kilmer, M. E. and Miller, E. L. (2014). Tensor-based formulation and nuclear norm regularization for multienergy computed tomography. IEEE Trans. Image Process. 23 1678–1693.
• [28] Simon, N., Friedman, J. and Hastie, T. (2013). A blockwise coordinate descent algorithm for penalized multiresponse and grouped multinomial regression. Technical report, Georgia Tech.
• [29] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• [30] Turlach, B. A., Venables, W. N. and Wright, S. J. (2005). Simultaneous variable selection. Technometrics 47 349–363.
• [31] van de Geer, S. (2014). Weakly decomposable regularization penalties and structured sparsity. Scand. J. Stat. 41 72–86.
• [32] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.
• [33] Yuan, M. and Zhang, C.-H. (2016). On tensor completion via nuclear norm minimization. Found. Comput. Math. 16 1031–1068.
• [34] Zhou, S. (2009). Restricted eigenvalue conditions on subgaussian random matrices. Technical report, ETH, Zurich. Available at arXiv:0912.4045.

#### Supplemental materials

• Proofs. We provide the proofs of all the main theorems.