## Electronic Journal of Statistics

### Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization

#### Abstract

Least squares fitting is in general not useful for high-dimensional linear models, in which the number of predictors is of the same or even larger order of magnitude than the number of samples. Theory developed in recent years has coined a paradigm according to which sparsity-promoting regularization is regarded as a necessity in such setting. Deviating from this paradigm, we show that non-negativity constraints on the regression coefficients may be similarly effective as explicit regularization if the design matrix has additional properties, which are met in several applications of non-negative least squares (NNLS). We show that for these designs, the performance of NNLS with regard to prediction and estimation is comparable to that of the lasso. We argue further that in specific cases, NNLS may have a better $\ell_{\infty}$-rate in estimation and hence also advantages with respect to support recovery when combined with thresholding. From a practical point of view, NNLS does not depend on a regularization parameter and is hence easier to use.

#### Article information

Source
Electron. J. Statist. Volume 7 (2013), 3004-3056.

Dates
First available in Project Euclid: 13 December 2013

https://projecteuclid.org/euclid.ejs/1386943911

Digital Object Identifier
doi:10.1214/13-EJS868

Mathematical Reviews number (MathSciNet)
MR3151760

Zentralblatt MATH identifier
1280.62086

Subjects
Primary: 62J05: Linear regression
Secondary: 52B99: None of the above, but in this section

#### Citation

Slawski, Martin; Hein, Matthias. Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. Electron. J. Statist. 7 (2013), 3004--3056. doi:10.1214/13-EJS868. https://projecteuclid.org/euclid.ejs/1386943911

#### References

• [1] Szlam, A., Guo, Z., and Osher, S. A split Bregman method for non-negative sparsity penalized least squares with applications to hyperspectral demixing. In, IEEE International Conference on Image Processing, 2010.
• [2] Bardsley, J. and Nagy, J. Covariance-preconditioned iterative methods for nonnegatively constrained astronomical imaging., SIAM Journal on Matrix Analysis and Applications, 27 :1184–1198, 2006.
• [3] Bartlett, P., Mendelson, S., and Neeman, J. $\ell_1$-regularized linear regression: Persistence and oracle inequalities., Probability Theory and Related Fields, 254:193–224, 2012.
• [4] Belloni, A., Chernozukov, V., and Wang, L. Square root lasso: Pivotal recovery of sparse signals via conic programming., Biometrika, 98:791–806, 2011.
• [5] Bickel, P., Ritov, Y., and Tsybakov, A. Simultaneous analysis of lasso and Dantzig selector., The Annals of Statistics, 37 :1705–1732, 2009.
• [6] Boyd, S. and Vandenberghe, L., Convex Optimization. Cambridge University Press, 2004.
• [7] Bruckstein, A., Elad, M., and Zibulevsky, M. On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations., IEEE Transactions on Information Theory, 54 :4813–4820, 2008.
• [8] Bühlmann, P. and van de Geer, S., Statistics for High-Dimensional Data. Springer, 2011.
• [9] Bunea, F. Honest variable selection in linear and logistic regression models via $\ell_1$ and $\ell_1+\ell_2$ penalization., The Electronic Journal of Statistics, 2 :1153–1194, 2008.
• [10] Bunea, F., Tsybakov, A., and Wegkamp, M. Sparsity oracle inequalities for the lasso., The Electronic Journal of Statistics, 1:169–194, 2007.
• [11] Bunea, F., Tsybakov, A., Wegkamp, M., and Barbu, A. SPADES and mixture models., The Annals of Statistics, 38 :2525–2558, 2010.
• [12] Candes, E. and Plan, Y. Near-ideal model selection by $\ell_1$-minimization., The Annals of Statistics, 37 :2145–2177, 2009.
• [13] Candes, E. and Tao, T. The Dantzig selector: Statistical estimation when $p$ is much larger than $n$., The Annals of Statistics, 35 :2313–2351, 2007.
• [14] Chen, D. and Plemmons, R. Nonnegativity constraints in numerical analysis. In, Symposium on the Birth of Numerical Analysis, 2007.
• [15] Donoho, D. Compressed sensing., IEEE Transactions on Information Theory, 52 :1289–1306, 2006
• [16] Donoho, D., Johnstone, I., Hoch, J., and Stern, A. Maximum entropy and the nearly black object., Journal of the Royal Statistical Society Series B, 54:41–81, 1992.
• [17] Donoho, D. and Tanner, J. Neighbourliness of randomly projected simplices in high dimensions., Proceedings of the National Academy of Science, 102 :9452–9457, 2005a.
• [18] Donoho, D. and Tanner, J. Sparse nonnegative solution of underdetermined linear equations by linear programming., Proceedings of the National Academy of Science, 102 :9446–9451, 2005b.
• [19] Donoho, D. and Tanner, J. Counting the faces of randomly-projected hypercubes and orthants, with applications., Discrete and Computational Geometry, 43:522–541, 2010.
• [20] Dorfman, R. The detection of defective members of large populations., The Annals of Mathematical Statistics, 14:436–440, 1943.
• [21] Du, D. and Hwang, F., Pooling Designs and Nonadaptive Group Testing. Twayne Publishers, 2006.
• [22] Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. Least angle regression., The Annals of Statistics, 32:407–499, 2004.
• [23] Genovese, C., Jin, J., Wasserman, L., and Yao, Z. A comparison of the lasso and marginal regression., Journal of Machine Learning Research, 13 :2107–2143, 2012.
• [24] Greenshtein, E. and Ritov, Y. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization., Bernoulli, 6:971–988, 2004.
• [25] Hebiri, M. and Lederer, J. How correlations influence lasso prediction., IEEE Transactions on Information Theory, 59 :1846–1854, 2013.
• [26] Huet, S., Giraud, C., and Verzelen, N. High-dimensional regression with unknown variance., Statistical Science, 27:500–518, 2012.
• [27] Kim, D., Sra, S., and Dhillon, I. Tackling box-constrained convex optimization via a new projected quasi-Newton approach., SIAM Journal on Scientific Computing, 32 :3548–3563, 2010.
• [28] Latala, R., Mankiewicz, P., Oleskiewicz, K., and Tomczak-Jaegermann, N. Banach-Mazur distances and projections on random subgaussian polytopes., Discrete and Computational Geometry, 38:29–50, 2007.
• [29] Lawson, R. and Hanson, C., Solving Least Squares Problems. SIAM Classics in Applied Mathematics, 1987.
• [30] Li, L. and Speed, T. Parametric deconvolution of positive spike trains., The Annals of Statistics, 28 :1279–1301, 2000.
• [31] Lin, Y., Lee, D., and Saul, L. Nonnegative deconvolution for time of arrival estimation. In, ICASSP, 2004.
• [32] Litvak, A., Pajor, A., Rudelson, M., and Tomczak-Jaegermann, N. Smallest singular value of random matrices and geometry of random polytopes., Advances in Mathematics, 195:491–523, 2005.
• [33] Lounici, K. Sup-norm convergence rate and sign concentration property of lasso and Dantzig selectors., The Electronic Journal of Statistics, 2:90–102, 2008.
• [34] Meinshausen, N. Sign-constrained least squares estimation for high-dimensional regression., The Electronic Journal of Statistics, 7 :1607–1631, 2013.
• [35] Meinshausen, N. and Bühlmann, P. High-dimensional graphs and variable selection with the lasso., The Annals of Statistics, 34 :1436–1462, 2006.
• [36] Meinshausen, N. and Yu, B. Lasso-type recovery of sparse representations for high-dimensional data., The Annals of Statistics, 37:246–270, 2009.
• [37] Raskutti, G., Wainwright, M., and Yu, B. Restricted nullspace and eigenvalue properties for correlated Gaussian designs., Journal of Machine Learning Research, 11 :2241–2259, 2010.
• [38] Rudelson, M. and Vershynin, R. Non-asymptotic theory of random matrices: Extreme singular values. In, International Congress of Mathematics, 2010.
• [39] Rudelson, M. and Zhou, S. Reconstruction from anisotropic random measurements., IEEE Transactions on Information Theory, 59 :3434–3447, 2013.
• [40] Schneider, R. and Weil, W., Stochastic and Integral Geometry. Springer, 2008.
• [41] Schölkopf, B. and Smola, A., Learning with Kernels. MIT Press, Cambridge, Massachussets, 2002.
• [42] Slawski, M. and Hein, M. Sparse recovery for protein mass spectrometry data. In, NIPS workshop on practical applications of sparse modelling, 2010.
• [43] Slawski, M. and Hein, M. Sparse recovery by thresholded non-negative least squares. In, Advances in Neural Information Processing Systems 24, pages 1926–1934, 2011.
• [44] Slawski, M. and Hein, M. Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. arXiv :1205.0953v1, 2012.
• [45] Slawski, M. and Hein, M. Supplementary material to ‘Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization’., http://www.ml.uni-saarland.de/people/slawski/nnls_ejs_supplement.pdf, 2013.
• [46] Sun, T. and Zhang, C.-H. Scaled sparse linear regression., Biometrika, 99:879–898, 2012.
• [47] Tibshirani, R. Regression shrinkage and variable selection via the lasso., Journal of the Royal Statistical Society Series B, 58:671–686, 1996.
• [48] Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective (with discussion)., Journal of the Royal Statistical Society Series B, 73:273–282, 2011.
• [49] Tropp, J. Greed is good: Algorithmic results for sparse approximation., IEEE Transactions on Information Theory, 50 :2231–2242, 2004.
• [50] Tsybakov, A. Discussion of ‘Stability Section’ by N. Meinshausen and P. Bühlmann., Journal of the Royal Statistical Society Series B, 72:467, 2010.
• [51] van de Geer, S. High-dimensional generalized linear models and the lasso., The Annals of Statistics, 36:614–645, 2008.
• [52] van de Geer, S. and Bühlmann, P. On the conditions used to prove oracle results for the lasso., The Electronic Journal of Statistics, 3 :1360–1392, 2009.
• [53] van de Geer, S. and Lederer, J., From Probability to Statistics and Back: High-Dimensional Models and Processes. A Festschrift in Honor of Jon Wellner, chapter ‘The lasso, correlated design, and improved oracle inequalities’. IMS Collections. 2012.
• [54] Wainwright, M. Sharp thresholds for noisy and high-dimensional recovery of sparsity using $\ell_1$-constrained quadratic programming (lasso)., IEEE Transactions on Information Theory, 55 :2183–2202, 2009.
• [55] Wang, M. and Tang, A. Conditions for a unique non-negative solution to an underdetermined system. In, Proceedings of Allerton Conference on Communication, Control, and Computing, 2009.
• [56] Wang, M., Xu, W., and Tang, A. A unique nonnegative solution to an undetermined system: From vectors to matrices., IEEE Transactions on Signal Processing, 59 :1007–1016, 2011.
• [57] Wendel, J. G. A problem in geometric probability., Mathematics Scandinavia, 11:109–111, 1962.
• [58] Zhang, C.-H. and Huang, J. The sparsity and bias of the lasso selection in high-dimensional linear regression., The Annals of Statistics, 36 :1567–1594, 2008.
• [59] Zhang, T. On the consistency of feature selection using greedy least squares regression., Journal of Machine Learning Research, 10:555–568, 2009a.
• [60] Zhang, T. Some sharp performance bounds for least squares regression with $L_1$ regularization., The Annals of Statistics, 37 :2109–2144, 2009b.
• [61] Zhao, P. and Yu, B. On model selection consistency of the lasso., Journal of Machine Learning Research, 7 :2541–2567, 2006.
• [62] Zhao, Y.-B. Sparsest and least $\ell_1$-norm nonnegative solutions to linear systems with applications to compressive sensing and linear programs. Technical report, University of Birmingham, 2012.
• [63] Zhou, S. Thresholding procedures for high dimensional variable selection and statistical estimation. In, NIPS, 2009.
• [64] Zou, H. The adaptive lasso and its oracle properties., Journal of the American Statistical Association, 101 :1418–1429, 2006.