Abstract
Many regularization schemes for high-dimensional regression have been put forward. Most require the choice of a tuning parameter, using model selection criteria or cross-validation. We show that sign-constrained least squares estimation is a simple and effective regularization technique for a certain class of high-dimensional regression problems. The sign constraint has to be derived via prior knowledge or an initial estimator. The success depends on conditions that are easy to check in practice. A sufficient condition for our results is that most variables with the same sign constraint are positively correlated. For a sparse optimal predictor, a non-asymptotic bound on the $\ell_{1}$-error of the regression coefficients is then proven. Without using any further regularization, the regression vector can be estimated consistently as long as $s^{2}\log(p)/n\rightarrow 0$ for $n\rightarrow\infty$, where $s$ is the sparsity of the optimal regression vector, $p$ the number of variables and $n$ the sample size. The bounds are almost as tight as similar bounds for the Lasso under strongly correlated design, even though the method has no tuning parameter and does not require cross-validation. Network tomography is shown to be an application where the necessary conditions for the success of sign-constrained least squares are naturally fulfilled, and empirical results confirm the effectiveness of the sign constraint for sparse recovery when predictor variables are strongly correlated.
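To make the setting concrete, below is a minimal sketch of the estimator the abstract describes, in the all-positive sign case: least squares under a non-negativity constraint, with no tuning parameter and no cross-validation. The simulated design uses a common latent factor so that predictors are positively correlated, matching the sufficient condition above; the dimensions, correlation strength, and noise level are arbitrary illustrative choices, and scipy.optimize.nnls serves as an off-the-shelf solver.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5  # sample size, number of variables, sparsity (illustrative)

# Positively correlated design: every column loads on one common factor,
# so pairwise correlations among predictors are positive.
factor = rng.standard_normal((n, 1))
X = 0.7 * factor + 0.3 * rng.standard_normal((n, p))

# Sparse optimal regression vector with the assumed (non-negative) signs.
beta = np.zeros(p)
beta[:s] = rng.uniform(1.0, 2.0, size=s)
y = X @ beta + 0.5 * rng.standard_normal(n)

# Sign-constrained least squares: minimize ||y - X b||_2 subject to b >= 0.
# No regularization parameter is chosen anywhere.
beta_hat, _ = nnls(X, y)

print("l1 error of coefficients:", np.abs(beta_hat - beta).sum())
print("nonzeros in estimate:", np.count_nonzero(beta_hat))
```

Even with p > n, the constraint alone tends to shrink most coefficients exactly to zero under such positively correlated designs, which is the sparse-recovery behaviour the paper quantifies with its $\ell_{1}$-error bound.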
Citation
Nicolai Meinshausen. "Sign-constrained least squares estimation for high-dimensional regression." Electron. J. Statist. 7 (2013): 1607–1631. https://doi.org/10.1214/13-EJS818