## The Annals of Statistics

### A lava attack on the recovery of sums of dense and sparse signals

#### Abstract

Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there are a small number of nonzero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small nonzero parameters. We consider a generalization of these two basic models, termed here a “sparse $+$ dense” model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein’s unbiased estimator for lava’s prediction risk. A simulation example compares the performance of lava to lasso, ridge and elastic net in a regression example using data-dependent penalty parameters and illustrates lava’s improved performance relative to these benchmarks.
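The "sparse $+$ dense" decomposition described above admits a simple closed form in the Gaussian sequence model, which makes the idea easy to see in code. The sketch below is my own illustration, not the authors' implementation: function names and the exact scaling of the two penalties are assumptions (the paper may normalize the objective differently). Each coordinate solves $\min_{d,s}\,(y - d - s)^2 + \lambda_2 d^2 + \lambda_1 |s|$; profiling out the dense part $d$ reduces the problem to a single soft-thresholding step with an inflated threshold, after which the dense part is a ridge-style shrink of the remainder.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the l1 (lasso) proximal operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lava_sequence(y, lam1, lam2):
    """Lava-style estimate in the Gaussian sequence model (illustrative sketch).

    Solves, coordinate by coordinate,
        min_{d, s} (y - d - s)^2 + lam2 * d^2 + lam1 * |s|,
    and returns theta = d + s (dense part d plus sparse part s).
    For fixed s the dense part is d = (y - s) / (1 + lam2); substituting
    this back leaves a scaled lasso problem in s whose solution is
    soft-thresholding at lam1 * (1 + lam2) / (2 * lam2).
    """
    y = np.asarray(y, dtype=float)
    # Sparse part: lasso step with a threshold inflated by the ridge penalty.
    s = soft_threshold(y, lam1 * (1.0 + lam2) / (2.0 * lam2))
    # Dense part: ridge shrinkage of what the sparse part leaves behind.
    d = (y - s) / (1.0 + lam2)
    return d + s, d, s

if __name__ == "__main__":
    y = np.array([5.0, 0.3, -4.0, 0.1])
    theta, d, s = lava_sequence(y, lam1=1.0, lam2=0.5)
    print(theta)
```

The two limits behave as expected: a very large `lam1` kills the sparse part and leaves pure ridge shrinkage $y/(1+\lambda_2)$, while a very large `lam2` kills the dense part and leaves pure lasso soft-thresholding. In between, large coordinates are shrunk by a roughly constant amount (lasso-like) while small coordinates are shrunk proportionally (ridge-like), which is the intuition behind lava dominating both.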

#### Article information

**Source:** Ann. Statist., Volume 45, Number 1 (2017), 39–76.

**Dates:** Revised December 2015; first available in Project Euclid 21 February 2017.

**Permanent link:** https://projecteuclid.org/euclid.aos/1487667617

**Digital Object Identifier:** doi:10.1214/16-AOS1434

**Mathematical Reviews number (MathSciNet):** MR3611486

**Zentralblatt MATH identifier:** 06710505

**Subjects:** Primary 62J07 (ridge regression; shrinkage estimators); Secondary 62J05 (linear regression)

#### Citation

Chernozhukov, Victor; Hansen, Christian; Liao, Yuan. A lava attack on the recovery of sums of dense and sparse signals. Ann. Statist. 45 (2017), no. 1, 39--76. doi:10.1214/16-AOS1434. https://projecteuclid.org/euclid.aos/1487667617
