The Annals of Statistics

False discoveries occur early on the Lasso path

Weijie Su, Małgorzata Bogdan, and Emmanuel Candès

Abstract

In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity—meaning that the fraction of variables with a nonvanishing effect tends to a constant, however small—this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to keep the type II error (false negative rate) below a critical value, then anywhere on the Lasso path the type I error (false positive rate) must exceed a given threshold; consequently, we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
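The trade-off is easy to observe numerically. Below is a minimal simulation sketch (not the authors' code; the dimensions, sparsity level, and effect size are illustrative assumptions) that fits the Lasso path on an independent Gaussian design in the linear-sparsity regime and reports the true positive proportion (TPP) and false discovery proportion (FDP) along the path, using scikit-learn's lasso_path:

    # Hypothetical illustration, not from the paper: simulate an independent
    # Gaussian design with linear sparsity (k/p constant) and strong,
    # equal-sized effects, then track errors along the Lasso path.
    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(0)
    n, p, k, effect = 1000, 1000, 200, 50.0   # k/p = 0.2: linear sparsity

    X = rng.standard_normal((n, p)) / np.sqrt(n)   # independent columns
    beta = np.zeros(p)
    beta[:k] = effect                              # very strong signals
    y = X @ beta + rng.standard_normal(n)

    # lasso_path evaluates the Lasso at a decreasing grid of penalties,
    # i.e., it moves forward along the Lasso path.
    alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

    for j in range(0, len(alphas), 10):
        selected = np.flatnonzero(coefs[:, j])     # variables in the model
        if selected.size == 0:
            continue
        true_hits = np.count_nonzero(beta[selected])
        tpp = true_hits / k                        # power = 1 - type II error
        fdp = 1 - true_hits / selected.size        # false discovery proportion
        print(f"lambda={alphas[j]:.4f}  TPP={tpp:.2f}  FDP={fdp:.2f}")

Consistent with the abstract, runs of this kind show null variables entering the path before all true variables have been selected, so the FDP moves away from zero before the TPP reaches one, however large the illustrative effect size is made.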

Article information

Source
Ann. Statist., Volume 45, Number 5 (2017), 2133-2150.

Dates
Received: June 2016
Revised: September 2016
First available in Project Euclid: 31 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1509436830

Digital Object Identifier
doi:10.1214/16-AOS1521

Mathematical Reviews number (MathSciNet)
MR3718164

Zentralblatt MATH identifier
06821121

Subjects
Primary: 62F03: Hypothesis testing
Secondary: 62J07: Ridge regression; shrinkage estimators. 62J05: Linear regression.

Keywords
Lasso; Lasso path; false discovery rate; false negative rate; power; approximate message passing (AMP); adaptive selection of parameters

Citation

Su, Weijie; Bogdan, Małgorzata; Candès, Emmanuel. False discoveries occur early on the Lasso path. Ann. Statist. 45 (2017), no. 5, 2133--2150. doi:10.1214/16-AOS1521. https://projecteuclid.org/euclid.aos/1509436830


Supplemental materials

  • Supplement to “False discoveries occur early on the Lasso path”. The supplementary materials contain proofs of some technical results in this paper.