The Annals of Statistics

Debiasing the lasso: Optimal sample size for Gaussian designs

Adel Javanmard and Andrea Montanari


Abstract

Performing statistical inference in high-dimensional models is challenging because of the lack of precise information on the distribution of high-dimensional regularized estimators.

Here, we consider linear regression in the high-dimensional regime $p\gg n$ and the Lasso estimator: we would like to perform inference on the parameter vector $\theta^{*}\in\mathbb{R}^{p}$. Important progress has been achieved in computing confidence intervals and $p$-values for single coordinates $\theta^{*}_{i}$, $i\in\{1,\dots,p\}$. A key role in these new inferential methods is played by a certain debiased estimator $\widehat{\theta}^{\mathrm{d}}$. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\widehat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian provided the true parameter vector $\theta^{*}$ is $s_{0}$-sparse with $s_{0}=o(\sqrt{n}/\log p)$.
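To make the construction concrete, the following is a minimal sketch of a debiased Lasso of the form $\widehat{\theta}^{\mathrm{d}}=\widehat{\theta}^{\mathrm{Lasso}}+\frac{1}{n}MX^{\top}(y-X\widehat{\theta}^{\mathrm{Lasso}})$, where $M$ is a decorrelating matrix (for instance the precision matrix $\Sigma^{-1}$ in the known-covariance Gaussian-design setting considered here). The choice of $M$, the penalty level and the use of scikit-learn are assumptions of the illustration, not prescriptions from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso(X, y, M, lam):
    """Debiased Lasso: theta_d = theta_lasso + (1/n) * M @ X.T @ (y - X @ theta_lasso).

    M is a decorrelating matrix (e.g. Sigma^{-1} when the design covariance is known).
    `lam` is the penalty in scikit-learn's parameterization,
    (1/(2n)) * ||y - X w||_2^2 + lam * ||w||_1.
    """
    n, _ = X.shape
    lasso = Lasso(alpha=lam, fit_intercept=False)
    lasso.fit(X, y)
    theta_lasso = lasso.coef_
    # One-step correction that compensates the shrinkage bias of the Lasso.
    theta_d = theta_lasso + (M @ X.T @ (y - X @ theta_lasso)) / n
    return theta_d, theta_lasso

# Toy usage with an i.i.d. standard Gaussian design (Sigma = I, so M = I).
rng = np.random.default_rng(0)
n, p, s0 = 200, 500, 5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s0] = 1.0
y = X @ theta_star + rng.standard_normal(n)
theta_d, theta_lasso = debiased_lasso(X, y, np.eye(p), lam=0.1)
```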

The condition $s_{0}=o(\sqrt{n}/\log p)$ is considerably stronger than the one for consistent estimation, namely $s_{0}=o(n/\log p)$. In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_{0}=o(n/(\log p)^{2})$.

The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^{*}$ and sparsity in the inverse covariance of the design. We further discuss several applications of our results beyond high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor $1+o_{n}(1)$ for i.i.d. Gaussian designs.
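Because the coordinates of $\widehat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian under the stated sparsity conditions, coordinate-wise confidence intervals follow directly. Below is a minimal sketch using the standard variance proxy $\sigma^{2}[M\widehat{\Sigma}M^{\top}]_{ii}/n$ with $\widehat{\Sigma}=X^{\top}X/n$; the noise level $\sigma$ is assumed known here, or estimated separately (e.g. via the scaled Lasso [51]).

```python
import numpy as np
from scipy.stats import norm

def debiased_ci(theta_d, X, M, sigma, alpha=0.05):
    """Coordinate-wise (1 - alpha) confidence intervals from the Gaussian limit
    of the debiased estimator, using the variance proxy
        sigma^2 * [M Sigma_hat M^T]_{ii} / n,   Sigma_hat = X^T X / n.
    With known covariance and M = Sigma^{-1}, this reduces to sigma^2 * [Sigma^{-1}]_{ii} / n.
    """
    n = X.shape[0]
    Sigma_hat = X.T @ X / n
    se = sigma * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
    z = norm.ppf(1 - alpha / 2)
    return theta_d - z * se, theta_d + z * se
```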

Article information

Source
Ann. Statist., Volume 46, Number 6A (2018), 2593-2622.

Dates
Received: June 2016
Revised: August 2017
First available in Project Euclid: 7 September 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1536307227

Digital Object Identifier
doi:10.1214/17-AOS1630

Mathematical Reviews number (MathSciNet)
MR3851749

Zentralblatt MATH identifier
06968593

Subjects
Primary: 62J05 (Linear regression); 62J07 (Ridge regression; shrinkage estimators)
Secondary: 62F12 (Asymptotic properties of estimators)

Keywords
Lasso; high-dimensional regression; confidence intervals; hypothesis testing; bias and variance; sample size

Citation

Javanmard, Adel; Montanari, Andrea. Debiasing the lasso: Optimal sample size for Gaussian designs. Ann. Statist. 46 (2018), no. 6A, 2593--2622. doi:10.1214/17-AOS1630. https://projecteuclid.org/euclid.aos/1536307227


References

  • [1] Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Control AC-19 716–723. System identification and time-series analysis.
  • [2] Barber, R. F. and Candès, E. J. (2015). Controlling the false discovery rate via knockoffs. Ann. Statist. 43 2055–2085.
  • [3] Bayati, M., Erdogdu, M. A. and Montanari, A. (2013). Estimating lasso risk and noise level. In Advances in Neural Information Processing Systems 944–952.
  • [4] Bayati, M., Lelarge, M. and Montanari, A. (2015). Universality in polytope phase transitions and message passing algorithms. Ann. Appl. Probab. 25 753–822.
  • [5] Bayati, M. and Montanari, A. (2011). The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory 57 764–785.
  • [6] Bayati, M. and Montanari, A. (2012). The Lasso risk for Gaussian matrices. IEEE Trans. Inform. Theory 58 1997–2017.
  • [7] Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19 521–547.
  • [8] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [9] Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212–1242.
  • [10] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Springer, Heidelberg.
  • [11] Bühlmann, P. and van de Geer, S. (2015). High-dimensional inference in misspecified linear models. Electron. J. Stat. 9 1449–1473.
  • [12] Cai, T. T. and Guo, Z. (2016). Accuracy assessment for high-dimensional linear regression. Available at arXiv:1603.03474.
  • [13] Cai, T. T., Guo, Z. et al. (2017). Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Ann. Statist. 45 615–646.
  • [14] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
  • [15] Candès, E. J., Romberg, J. K. and Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 1207–1223.
  • [16] Chapelle, O., Schölkopf, B. and Zien, A. (2006). Semi-Supervised Learning. MIT Press, Cambridge, MA.
  • [17] Chen, M., Ren, Z., Zhao, H. and Zhou, H. (2016). Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model. J. Amer. Statist. Assoc. 111 394–406.
  • [18] Chen, S. S. and Donoho, D. L. (1995). Examples of basis pursuit. In Proceedings of Wavelet Applications in Signal and Image Processing III.
  • [19] Chernozhukov, V., Hansen, C. and Spindler, M. (2015). Valid post-selection and post-regularization inference: An elementary, general approach. Ann. Rev. Econ. 7 649–688.
  • [20] Dezeure, R., Bühlmann, P., Meier, L., Meinshausen, N. et al. (2015). High-dimensional inference: Confidence intervals, $p$-values and R-software hdi. Statist. Sci. 30 533–558.
  • [21] Dicker, L. H. (2012). Residual variance and the signal-to-noise ratio in high-dimensional linear models. Available at arXiv:1209.0012.
  • [22] Donoho, D. L., Elad, M. and Temlyakov, V. N. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18.
  • [23] Donoho, D. L. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845–2862.
  • [24] Donoho, D. L., Johnstone, I. and Montanari, A. (2013). Accurate prediction of phase transitions in compressed sensing via a connection to minimax denoising. IEEE Trans. Inform. Theory 59 3396–3433.
  • [25] Donoho, D. L., Maleki, A. and Montanari, A. (2009). Message passing algorithms for compressed sensing. Proc. Natl. Acad. Sci. USA 106 18914–18919.
  • [26] Donoho, D. L., Maleki, A. and Montanari, A. (2011). The noise sensitivity phase transition in compressed sensing. IEEE Trans. Inform. Theory 57 6920–6941.
  • [27] Efron, B. (2004). The estimation of prediction error: Covariance penalties and cross-validation. J. Amer. Statist. Assoc. 99 619–642.
  • [28] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [29] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70 849–911.
  • [30] Fan, J., Samworth, R. and Wu, Y. (2009). Ultrahigh dimensional feature selection: Beyond the linear model. J. Mach. Learn. Res. 10 2013–2038.
  • [31] Fithian, W., Sun, D. and Taylor, J. (2014). Optimal inference after model selection. Available at arXiv:1410.2597.
  • [32] Janková, J. and van de Geer, S. (2015). Confidence intervals for high-dimensional inverse covariance estimation. Electron. J. Stat. 9 1205–1229.
  • [33] Janková, J. and van de Geer, S. (2017). Honest confidence regions and optimality in high-dimensional precision matrix estimation. TEST 26 143–162.
  • [34] Janson, L., Foygel Barber, R. and Candès, E. (2017). EigenPrism: Inference for high dimensional signal-to-noise ratios. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 1037–1065.
  • [35] Janson, L., Su, W. et al. (2016). Familywise error rate control via knockoffs. Electron. J. Stat. 10 960–975.
  • [36] Javanmard, A. and Lee, J. D. (2017). A Flexible Framework for Hypothesis Testing in High-dimensions. Available at arXiv:1704.07971.
  • [37] Javanmard, A. and Montanari, A. (2013). Nearly optimal sample size in hypothesis testing for high-dimensional regression. In 51st Annual Allerton Conference 1427–1434.
  • [38] Javanmard, A. and Montanari, A. (2014). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. IEEE Trans. Inform. Theory 60 6522–6554.
  • [39] Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15 2869–2909.
  • [40] Javanmard, A. and Montanari, A. (2018). Supplement to “Debiasing the Lasso: Optimal sample size for Gaussian designs.” DOI:10.1214/17-AOS1630SUPP.
  • [41] Lockhart, R., Taylor, J., Tibshirani, R. J. and Tibshirani, R. (2014). A significance test for the Lasso. Ann. Statist. 42 413–468.
  • [42] Mallows, C. L. (1973). Some comments on $C_{p}$. Technometrics 15 661–675.
  • [43] Meinshausen, N. (2014). Group bound: Confidence intervals for groups of variables in sparse high dimensional regression without assumptions on the design. J. R. Stat. Soc. Ser. B. Stat. Methodol. 77 923–945.
  • [44] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • [45] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 417–473.
  • [46] Obuchi, T. and Kabashima, Y. (2016). Cross validation in LASSO and its acceleration. J. Stat. Mech. Theory Exp. 2016 53304–53339.
  • [47] Reid, S., Tibshirani, R. and Friedman, J. (2013). A study of error variance estimation in Lasso regression. Available at arXiv:1311.5274.
  • [48] Rudelson, M. and Zhou, S. (2013). Reconstruction from anisotropic random measurements. IEEE Trans. Inform. Theory 59 3434–3447.
  • [49] Städler, N., Bühlmann, P. and van de Geer, S. (2010). $\ell_{1}$-penalization for mixture regression models. TEST 19 209–256.
  • [50] Su, W. and Candès, E. (2016). SLOPE is adaptive to unknown sparsity and asymptotically minimax. Ann. Statist. 44 1038–1068.
  • [51] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
  • [52] Taylor, J., Lockhart, R., Tibshirani, R. J. and Tibshirani, R. (2014). Exact post-selection inference for forward stepwise and least angle regression. Available at arXiv:1401.3889.
  • [53] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • [54] Van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166–1202.
  • [55] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178–2201.
  • [56] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_{q}$ loss in $\ell_{r}$ balls. J. Mach. Learn. Res. 11 3519–3540.
  • [57] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • [58] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • [59] Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217–242.
  • [60] Zou, H., Hastie, T. and Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Ann. Statist. 35 2173–2192.

Supplemental materials

  • Supplement to “Debiasing the Lasso: Optimal Sample Size for Gaussian Designs”. Due to space constraints, proofs of the theorems, some technical details, and additional numerical studies are provided in the Supplementary Material [40].