The Annals of Statistics

Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity

Abstract

Confidence sets play a fundamental role in statistical inference. In this paper, we consider confidence intervals for high-dimensional linear regression with random design. We first establish the convergence rates of the minimax expected length for confidence intervals in the oracle setting where the sparsity parameter is given. The focus is then on the problem of adaptation to sparsity for the construction of confidence intervals. Ideally, an adaptive confidence interval should have its length automatically adjusted to the sparsity of the unknown regression vector, while maintaining a pre-specified coverage probability. It is shown that such a goal is in general not attainable, except when the sparsity parameter is restricted to a small region over which the confidence intervals have the optimal length of the usual parametric rate. It is further demonstrated that the lack of adaptivity is not due to the conservativeness of the minimax framework, but is fundamentally caused by the difficulty of learning the bias accurately.
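For orientation, the paper's headline result for a single coordinate can be stated informally as follows (this is a loose paraphrase; the precise conditions, the sparsity regime $k\lesssim p^{\gamma}$ with $\gamma<1/2$, and the constants are given in the paper's theorems): the minimax expected length of a $(1-\alpha)$-level confidence interval for $\beta_{1}$ over $k$-sparse regression vectors scales as

```latex
% Informal statement; regularity conditions and constants omitted.
\[
  L^{*}(k, n, p) \;\asymp\; \frac{1}{\sqrt{n}} \;+\; \frac{k \log p}{n}.
\]
```

The second term reflects the cost of the unknown bias; adaptivity to sparsity is possible only in the regime where this term is dominated by the parametric rate $n^{-1/2}$.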

Article information

Source
Ann. Statist., Volume 45, Number 2 (2017), 615–646.

Dates
Revised: February 2016
First available in Project Euclid: 16 May 2017

https://projecteuclid.org/euclid.aos/1494921952

Digital Object Identifier
doi:10.1214/16-AOS1461

Mathematical Reviews number (MathSciNet)
MR3650395

Zentralblatt MATH identifier
1371.62045

Subjects
Primary: 62G15: Tolerance and confidence regions
Secondary: 62C20: Minimax procedures; 62H35: Image analysis

Citation

Cai, T. Tony; Guo, Zijian. Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Ann. Statist. 45 (2017), no. 2, 615--646. doi:10.1214/16-AOS1461. https://projecteuclid.org/euclid.aos/1494921952

References

• [1] Baraud, Y. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8 577–606.
• [2] Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98 791–806.
• [3] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [4] Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106 672–684.
• [5] Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
• [6] Cai, T. T. and Guo, Z. (2015). Supplement to “Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity.” DOI:10.1214/16-AOS1461SUPP.
• [7] Cai, T. T., Low, M. and Ma, Z. (2014). Adaptive confidence bands for nonparametric regression functions. J. Amer. Statist. Assoc. 109 1054–1070.
• [8] Cai, T. T. and Low, M. G. (2004). Minimax estimation of linear functionals over nonconvex parameter spaces. Ann. Statist. 32 552–576.
• [9] Cai, T. T. and Low, M. G. (2004). An adaptation theory for nonparametric confidence intervals. Ann. Statist. 32 1805–1840.
• [10] Cai, T. T. and Low, M. G. (2006). Adaptive confidence balls. Ann. Statist. 34 202–228.
• [11] Cai, T. T. and Zhou, H. H. (2012). Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist. 40 2389–2420.
• [12] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• [13] Collier, O., Comminges, L. and Tsybakov, A. B. (2015). Minimax estimation of linear and quadratic functionals on sparsity classes. Preprint. Available at arXiv:1502.00665.
• [14] Hoffmann, M. and Nickl, R. (2011). On adaptive inference and confidence bands. Ann. Statist. 39 2383–2409.
• [15] Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476–1526.
• [16] Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 15 2869–2909.
• [17] Javanmard, A. and Montanari, A. (2014). Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. IEEE Trans. Inform. Theory 60 6522–6554.
• [18] Javanmard, A. and Montanari, A. (2015). De-biasing the Lasso: Optimal sample size for Gaussian designs. Preprint. Available at arXiv:1508.02757.
• [19] Nickl, R. and van de Geer, S. (2013). Confidence sets in sparse regression. Ann. Statist. 41 2852–2876.
• [20] Raskutti, G., Wainwright, M. J. and Yu, B. (2010). Restricted eigenvalue properties for correlated Gaussian designs. J. Mach. Learn. Res. 11 2241–2259.
• [21] Ren, Z., Sun, T., Zhang, C.-H. and Zhou, H. H. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. Ann. Statist. 43 991–1026.
• [22] Robins, J. and van der Vaart, A. (2006). Adaptive nonparametric confidence sets. Ann. Statist. 34 229–253.
• [23] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
• [24] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
• [25] van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166–1202.
• [26] Verzelen, N. (2012). Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electron. J. Stat. 6 38–90.
• [27] Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217–242.

Supplemental materials

• Supplement to “Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity”. Detailed proofs of the adaptivity lower bound and the minimax upper bound for confidence intervals of the linear functional $\xi^{\intercal}\beta$ with a dense loading $\xi$ are given. The minimax rates and adaptivity of confidence intervals for the linear functional $\xi^{\intercal}\beta$ are established when there is prior knowledge that $\Omega=\mathrm{I}$ and $\sigma=\sigma_{0}$. Additional propositions and technical lemmas are also proved in the supplement.