Bernoulli, Volume 24, Number 4B (2018), 3683-3710.

Adaptive estimation of high-dimensional signal-to-noise ratios

Nicolas Verzelen and Elisabeth Gassiat


Abstract

We consider the equivalent problems of estimating the residual variance, the proportion of explained variance $\eta$, and the signal strength in a high-dimensional linear regression model with Gaussian random design. Our aim is to understand how not knowing the sparsity of the vector of regression coefficients, and not knowing the distribution of the design, affects the minimax estimation rates of $\eta$. Depending on the sparsity $k$ of the vector of regression coefficients, optimal estimators of $\eta$ either rely on estimating the vector of regression coefficients or are based on $U$-type statistics. In the important situation where $k$ is unknown, we build an adaptive procedure whose convergence rate simultaneously achieves the minimax risk over all $k$ up to a logarithmic loss, which we prove to be unavoidable. Finally, the knowledge of the design distribution is shown to play a critical role: when the distribution of the design is unknown, consistent estimation of the explained variance is possible only in much narrower regimes than for a known design distribution.
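
To make the target quantity concrete, here is a minimal simulation sketch of estimating $\eta = \|\beta\|^2 / (\|\beta\|^2 + \sigma^2)$ when the design rows are i.i.d. $N(0, I_p)$. It uses a simple method-of-moments estimator built from $\|y\|^2$ and $\|X^\top y\|^2$, which is only valid under the assumed identity covariance; it is not the adaptive procedure of the paper. The function names `estimate_eta` and `simulate`, the dense choice of $\beta$, and the default sizes are all illustrative assumptions.

```python
import numpy as np


def estimate_eta(X, y):
    """Moment-based estimate of eta = ||beta||^2 / (||beta||^2 + sigma^2).

    Assumption: the rows of X are i.i.d. N(0, I_p), i.e. the design
    covariance is known and equal to the identity.
    """
    n, p = X.shape
    y_sq = float(y @ y)              # ||y||^2
    xty = X.T @ y
    xty_sq = float(xty @ xty)        # ||X^T y||^2
    # Under the assumed design,
    #   E||y||^2     = n (||beta||^2 + sigma^2)
    #   E||X^T y||^2 = n (n + p + 1) ||beta||^2 + n p sigma^2
    # so the combination below is unbiased for ||beta||^2.
    signal = (xty_sq - p * y_sq) / (n * (n + 1))
    total = y_sq / n                 # unbiased for ||beta||^2 + sigma^2
    return signal / total


def simulate(n=2000, p=2500, eta=0.5, seed=0):
    """Draw one data set with noise variance 1 and return the estimate of eta."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p)    # dense beta, chosen only for illustration
    # Rescale so that ||beta||^2 = eta / (1 - eta), which gives the target eta.
    beta *= np.sqrt(eta / (1.0 - eta)) / np.linalg.norm(beta)
    y = X @ beta + rng.standard_normal(n)
    return estimate_eta(X, y)


if __name__ == "__main__":
    print("estimated eta:", simulate())
```

Because the estimate is a ratio of two quadratic forms in $y$, it is unchanged when $y$ is rescaled. It can fall outside $[0, 1]$ in small samples, so a practical version would likely clip it.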

Article information

Source
Bernoulli, Volume 24, Number 4B (2018), 3683-3710.

Dates
Received: March 2017
Revised: July 2017
First available in Project Euclid: 18 April 2018

Permanent link to this document
https://projecteuclid.org/euclid.bj/1524038767

Digital Object Identifier
doi:10.3150/17-BEJ975

Mathematical Reviews number (MathSciNet)
MR3788186

Zentralblatt MATH identifier
06869889

Keywords
heritability; minimax analysis; quadratic functional; signal to noise ratio

Citation

Verzelen, Nicolas; Gassiat, Elisabeth. Adaptive estimation of high-dimensional signal-to-noise ratios. Bernoulli 24 (2018), no. 4B, 3683–3710. doi:10.3150/17-BEJ975. https://projecteuclid.org/euclid.bj/1524038767




Supplemental materials

  • Supplement to “Adaptive estimation of high-dimensional signal-to-noise ratios”. This supplement contains the remaining proofs.