The Annals of Applied Statistics

Powerful test based on conditional effects for genome-wide screening

Yaowu Liu and Jun Xie

Abstract

This paper considers testing procedures for screening large genome-wide data, where we examine the effects of hundreds of thousands of genetic variants, for example, single nucleotide polymorphisms (SNPs), on a quantitative phenotype. We screen the whole genome by SNP sets and propose a new test that is based on conditional effects from multiple SNPs. The test statistic is developed for weak genetic effects and incorporates correlations among genetic variables, which may be very high due to linkage disequilibrium. The limiting null distribution of the test statistic and the power of the test are derived. Under appropriate conditions, the test is shown to be more powerful than the minimum $p$-value method, which is based on marginal SNP effects and is the most commonly used method in genome-wide screening. The proposed test is also compared with other existing methods, including the Higher Criticism (HC) test and the sequence kernel association test (SKAT), through simulations and analysis of a real genome data set. For typical genome-wide data, where effects of individual SNPs are weak and correlations among SNPs are high, the proposed test is more advantageous and clearly outperforms the other methods in the literature.
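
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a minimum $p$-value test on one SNP set. It is not the authors' proposed test, and it is not necessarily the exact variant they compare against. It assumes a large-sample normal approximation for each marginal test and a Bonferroni correction for the set-level $p$-value; the function names `marginal_p_value` and `min_p_test`, and the simulated data, are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist, mean


def marginal_p_value(x, y):
    """Two-sided p-value for the marginal association of one SNP (x) with
    a quantitative phenotype (y), ignoring all other SNPs.
    Assumption: the correlation t-statistic is treated as standard normal,
    which is only reasonable for large sample sizes."""
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 1.0  # a constant variable carries no evidence of association
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    r = max(-0.999999, min(0.999999, r))  # guard against division by zero
    z = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def min_p_test(genotypes, phenotype):
    """Minimum p-value test for a SNP set.
    genotypes: list of SNPs, each a list of per-individual allele counts.
    Returns the marginal p-values and a Bonferroni-adjusted set p-value
    (Bonferroni is an assumption here; it ignores correlation among SNPs)."""
    p_values = [marginal_p_value(snp, phenotype) for snp in genotypes]
    set_p = min(1.0, len(p_values) * min(p_values))
    return p_values, set_p


# Simulated example: 200 individuals, 5 independent SNPs, one weak effect.
random.seed(0)
n_individuals, n_snps = 200, 5
snps = [[random.choice([0, 1, 2]) for _ in range(n_individuals)]
        for _ in range(n_snps)]
pheno = [0.3 * snps[0][i] + random.gauss(0.0, 1.0)
         for i in range(n_individuals)]
p_vals, p_set = min_p_test(snps, pheno)
print("marginal p-values:", [round(p, 4) for p in p_vals])
print("set-level p-value:", round(p_set, 4))
```

Because each SNP is tested on its own, this baseline uses only marginal effects; the abstract's point is that a statistic built on conditional effects can gain power when SNPs in a set are highly correlated.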

Article information

Source
Ann. Appl. Stat. Volume 12, Number 1 (2018), 567-585.

Dates
Received: May 2016
Revised: March 2017
First available in Project Euclid: 9 March 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1520564484

Digital Object Identifier
doi:10.1214/17-AOAS1103

Keywords
Asymptotically powerful; high dimensional test; limiting null distribution

Citation

Liu, Yaowu; Xie, Jun. Powerful test based on conditional effects for genome-wide screening. Ann. Appl. Stat. 12 (2018), no. 1, 567–585. doi:10.1214/17-AOAS1103. https://projecteuclid.org/euclid.aoas/1520564484


Supplemental materials

  • Supplement to “Powerful test based on conditional effects for genome-wide screening”. The supplementary material contains (1) technical lemmas and their proofs; (2) the proofs of all theorems; and (3) additional table and figures covering simulation results under constant effect magnitude and sparsity parameter $\gamma=1/4$, simulations using real genotype data, the stability of the real data analysis result, and the conservativeness of the $p$-value calculation based on the asymptotic null distribution.