The Annals of Applied Statistics

Semiparametric covariate-modulated local false discovery rate for genome-wide association studies

Rong W. Zablocki, Richard A. Levine, Andrew J. Schork, Shujing Xu, Yunpeng Wang, Chun C. Fan, and Wesley K. Thompson

Abstract

While genome-wide association studies (GWAS) have discovered thousands of risk loci for heritable disorders, so far even very large meta-analyses have recovered only a fraction of the heritability of most complex traits. Recent work utilizing variance components models has demonstrated that a larger fraction of the heritability of complex phenotypes is captured by the additive effects of SNPs than is evident only in loci surpassing genome-wide significance thresholds, typically set at a Bonferroni-inspired $p\le5\times10^{-8}$. Procedures that control false discovery rate can be more powerful, yet these are still under-powered to detect the majority of nonnull effects from GWAS. The current work proposes a novel Bayesian semiparametric two-group mixture model and develops a Markov chain Monte Carlo (MCMC) algorithm for a covariate-modulated local false discovery rate (cmfdr). The probability of being nonnull depends on a set of covariates via a logistic function, and the nonnull distribution is approximated as a linear combination of B-spline densities, where the weight of each B-spline density depends on a multinomial function of the covariates. The proposed methods were motivated by work on a large meta-analysis of schizophrenia GWAS performed by the Psychiatric Genomics Consortium (PGC). We show that the new cmfdr model fits the PGC schizophrenia GWAS test statistics well, performing better than our previously proposed parametric gamma model for estimating the nonnull density and substantially improving power over the usual fdr. Using loci declared significant at cmfdr $\le0.20$, we perform follow-up pathway analyses using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Homo sapiens pathways database. We demonstrate that the increased yield from the cmfdr model results in an improved ability to test for pathways associated with schizophrenia compared to using those SNPs selected according to the usual fdr.
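The two-group structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and involves no MCMC: it only evaluates cmfdr$(z, x) = \pi_0(x) f_0(z) / \{\pi_0(x) f_0(z) + \pi_1(x) f_1(z \mid x)\}$ for fixed parameter values. Several pieces are assumptions made for illustration: the null density is taken to be standard normal, Gaussian components stand in for the B-spline densities, and the names `cmfdr`, `beta`, `gammas`, and `components` and all numerical values are hypothetical.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def logistic(t):
    """Logistic link mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def softmax(scores):
    """Multinomial-logit weights that are positive and sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cmfdr(z, x, beta, gammas, components):
    """Covariate-modulated local fdr for one test statistic z.

    x          : covariate vector (first entry 1.0 for the intercept)
    beta       : logistic coefficients for the nonnull probability
    gammas     : one coefficient vector per mixture component
    components : (mean, sd) pairs; Gaussian stand-ins for B-spline densities
    """
    pi1 = logistic(dot(x, beta))                    # P(nonnull | x)
    weights = softmax([dot(x, g) for g in gammas])  # component weights given x
    f1 = sum(w * normal_pdf(z, m, s) for w, (m, s) in zip(weights, components))
    null_part = (1.0 - pi1) * normal_pdf(z)         # assumed N(0, 1) null
    return null_part / (null_part + pi1 * f1)


# Hypothetical parameter values, chosen only to exercise the function.
beta = [-2.0, 1.0]
gammas = [[0.0, 0.0], [0.0, 0.5]]
components = [(2.0, 1.0), (4.0, 1.0)]

for z in (0.0, 2.0, 4.0):
    for covariate in (0.0, 1.0):
        value = cmfdr(z, [1.0, covariate], beta, gammas, components)
        print(f"z={z:.1f} covariate={covariate:.1f} cmfdr={value:.4f}")
```

In this toy setting a larger covariate value raises the prior nonnull probability and shifts weight toward the component with the larger mean, so the same test statistic receives a smaller cmfdr. In the paper's model the corresponding parameters are estimated by MCMC rather than fixed by hand.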

Article information

Source
Ann. Appl. Stat. Volume 11, Number 4 (2017), 2252-2269.

Dates
Received: September 2016
Revised: June 2017
First available in Project Euclid: 28 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1514430285

Digital Object Identifier
doi:10.1214/17-AOAS1077

Keywords
Bayesian mixture model; B-spline densities; genome-wide association study; multiple-comparison procedures; mixture of experts

Citation

Zablocki, Rong W.; Levine, Richard A.; Schork, Andrew J.; Xu, Shujing; Wang, Yunpeng; Fan, Chun C.; Thompson, Wesley K. Semiparametric covariate-modulated local false discovery rate for genome-wide association studies. Ann. Appl. Stat. 11 (2017), no. 4, 2252-2269. doi:10.1214/17-AOAS1077. https://projecteuclid.org/euclid.aoas/1514430285


Supplemental materials

  • Supplement to “Semiparametric covariate-modulated local false discovery rate for genome-wide association studies” (DOI:10.1214/17-AOAS1077SUPP). The supplement consists of four sections. Section 1 presents the conditional posteriors and the Gibbs sampling algorithm. Section 2 provides the full list of KEGG Homo sapiens pathways with ALIGATOR $p$-values from the different models. Section 3 demonstrates identifiability of the mixture model. Section 4 shows convergence diagnostic plots for the parameter estimates.