The Annals of Applied Statistics

Molecular QTL discovery incorporating genomic annotations using Bayesian false discovery rate control

Xiaoquan Wen


Abstract

Mapping molecular QTLs has emerged as an important tool for understanding the genetic basis of cell functions. With the increasing availability of functional genomic data, it is natural to incorporate genomic annotations into QTL discovery. Discovering molecular QTLs is typically framed as a multiple hypothesis testing problem and solved using false discovery rate (FDR) control procedures. Currently, most existing statistical approaches rely on obtaining $p$-values for each candidate locus through permutation-based schemes, which are not only inconvenient for incorporating highly informative genomic annotations but also computationally inefficient. In this paper, we discuss a novel statistical approach for integrative QTL discovery based on the theoretical framework of Bayesian FDR control. We use a Bayesian hierarchical model to naturally integrate genomic annotations into molecular QTL mapping and propose an empirical Bayes-based computational procedure to approximate the necessary posterior probabilities to achieve high computational efficiency. Through theoretical arguments and simulation studies, we demonstrate that the proposed approach rigorously controls the desired type I error rate and greatly improves the power of QTL discovery when incorporating informative annotations. Finally, we demonstrate our approach by analyzing the expression-genotype data from 44 human tissues generated by the GTEx project. By integrating the simple annotation of SNP distance to transcription start sites, we discover more genes that harbor expression-associated SNPs in all 44 tissues, with an average increase of 1485 genes per tissue.
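As a rough illustration of the kind of decision rule that Bayesian FDR control refers to, the sketch below implements the generic rule discussed in the Bayesian multiple-testing literature (for example Newton et al., 2004, and Müller et al., 2004, both in the reference list): sort the hypotheses by their posterior probability of being null and reject the largest set whose average null probability stays at or below the target level. This is not claimed to be the exact procedure of the paper. The function name `bayesian_fdr_control`, its arguments, and the assumption that per-gene posterior null probabilities are already available are all hypothetical choices made for this example.

```python
import numpy as np


def bayesian_fdr_control(null_posteriors, alpha=0.05):
    """Generic Bayesian FDR rule (illustrative sketch, not the paper's code).

    null_posteriors: posterior probability that each hypothesis is null
                     (assumed to be computed elsewhere).
    alpha:           target FDR level.
    Returns a boolean mask marking the rejected hypotheses.
    """
    p = np.asarray(null_posteriors, dtype=float)

    # Rank hypotheses from most to least convincing (smallest null probability first).
    order = np.argsort(p, kind="stable")

    # Mean null probability of the top-k hypotheses, for k = 1..m.
    # This estimates the expected false discovery proportion if the top k are rejected.
    running_mean = np.cumsum(p[order]) / np.arange(1, p.size + 1)

    # Because p[order] is ascending, running_mean is non-decreasing, so the
    # number of entries <= alpha is the largest admissible rejection set size.
    k = int(np.searchsorted(running_mean, alpha, side="right"))

    reject = np.zeros(p.size, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Under these assumptions, calling `bayesian_fdr_control([0.01, 0.9, 0.02, 0.2, 0.5], alpha=0.05)` rejects the first and third hypotheses, since their average null probability is 0.015 while adding the next candidate (0.2) would push the average above 0.05.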

Article information

Source
Ann. Appl. Stat. Volume 10, Number 3 (2016), 1619-1638.

Dates
Received: February 2016
Revised: June 2016
First available in Project Euclid: 28 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1475069621

Digital Object Identifier
doi:10.1214/16-AOAS952

Mathematical Reviews number (MathSciNet)
MR3553238

Keywords
Molecular QTL; genomic annotations; Bayesian FDR control; QTL mapping

Citation

Wen, Xiaoquan. Molecular QTL discovery incorporating genomic annotations using Bayesian false discovery rate control. Ann. Appl. Stat. 10 (2016), no. 3, 1619–1638. doi:10.1214/16-AOAS952. https://projecteuclid.org/euclid.aoas/1475069621.



References

  • Ardlie, K. G., Deluca, D. S., Segrè, A. V., Sullivan, T. J., Young, T. R., Gelfand, E. T., Trowbridge, C. A., Maller, J. B., Tukiainen, T., Lek, M. et al. (2015). The genotype-tissue expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348 648–660.
  • Ball, R. D. (2001). Bayesian methods for quantitative trait loci mapping based on model selection: Approximate analysis using the Bayesian information criterion. Genetics 159 1351–1364.
  • Banovich, N. E., Lan, X., McVicker, G., van de Geijn, B., Degner, J. F., Blischak, J. D., Roux, J., Pritchard, J. K. and Gilad, Y. (2014). Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels. PLoS Genet. 10 e1004663.
  • Barreiro, L. B., Tailleux, L., Pai, A. A., Gicquel, B., Marioni, J. C. and Gilad, Y. (2012). Deciphering the genetic architecture of variation in the immune response to Mycobacterium tuberculosis infection. Proc. Natl. Acad. Sci. USA 109 1204–1209.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 289–300.
  • Berisa, T. and Pickrell, J. K. (2016). Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32 283–285.
  • Breitling, R., Li, Y., Tesson, B. M., Fu, J., Wu, C., Wiltshire, T., Gerrits, A., Bystrykh, L. V., De Haan, G., Su, A. I. et al. (2008). Genetical genomics: Spotlight on QTL hotspots. PLoS Genet. 4 e1000232.
  • Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q. and West, M. (2008). High-dimensional sparse factor modeling: Applications in gene expression genomics. J. Amer. Statist. Assoc. 103 1438–1456.
  • Churchill, G. A. and Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics 138 963–971.
  • Degner, J. F., Pai, A. A., Pique-Regi, R., Veyrieras, J.-B., Gaffney, D. J., Pickrell, J. K., De Leon, S., Michelini, K., Lewellen, N., Crawford, G. E. et al. (2012). DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482 390–394.
  • De la Cruz, O., Wen, X., Ke, B., Song, M. and Nicolae, D. L. (2010). Gene, region and pathway level analyses in whole-genome studies. Genet. Epidemiol. 34 222–231.
  • Ding, Z., Ni, Y., Timmer, S. W., Lee, B.-K., Battenhouse, A., Louzada, S., Yang, F., Dunham, I., Crawford, G. E., Lieb, J. D. et al. (2014). Quantitative genetics of CTCF binding reveal local sequence effects and different modes of X-chromosome association. PLoS Genet. 10 e1004798.
  • Doerge, R. W. and Churchill, G. A. (1996). Permutation tests for multiple loci affecting a quantitative character. Genetics 142 285–294.
  • ENCODE Project Consortium et al. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489 57–74.
  • Flutre, T., Wen, X., Pritchard, J. and Stephens, M. (2013). A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet. 9 e1003486.
  • Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M. J. et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518 317–330.
  • Lappalainen, T., Sammeth, M., Friedländer, M. R., Hoen, P. A. C. T., Monlong, J., Rivas, M. A., Gonzàlez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P. G. et al. (2013). Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501 506–511.
  • Leek, J. T. and Storey, J. D. (2007). Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3 e161.
  • Levine, R. A. and Casella, G. (2001). Implementations of the Monte Carlo EM algorithm. J. Comput. Graph. Statist. 10 422–439.
  • Maranville, J. C., Luca, F., Richards, A. L., Wen, X., Witonsky, D. B., Baxter, S., Stephens, M., Di Rienzo, A. and Gibson, G. (2011). Interactions between glucocorticoid treatment and cis-regulatory polymorphisms contribute to cellular response phenotypes. PLoS Genet. 7 e1002162.
  • Marin, J.-M. and Robert, C. P. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New York.
  • McVicker, G., van de Geijn, B., Degner, J. F., Cain, C. E., Banovich, N. E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y. and Pritchard, J. K. (2013). Identification of genetic variants that affect histone modifications in human cells. Science 342 747–749.
  • Müller, P., Parmigiani, G., Robert, C. and Rousseau, J. (2004). Optimal sample size for multiple testing: The case of gene expression microarrays. J. Amer. Statist. Assoc. 99 990–1001.
  • Neto, E. C., Keller, M. P., Broman, A. F., Attie, A. D., Jansen, R. C., Broman, K. W. and Yandell, B. S. (2012). Quantile-based permutation thresholds for quantitative trait loci hotspots. Genetics 191 1355–1365.
  • Neto, E. C., Broman, A. T., Keller, M. P., Attie, A. D., Zhang, B., Zhu, J. and Yandell, B. S. (2013). Modeling causality for pairs of phenotypes in system genetics. Genetics 193 1003–1013.
  • Newton, M. A., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5 155–176.
  • Pique-Regi, R., Degner, J. F., Pai, A. A., Gaffney, D. J., Gilad, Y. and Pritchard, J. K. (2011). Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21 447–455.
  • Servin, B. and Stephens, M. (2007). Imputation-based analysis of association studies: Candidate regions and quantitative traits. PLoS Genet. 3 e114.
  • Shabalin, A. A. (2012). Matrix eQTL: Ultra fast eQTL analysis via large matrix operations. Bioinformatics 28 1353–1358.
  • Sillanpää, M. J. and Arjas, E. (1999). Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151 1605–1619.
  • Stegle, O., Parts, L., Piipari, M., Winn, J. and Durbin, R. (2012). Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7 500–507.
  • Stephens, D. A. and Fisch, R. D. (1998). Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54 1334–1347.
  • Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the $q$-value. Ann. Statist. 31 2013–2035.
  • Sul, J. H., Raj, T., de Jong, S., de Bakker, P. I., Raychaudhuri, S., Ophoff, R. A., Stranger, B. E., Eskin, E. and Han, B. (2015). Accurate and fast multiple-testing correction in eQTL studies. Am. J. Hum. Genet. 96 857–868.
  • Veyrieras, J.-B., Kudaravalli, S., Kim, S. Y., Dermitzakis, E. T., Gilad, Y., Stephens, M. and Pritchard, J. K. (2008). High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 4 e1000214.
  • Wakefield, J. (2009). Bayes factors for genome-wide association studies: Comparison with P-values. Genet. Epidemiol. 33 79–86.
  • Wen, X. (2011). Bayesian analysis of genetic association data, accounting for heterogeneity. Ph.D. thesis, Univ. Chicago.
  • Wen, X. (2014). Bayesian model selection in complex linear systems, as illustrated in genetic association studies. Biometrics 70 73–83.
  • Wen, X. (2015). Bayesian model comparison in genetic association analysis: Linear mixed modeling and SNP set testing. Biostatistics 16 701–712.
  • Wen, X. (2016). Supplement to “Molecular QTL discovery incorporating genomic annotations using Bayesian false discovery rate control.” DOI:10.1214/16-AOAS952SUPP.
  • Wen, X., Luca, F. and Pique-Regi, R. (2015). Cross-population joint analysis of eQTLs: Fine mapping and functional annotation. PLoS Genet. 11 e1005176.
  • Wen, X. and Stephens, M. (2014). Bayesian methods for genetic association analysis with heterogeneous subgroups: From meta-analyses to gene-environment interactions. Ann. Appl. Stat. 8 176–203.
  • Yi, N., Yandell, B. S., Churchill, G. A., Allison, D. B., Eisen, E. J. and Pomp, D. (2005). Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics 170 1333–1344.

Supplemental materials

  • Appendices. Appendices referenced in Sections 2.1, 2.3, 2.4 and 3.2 are provided in the supplementary file.