The Annals of Applied Statistics

Unsupervised empirical Bayesian multiple testing with external covariates

Egil Ferkingstad, Arnoldo Frigessi, Håvard Rue, Gudmar Thorleifsson, and Augustine Kong
Source: Ann. Appl. Stat. Volume 2, Number 2 (2008), 714-735.

Abstract

In an empirical Bayesian setting, we provide a new multiple testing method, useful when an additional covariate is available that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a list of significant hypotheses which differs in length and order from the list obtained by methods that do not take covariate information into account. Covariate-modulated posterior probabilities of each null hypothesis are estimated using a fast approximate algorithm. The new method is applied to expression quantitative trait loci (eQTL) data.
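
As a rough illustration of what a covariate-modulated posterior null probability could look like, the sketch below applies Bayes' rule in a generic two-group mixture for p-values. The specific model is an assumption made here for illustration, not a statement of the paper's algorithm: null p-values are taken as Uniform(0, 1), alternative p-values as Beta(xi, 1), and the prior null probability pi0 is allowed to differ between hypothetical covariate bins. The function name, the bin labels, and all numeric values are likewise hypothetical.

```python
def posterior_null_probability(p_value, pi0, xi):
    """Posterior probability that a null hypothesis is true, given its p-value.

    Assumed two-group model (illustrative only):
      under the null,        p ~ Uniform(0, 1), density 1
      under the alternative, p ~ Beta(xi, 1),   density xi * p**(xi - 1)
    pi0 is the prior probability of the null; it may depend on a covariate.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    if not 0.0 <= pi0 <= 1.0:
        raise ValueError("pi0 must lie in [0, 1]")
    if xi <= 0.0:
        raise ValueError("xi must be positive")
    alt_density = xi * p_value ** (xi - 1.0)
    # Bayes' rule: prior times null density (which is 1) over the mixture density.
    return pi0 / (pi0 + (1.0 - pi0) * alt_density)


# Hypothetical prior null probabilities for two covariate bins.
pi0_by_bin = {"low": 0.95, "high": 0.70}

# The same p-value is judged differently depending on the covariate bin.
posterior = {
    bin_name: posterior_null_probability(0.001, pi0, xi=0.2)
    for bin_name, pi0 in pi0_by_bin.items()
}
```

Under these assumptions, a test whose covariate bin has a smaller prior null probability receives a smaller posterior null probability for the same p-value, which is how a covariate can change both the length and the order of the list of significant hypotheses.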

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1215118535
Digital Object Identifier: doi:10.1214/08-AOAS158
Zentralblatt MATH identifier: 05591295
Mathematical Reviews number (MathSciNet): MR2524353


2013 © Institute of Mathematical Statistics
