Unsupervised empirical Bayesian multiple testing with external covariates



The Annals of Applied Statistics

Egil Ferkingstad, Arnoldo Frigessi, Håvard Rue, Gudmar Thorleifsson, and Augustine Kong

Source: Ann. Appl. Stat. Volume 2, Number 2 (2008), 714-735.

Abstract

In an empirical Bayesian setting, we provide a new multiple testing method that is useful when an additional covariate is available that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a list of significant hypotheses which differs in length and order from the list obtained by methods that do not take covariate information into account. Covariate-modulated posterior probabilities of each null hypothesis are estimated using a fast approximate algorithm. The new method is applied to expression quantitative trait loci (eQTL) data.
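
As a rough illustration of why conditioning on a covariate can change both the length and the order of the list of significant hypotheses, the following sketch computes posterior null probabilities under a simple two-group mixture. It is not the estimation algorithm of the paper: the Beta(a, 1) alternative density, the fixed per-test prior null probabilities `pi0`, and the function names `posterior_null_prob` and `select_by_bayesian_fdr` are all assumptions made for this example.

```python
import numpy as np


def posterior_null_prob(p, pi0, a):
    """Posterior P(H0 | p-value, covariate) under a two-group mixture.

    Assumptions (illustrative only):
      * null p-values are Uniform(0, 1), so their density is 1;
      * alternative p-values are Beta(a, 1) with 0 < a < 1,
        so their density is a * p**(a - 1);
      * pi0[i] is the prior null probability of test i, already
        looked up from that test's covariate value.
    """
    p = np.asarray(p, dtype=float)
    pi0 = np.asarray(pi0, dtype=float)
    f1 = a * p ** (a - 1.0)                  # alternative density at p
    return pi0 / (pi0 + (1.0 - pi0) * f1)    # Bayes' rule


def select_by_bayesian_fdr(post_null, alpha):
    """Largest rejection set whose mean posterior null probability <= alpha.

    Tests are ranked by increasing posterior null probability; the running
    mean of the sorted values is nondecreasing, so counting the entries
    that stay below alpha gives the size of the rejection set.
    """
    post_null = np.asarray(post_null, dtype=float)
    order = np.argsort(post_null)
    running_mean = np.cumsum(post_null[order]) / np.arange(1, len(order) + 1)
    k = int(np.sum(running_mean <= alpha))
    return order[:k]


# Two tests share the same p-value but have different covariate-based priors.
p_values = [0.001, 0.001, 0.2]
prior_null = [0.95, 0.5, 0.5]
post = posterior_null_prob(p_values, prior_null, a=0.25)
selected = select_by_bayesian_fdr(post, alpha=0.2)
```

In this toy input the first two tests have identical p-values, yet the second one ranks ahead of the first because its covariate implies a lower prior probability of the null. A method that ignores the covariate could not separate them.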

Keywords: Bioinformatics; multiple hypothesis testing; false discovery rates; data integration; empirical Bayes

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1215118535
Digital Object Identifier: doi:10.1214/08-AOAS158


2008 © Institute of Mathematical Statistics