Institute of Mathematical Statistics Collections

Multiple testing procedures under confounding

Debashis Ghosh

Abstract

While multiple testing procedures have been the focus of much statistical research, an important facet of the problem is how to deal with possible confounding. Procedures that address confounding have been developed by authors in both genetics and statistics. In this chapter, we relate these proposals and propose two new multiple testing approaches within this framework. The first combines sensitivity analysis methods with false discovery rate estimation procedures. The second involves the construction of shrinkage estimators that utilize the mixture model for multiple testing. The procedures are illustrated with applications to a gene expression profiling experiment in prostate cancer.

Primary Subjects: 62P10
Secondary Subjects: 92D10
Keywords: association studies; empirical null hypothesis; multiple comparisons; statistical genomics
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.imsc/1207058277
Digital Object Identifier: doi:10.1214/193940307000000176


2012 © Institute of Mathematical Statistics
