Multiple Hypothesis Testing in Microarray Experiments



Statistical Science

Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick

Source: Statist. Sci. Volume 18, Issue 1 (2003), 71-103.

Abstract

DNA microarrays are part of a new and promising class of biotechnologies that allow the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in DNA microarray experiments is the identification of differentially expressed genes, that is, genes whose expression levels are associated with a response or covariate of interest. The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses or covariates. As a typical microarray experiment measures expression levels for thousands of genes simultaneously, large multiplicity problems are generated. This article discusses different approaches to multiple hypothesis testing in the context of DNA microarray experiments and compares the procedures on microarray and simulated data sets.

Keywords: Multiple hypothesis testing; adjusted p-value; family-wise Type I error rate; false discovery rate; permutation; DNA microarray.
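
As an illustration of the adjusted p-values named in the keywords, the following minimal Python sketch computes three standard adjustments for a list of per-gene p-values: Bonferroni and Holm step-down (both aimed at the family-wise Type I error rate) and Benjamini-Hochberg step-up (aimed at the false discovery rate). It is not taken from the article. The function names and the example p-values are hypothetical, and the formulas are the usual textbook forms for marginal p-values, not the permutation-based procedures the article compares.

```python
def bonferroni(pvals):
    """Single-step Bonferroni adjustment: min(1, m * p)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment.

    Sort p-values in increasing order, multiply the k-th smallest
    (k = 0, 1, ...) by (m - k), then enforce monotonicity with a
    running maximum so adjusted values never decrease with rank.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment.

    Walk from the largest p-value down, scale the k-th smallest
    (k = 1, ..., m) by m / k, and enforce monotonicity with a
    running minimum.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        candidate = min(1.0, m * pvals[i] / (rank + 1))
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p-values for four genes (illustration only).
example_pvals = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(example_pvals)
adj_holm = holm(example_pvals)
adj_bh = benjamini_hochberg(example_pvals)
```

A gene is then declared differentially expressed when its adjusted p-value falls at or below the chosen level for the corresponding error rate. With thousands of genes, the Bonferroni and Holm adjustments grow with the number of tests, which is the multiplicity problem the abstract describes.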

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1056397487
Digital Object Identifier: doi:10.1214/ss/1056397487
Mathematical Reviews number (MathSciNet): MR1997066
Zentralblatt MATH identifier: 02068941

2008 © Institute of Mathematical Statistics