The Annals of Applied Statistics

An empirical Bayes mixture method for effect size and false discovery rate estimation

Omkar Muralidharan

Full-text: Open access

Abstract

Many statistical problems involve data from thousands of parallel cases. Each case has some associated effect size, and most cases will have no effect. It is often important to estimate the effect size and the local or tail-area false discovery rate for each case. Most current methods do this separately, and most are designed for normal data. This paper uses an empirical Bayes mixture model approach to estimate both quantities together for exponential family data. The proposed method yields simple, interpretable models that can still be used nonparametrically. It can also estimate an empirical null and incorporate it fully into the model. The method outperforms existing effect size and false discovery rate estimation procedures in normal data simulations; it nearly achieves the Bayes error for effect size estimation. The method is implemented in an R package (mixfdr), freely available from CRAN.
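As an illustration of how a single mixture model can deliver both quantities at once, the following is a minimal sketch and not the paper's implementation or the mixfdr API. It assumes normal data, z | delta ~ N(delta, 1), and a finite normal mixture prior on the effect size delta whose first component plays the role of the null. The function names and the parameter values are hypothetical and fixed by hand; in practice the mixture parameters would be estimated from the data.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_posterior(z, weights, means, variances, null_index=0):
    """Return (local_fdr, posterior_mean) for one observation z.

    Assumed model (illustrative only):
        delta ~ sum_j weights[j] * N(means[j], variances[j])
        z | delta ~ N(delta, 1)
    """
    # Marginally, group j contributes weights[j] * N(means[j], variances[j] + 1).
    joint = [w * normal_pdf(z, m, v + 1.0)
             for w, m, v in zip(weights, means, variances)]
    marginal = sum(joint)

    # Posterior probability that the case belongs to each group.
    group_probs = [j / marginal for j in joint]

    # Within group j, E[delta | z] shrinks z toward means[j].
    group_means = [(v * z + m) / (v + 1.0) for m, v in zip(means, variances)]

    posterior_mean = sum(p * g for p, g in zip(group_probs, group_means))
    local_fdr = group_probs[null_index]
    return local_fdr, posterior_mean


# Hypothetical parameters: a point-mass null at 0 plus two non-null groups.
WEIGHTS = [0.90, 0.05, 0.05]
MEANS = [0.0, -3.0, 3.0]
VARIANCES = [0.0, 1.0, 1.0]

if __name__ == "__main__":
    for z in (0.0, 2.0, 4.0):
        fdr, effect = mixture_posterior(z, WEIGHTS, MEANS, VARIANCES)
        print(f"z = {z:4.1f}  local fdr = {fdr:.3f}  effect estimate = {effect:.3f}")
```

Under these assumptions the local false discovery rate is the posterior probability of the null group, and the effect size estimate is the posterior mean, so both come from the same fitted mixture rather than from separate procedures.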

Article information

Source
Ann. Appl. Stat., Volume 4, Number 1 (2010), 422-438.

Dates
First available in Project Euclid: 11 May 2010

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1273584461

Digital Object Identifier
doi:10.1214/09-AOAS276

Mathematical Reviews number (MathSciNet)
MR2758178

Zentralblatt MATH identifier
1189.62004

Keywords
Empirical Bayes; false discovery rate; effect size estimation; empirical null; mixture prior

Citation

Muralidharan, Omkar. An empirical Bayes mixture method for effect size and false discovery rate estimation. Ann. Appl. Stat. 4 (2010), no. 1, 422–438. doi:10.1214/09-AOAS276. https://projecteuclid.org/euclid.aoas/1273584461

