The Annals of Applied Statistics

Empirical null and false discovery rate inference for exponential families

Armin Schwartzman

Full-text: Open access

Abstract

In large-scale multiple testing, the use of an empirical null distribution rather than the theoretical null distribution can be critical for correct inference. This paper proposes a “mode matching” method for fitting an empirical null when the theoretical null belongs to any exponential family. Based on the central matching method for z-scores, mode matching estimates the null density by fitting an appropriate exponential family to the histogram of the test statistics by Poisson regression in a region surrounding the mode. The empirical null estimate is then used to estimate the local and tail false discovery rates (FDR) for inference. Delta-method covariance formulas and approximate asymptotic bias formulas are provided, as well as simulation studies of the effect of the tuning parameters of the procedure on the bias-variance trade-off. The standard FDR estimates are found to be biased downward at the far tails. Correlation between test statistics is taken into account in the covariance estimates, providing a generalization of Efron’s “wing function” for exponential families. Applications with χ² statistics are shown in a family-based genome-wide association study from the Framingham Heart Study and an anatomical brain imaging study of dyslexia in children.
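
The fitting step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the null belongs to the gamma family (which contains scaled χ² distributions), so the log density is linear in t and log t, and it fits those coefficients by Poisson regression on histogram counts in an interval around the mode. The function names, the fitting interval, the bin counts, and the simulated data are all hypothetical choices made for illustration. The bin-wise local FDR shown here divides the fitted null counts by the raw histogram counts, which is a crude stand-in for a smooth estimate of the mixture density.

```python
import math
import numpy as np

def poisson_regression(X, y, n_iter=50, tol=1e-10):
    """Fit log E[y] = X @ beta by Newton-Raphson (equivalent to IRLS)."""
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]  # crude start
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def gamma_log_pdf(t, shape, scale):
    """Log density of a gamma distribution with the given shape and scale."""
    return ((shape - 1.0) * np.log(t) - t / scale
            - math.lgamma(shape) - shape * math.log(scale))

def mode_matching(t, lo, hi, n_bins=60):
    """Estimate (shape, scale, p0) of a gamma empirical null from the
    histogram of t restricted to [lo, hi] (assumed to surround the mode)."""
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(t, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    # Sufficient statistics of the gamma family: t and log t.
    X = np.column_stack([np.ones_like(centers), centers, np.log(centers)])
    b0, b1, b2 = poisson_regression(X, counts.astype(float))
    shape, scale = b2 + 1.0, -1.0 / b1
    # Expected count per bin is N * width * p0 * f0(t); solve for p0.
    log_norm = -math.lgamma(shape) - shape * math.log(scale)
    p0 = math.exp(b0 - log_norm) / (len(t) * width)
    return shape, scale, p0

def binwise_local_fdr(t, shape, scale, p0, edges):
    """Fitted null count over observed count in each bin, capped at 1.
    Empty bins use a count of 1 to avoid division by zero."""
    counts, _ = np.histogram(t, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    null_counts = (len(t) * np.diff(edges) * p0
                   * np.exp(gamma_log_pdf(centers, shape, scale)))
    return centers, np.minimum(1.0, null_counts / np.maximum(counts, 1))

# Simulated data (assumption): 95% null from a scaled chi-square with 4 df
# (gamma with shape 2, scale 2.4) and 5% non-null values shifted far right.
rng = np.random.default_rng(0)
null = rng.gamma(shape=2.0, scale=2.4, size=19000)
alt = 25.0 + rng.gamma(shape=2.0, scale=2.4, size=1000)
t = np.concatenate([null, alt])

shape, scale, p0 = mode_matching(t, lo=0.5, hi=8.0)
centers, fdr = binwise_local_fdr(t, shape, scale, p0, np.linspace(0.0, 60.0, 61))
print("shape, scale, p0:", shape, scale, p0)
print("local fdr near the mode and in the right tail:", fdr[2], fdr[28])
```

In this sketch the fitting interval and the number of bins play the role of the tuning parameters whose bias-variance trade-off the abstract mentions: a wider interval uses more data but risks contamination from non-null statistics.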

Article information

Source
Ann. Appl. Stat., Volume 2, Number 4 (2008), 1332-1359.

Dates
First available in Project Euclid: 8 January 2009

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1231424213

Digital Object Identifier
doi:10.1214/08-AOAS184

Mathematical Reviews number (MathSciNet)
MR2655662

Zentralblatt MATH identifier
1158.62047

Keywords
Multiple testing; multiple comparisons; mixture model; Poisson regression; genome-wide association; brain imaging

Citation

Schwartzman, Armin. Empirical null and false discovery rate inference for exponential families. Ann. Appl. Stat. 2 (2008), no. 4, 1332–1359. doi:10.1214/08-AOAS184. https://projecteuclid.org/euclid.aoas/1231424213



References

  • Abramowitz, M. and Stegun, I. A., eds. (1966). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th ed. Dover, New York.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
  • Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104.
  • Efron, B. (2005b). Bayesians, frequentists and scientists. J. Amer. Statist. Assoc. 100 1–5.
  • Efron, B. (2007a). Correlation and large-scale simultaneous hypothesis testing. J. Amer. Statist. Assoc. 102 93–103.
  • Efron, B. (2007b). Size, power and false discovery rates. Ann. Statist. 35 1351–1377.
  • Efron, B. (2008). Simultaneous inference: When should hypothesis testing problems be combined? Ann. Appl. Statist. 2 197–223.
  • Efron, B. and Tibshirani, R. (1996). Using especially designed exponential families for density estimation. Ann. Statist. 24 2431–2461.
  • Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160.
  • Everitt, B. S. and Bullmore, E. T. (1999). Mixture model mapping of brain activation in functional magnetic resonance images. Human Brain Mapping 7 1–14.
  • Genovese, C. R. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035–1061.
  • Ghahremani, D. and Taylor, J. E. (2005). Empirical and theoretical false discovery rate analyses for fMRI data. Poster, Organization for Human Brain Mapping.
  • Herbert, A., Gerry, N. P., McQueen, M. B., Heid, I. M., Pfeufer, A., Illig, T., Wichmann, H.-E., Meitinger, T., Hunter, D., Hu, F. B., Colditz, G., Hinney, A., Hebebrand, J., Koberwitz, K., Zhu, X., Cooper, R., Ardlie, K., Lyon, H., Hirschhorn, J. N., Laird, N. M., Lenburg, M. E., Lange, C. and Christman, M. F. (2006). A common genetic variant is associated with adult and childhood obesity. Science 312 279–283.
  • Jin, J. and Cai, T. T. (2007). Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J. Amer. Statist. Assoc. 102 495–506.
  • Kong, S. W., Pu, W. T. and Park, P. J. (2006). A multivariate approach for integrating genome-wide expression data and biological knowledge. Bioinformatics 22 2373–2380.
  • Kotz, S., Balakrishnan, N. and Johnson, N. L. (2000). Bivariate and trivariate normal distributions. In Continuous Multivariate Distributions 1. Models and Applications 251–348. Wiley, New York.
  • Koudou, A. E. (1998). Lancaster bivariate probability distributions with Poisson, negative binomial and gamma margins. Test 7 95–110.
  • Lange, C., Silverman, E. K., Xu, X., Weiss, S. T. and Laird, N. M. (2003). A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics 4 195–206.
  • Lee, J., Shahram, M., Schwartzman, A. and Pauly, J. M. (2007). A complex data analysis in high-resolution SSFP fMRI. Magn. Reson. Med. 57 905–917.
  • Patel, J. K. and Read, C. B. (1996). Handbook of the Normal Distribution, 2nd ed. Dekker, New York.
  • Schwartzman, A., Dougherty, R. F., Lee, J., Ghahremani, D. and Taylor, J. E. (2008b). Empirical null and false discovery rate analysis in neuroimaging. Neuroimage. To appear. Available at http://dx.doi.org/10.1016/j.neuroimage.2008.04.182.
  • Schwartzman, A., Dougherty, R. F. and Taylor, J. E. (2005). Cross-subject comparison of principal diffusion direction maps. Magn. Reson. Med. 53 1423–1431.
  • Schwartzman, A., Dougherty, R. F. and Taylor, J. E. (2008a). False discovery rate analysis of brain diffusion direction maps. Ann. Appl. Statist. 2 153–175.
  • Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Statist. 31 2013–2035.
  • Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–205.
  • Sun, W. and Cai, T. T. (2007). Oracle and adaptive compound decision rules for false discovery rate control. J. Amer. Statist. Assoc. 102 901–912.
  • Van Steen, K., McQueen, M. B., Herbert, A., Raby, B., Lyon, H., DeMeo, D. L., Murphy, A., Su, J., Datta, S., Rosenow, C., Christman, M., Silverman, E. K., Laird, N. M., Weiss, S. T. and Lange, C. (2005). Genomic screening and replication using the same data set in family-based association testing. Nature Genetics 37 683–691.