The Annals of Statistics

Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing

T. Tony Cai and Jiashun Jin

Full-text: Open access

Abstract

An important estimation problem that is closely related to large-scale multiple testing is that of estimating the null density and the proportion of nonnull effects. A few estimators have been introduced in the literature; however, several important problems, including the evaluation of the minimax rate of convergence and the construction of rate-optimal estimators, remain open.

In this paper, we consider optimal estimation of the null density and the proportion of nonnull effects. Both minimax lower and upper bounds are derived. The lower bound is established by a two-point testing argument, at the core of which is the novel construction of two least favorable marginal densities f1 and f2. The density f1 is heavy tailed in both the spatial and frequency domains, and f2 is a perturbation of f1 such that the characteristic functions associated with f1 and f2 match each other at low frequencies. The minimax upper bound is obtained by constructing estimators that rely on the empirical characteristic function and Fourier analysis. The estimators are shown to be minimax rate optimal.
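The abstract states only that the upper bound comes from estimators built on the empirical characteristic function. As a rough illustration of why the characteristic function helps, here is a simplified Python sketch. It is not the authors' estimator. It assumes a Gaussian null N(u0, sigma0^2) and nonnull components with strictly larger variance, so that at moderately high frequencies t the null term (1 - eps) exp(i u0 t - sigma0^2 t^2 / 2) dominates. The two-frequency rule, the frequencies t1 and t2, and the function names are hypothetical choices made for this sketch.

```python
import cmath
import math
import random


def empirical_cf(xs, t):
    """Empirical characteristic function (1/n) * sum_j exp(i t X_j)."""
    n = len(xs)
    re = sum(math.cos(t * x) for x in xs) / n
    im = sum(math.sin(t * x) for x in xs) / n
    return complex(re, im)


def estimate_null(xs, t1, t2):
    """Hypothetical two-frequency estimator of (u0, sigma0^2, eps).

    Assumes that for t >= t1 the null term (1 - eps) * exp(i u0 t - sigma0^2 t^2 / 2)
    dominates the characteristic function, and that |u0 * t1| < pi so the phase
    of the empirical CF at t1 is not wrapped around.
    """
    phi1 = empirical_cf(xs, t1)
    phi2 = empirical_cf(xs, t2)
    # log|phi(t1)| - log|phi(t2)| ~= sigma0^2 * (t2^2 - t1^2) / 2
    sigma0_sq = 2.0 * (math.log(abs(phi1)) - math.log(abs(phi2))) / (t2**2 - t1**2)
    # arg phi(t1) ~= u0 * t1
    u0 = cmath.phase(phi1) / t1
    # |phi(t1)| ~= (1 - eps) * exp(-sigma0^2 * t1^2 / 2)
    eps = 1.0 - abs(phi1) * math.exp(sigma0_sq * t1**2 / 2.0)
    return u0, sigma0_sq, eps


# Simulated data: null N(0.5, 1) with probability 0.9, nonnull N(2, 3^2) otherwise.
rng = random.Random(0)
n, eps_true = 200_000, 0.1
xs = [rng.gauss(2.0, 3.0) if rng.random() < eps_true else rng.gauss(0.5, 1.0)
      for _ in range(n)]
u0_hat, s2_hat, eps_hat = estimate_null(xs, t1=1.0, t2=1.5)
print(round(u0_hat, 2), round(s2_hat, 2), round(eps_hat, 2))
```

The frequencies involve a trade-off. Larger t makes the nonnull contribution negligible, but |phi(t)| shrinks toward the O(1/sqrt(n)) noise level of the empirical characteristic function. Balancing these two effects is what determines a rate of convergence.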

Compared to existing methods in the literature, the proposed procedure not only provides more precise estimates of the null density and the proportion of nonnull effects, but also yields more accurate results when used inside multiple testing procedures that aim to control the False Discovery Rate (FDR). The procedure is easy to implement, and numerical results are given.
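One common way such estimates enter an FDR procedure follows two steps. First, compute p-values against the estimated null rather than the theoretical N(0, 1). Second, run the Benjamini–Hochberg step-up with its threshold inflated by 1/pi0, where pi0 = 1 - eps is the estimated null proportion. The sketch below shows this generic adaptive plug-in. It assumes a Gaussian null, and it is not necessarily the exact procedure used in the paper's numerical study. The function names and the toy data are hypothetical.

```python
import math


def two_sided_pvalues(xs, u0, sigma0):
    """Two-sided p-values P(|Z| > |x - u0| / sigma0) under an N(u0, sigma0^2) null."""
    return [math.erfc(abs(x - u0) / (sigma0 * math.sqrt(2.0))) for x in xs]


def adaptive_bh(pvals, q, pi0):
    """BH step-up at level q with the threshold divided by pi0.

    Rejects the k smallest p-values, where k is the largest rank with
    p_(k) <= k * q / (n * pi0). Returns the sorted indices of rejections.
    """
    if not 0.0 < pi0 <= 1.0:
        raise ValueError("pi0 must lie in (0, 1]")
    n = len(pvals)
    order = sorted(range(n), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * q / (n * pi0):
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])


# Toy example: pi0 = 0.75 stands in for 1 - eps_hat from a null estimator.
xs = [0.1, -0.3, 4.2, 0.7, 5.0, -1.1, 3.9, 0.2]
pvals = two_sided_pvalues(xs, u0=0.0, sigma0=1.0)
rejected = adaptive_bh(pvals, q=0.1, pi0=0.75)
print(rejected)  # [2, 4, 6]
```

Setting pi0 = 1 recovers the ordinary Benjamini–Hochberg procedure. A more accurate estimate of the nonnull proportion therefore translates directly into a less conservative rejection threshold.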

Article information

Source
Ann. Statist., Volume 38, Number 1 (2010), 100-145.

Dates
First available in Project Euclid: 31 December 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1262271611

Digital Object Identifier
doi:10.1214/09-AOS696

Mathematical Reviews number (MathSciNet)
MR2589318

Zentralblatt MATH identifier
1181.62040

Subjects
Primary: 62G05 (Estimation), 62G10 (Hypothesis testing)
Secondary: 62G20 (Asymptotic properties)

Keywords
Characteristic function; empirical characteristic function; Fourier analysis; minimax lower bound; multiple testing; null distribution; proportion of nonnull effects; rate of convergence; two-point argument

Citation

Cai, T. Tony; Jin, Jiashun. Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing. Ann. Statist. 38 (2010), no. 1, 100–145. doi:10.1214/09-AOS696. https://projecteuclid.org/euclid.aos/1262271611


