The Annals of Statistics

Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing

T. Tony Cai and Jiashun Jin
Source: Ann. Statist. Volume 38, Number 1 (2010), 100-145.

Abstract

An important estimation problem that is closely related to large-scale multiple testing is that of estimating the null density and the proportion of nonnull effects. A few estimators have been introduced in the literature; however, several important problems, including the evaluation of the minimax rate of convergence and the construction of rate-optimal estimators, remain open.

In this paper, we consider optimal estimation of the null density and the proportion of nonnull effects. Both minimax lower and upper bounds are derived. The lower bound is established by a two-point testing argument, at the core of which is a novel construction of two least favorable marginal densities f1 and f2. The density f1 is heavy-tailed in both the spatial and frequency domains, and f2 is a perturbation of f1 such that the characteristic functions associated with f1 and f2 match each other at low frequencies. The minimax upper bound is obtained by constructing estimators that rely on the empirical characteristic function and Fourier analysis. The estimators are shown to be minimax rate optimal.
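As a rough illustration of how an empirical characteristic function carries information about a Gaussian null, the sketch below recovers the mean and variance of a purely Gaussian sample from the modulus and phase of the empirical characteristic function at a single frequency. This is not the estimator constructed in the paper: it assumes no nonnull effects at all, and the function names, the frequency t = 1, and the simulated parameters are chosen here for illustration only. It uses the fact that the characteristic function of N(u, s^2) is exp(i*u*t - s^2*t^2/2).

```python
import numpy as np

def empirical_cf(x, t):
    """Empirical characteristic function (1/n) * sum_j exp(i * t * x_j)."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.exp(1j * t * x))

def gaussian_params_from_cf(x, t=1.0):
    """Illustrative (hypothetical) helper: for a pure N(u, s^2) sample,
    |phi(t)| = exp(-s^2 t^2 / 2) and arg phi(t) = u * t (for |u * t| < pi),
    so both parameters can be read off the empirical characteristic function."""
    phi = empirical_cf(x, t)
    sigma2 = -2.0 * np.log(np.abs(phi)) / t**2  # from the modulus
    mu = np.angle(phi) / t                      # from the phase
    return mu, sigma2

# Simulated data: assumed null N(0.5, 1.2^2), no nonnull effects.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.2, size=100_000)
mu_hat, sigma2_hat = gaussian_params_from_cf(sample, t=1.0)
print(mu_hat, sigma2_hat)
```

With a large sample the two estimates should land close to 0.5 and 1.44. In the setting of the paper the data also contain nonnull effects, which is why the choice of frequency and the handling of the nonnull component matter there; neither is addressed in this sketch.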

Compared to existing methods in the literature, the proposed procedure not only provides more precise estimates of the null density and the proportion of nonnull effects, but also yields more accurate results when used inside multiple testing procedures that aim to control the false discovery rate (FDR). The procedure is easy to implement, and numerical results are given.
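To make concrete how an estimate of the null proportion can be used inside an FDR-controlling procedure, here is a minimal sketch of a plug-in (adaptive) Benjamini-Hochberg step-up rule. It is one common way to use such an estimate, not necessarily the procedure evaluated in the paper; the function name, the argument `pi0_hat`, and the toy p-values are assumptions made for illustration.

```python
import numpy as np

def adaptive_bh(pvalues, alpha=0.05, pi0_hat=1.0):
    """Plug-in Benjamini-Hochberg step-up rule (illustrative sketch).

    Rejects the k smallest p-values, where k is the largest i such that
    p_(i) <= i * alpha / (m * pi0_hat). With pi0_hat = 1 this is the
    standard Benjamini-Hochberg procedure.
    Returns a boolean array marking the rejected hypotheses.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = np.arange(1, m + 1) * alpha / (m * pi0_hat)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # index of the largest passing rank
        reject[order[: k + 1]] = True
    return reject

# Toy p-values (made up for illustration).
pvals = [0.001, 0.008, 0.039, 0.041, 0.9]
standard = adaptive_bh(pvals, alpha=0.05, pi0_hat=1.0)
adaptive = adaptive_bh(pvals, alpha=0.05, pi0_hat=0.6)
print(standard.sum(), adaptive.sum())
```

A smaller estimated null proportion relaxes the rejection thresholds, so the accuracy of that estimate directly affects how many hypotheses are rejected.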

Primary Subjects: 62G05, 62G10
Secondary Subjects: 62G20
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1262271611
Digital Object Identifier: doi:10.1214/09-AOS696
Zentralblatt MATH identifier: 1181.62040
Mathematical Reviews number (MathSciNet): MR2589318


2012 © Institute of Mathematical Statistics
