The Annals of Statistics

Adapting to unknown sparsity by controlling the false discovery rate

Felix Abramovich, Yoav Benjamini, David L. Donoho, and Iain M. Johnstone

Full-text: Open access


We attempt to recover an n-dimensional vector observed in white noise, where n is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the p norm for p small. We obtain a procedure which is asymptotically minimax for r loss, simultaneously throughout a range of such sparsity classes.

The optimal procedure is a data-adaptive thresholding scheme, driven by control of the false discovery rate (FDR). FDR control is a relatively recent innovation in simultaneous testing, ensuring that at most a certain expected fraction of the rejected null hypotheses will correspond to false rejections.

In our treatment, the FDR control parameter qn also plays a determining role in asymptotic minimaxity. If q=lim qn∈[0,1/2] and also qn>γ/log(n), we get sharp asymptotic minimaxity, simultaneously, over a wide range of sparse parameter spaces and loss functions. On the other hand, q=lim qn∈(1/2,1] forces the risk to exceed the minimax risk by a factor growing with q.

To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new.

Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2⋅log(potential model size/actual model sizes). We exhibit a close connection with FDR-controlling procedures under stringent control of the false discovery rate.

Article information

Ann. Statist. Volume 34, Number 2 (2006), 584-653.

First available in Project Euclid: 27 June 2006

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 62C20: Minimax procedures
Secondary: 62G05: Estimation 62G32: Statistics of extreme values; tail inference

Thresholding wavelet denoising minimax estimation multiple comparisons model selection smoothing parameter selection


Abramovich, Felix; Benjamini, Yoav; Donoho, David L.; Johnstone, Iain M. Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 (2006), no. 2, 584--653. doi:10.1214/009053606000000074.

Export citation


  • Abramovich, F. and Benjamini, Y. (1995). Thresholding of wavelet coefficients as multiple hypotheses testing procedure. Wavelets and Statistics. Lecture Notes in Statist. 103 5--14. Springer, Berlin.
  • Abramovich, F. and Benjamini, Y. (1996). Adaptive thresholding of wavelet coefficients. Comput. Statist. Data Anal. 22 351--361.
  • Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. M. (2005). Adapting to unknown sparsity by controlling the false discovery rate. (Extended version.) Available at
  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (B. N. Petrov and F. Czáki, eds.) 267--281. Akadémiai Kiadó, Budapest.
  • Benedetto, J. and Anthony, T. (1993). A wavelet auditory model and data compression. Appl. Comput. Harmon. Anal. 1 3--28.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289--300.
  • Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educational and Behavioral Statistics 25 60--83.
  • Benjamini, Y., Krieger, A. M. and Yekutieli, D. (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika. To appear.
  • Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165--1188.
  • Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3 203--268.
  • Breiman, L. and Freedman, D. (1983). How many variables should be entered in a regression equation? J. Amer. Statist. Assoc. 78 131--136.
  • DeVore, R. A., Jawerth, B. and Lucier, B. (1992). Image compression through wavelet transform coding. IEEE Trans. Inform. Theory 38 719--746.
  • Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41 613--627.
  • Donoho, D. L. and Johnstone, I. M. (1994). Ideal denoising in an orthonormal basis chosen from a library of bases. C. R. Acad. Sci. Paris Sér. I Math. 319 1317--1322.
  • Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over $\ell_p$-balls for $\ell_q$-error. Probab. Theory Related Fields 99 277--303.
  • Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200--1224.
  • Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object (with discussion). J. Roy. Statist. Soc. Ser. B. 54 41--81.
  • Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B 57 301--369.
  • Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461--170.
  • Efron, B., Tibshirani, R. J., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151--1160.
  • Field, D. J. (1987). Relation between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Amer. A 4 2379--2394.
  • Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947--1975.
  • Foster, D. P. and Stine, R. A. (1999). Local asymptotic coding and the minimum description length. IEEE Trans. Inform. Theory 45 1289--1293.
  • Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035--1061.
  • George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87 731--747.
  • Hochberg, Y. and Tamhane, A. (1987). Multiple Comparison Procedures. Wiley, New York.
  • Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75 383--386.
  • Huang, J. and Mumford, D. (1999). Statistics of natural images and models. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1 541--547.
  • Johnstone, I. M. (1994). Minimax Bayes, asymptotic minimax and sparse wavelet priors. In Statistical Decision Theory and Related Topics V (S. Gupta and J. Berger, eds.) 303--326. Springer, New York.
  • Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. Ann. Statist. 32 1594--1649.
  • Mallows, C. L. (1973). Some comments on $C_P$. Technometrics 15 661--675.
  • Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
  • Ruderman, D. L. (1994). The statistics of natural images. Network: Computation in Neural Systems 5 517--548.
  • Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30 239--257.
  • Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461--464.
  • Seeger, P. (1968). A note on a method for the analysis of significances en masse. Technometrics 10 586--593.
  • Shaffer, J. (1995). Multiple hypothesis testing. Ann. Rev. Psychology 46 561--584.
  • Simes, R. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73 751--754.
  • Simoncelli, E. P. (1999). Bayesian denoising of visual images in the wavelet domain. Bayesian Inference in Wavelet Based Models. Lecture Notes in Statist. 141 292--308. Springer, New York.
  • Storey, J. D. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 479--498.
  • Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187--205.
  • Tibshirani, R. and Knight, K. (1999). The covariance inflation criterion for adaptive model selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 529--546.
  • van de Geer, S. (1990). Estimating a regression function. Ann. Statist. 18 907--924.