Electronic Journal of Statistics

A strong converse bound for multiple hypothesis testing, with applications to high-dimensional estimation

Ramji Venkataramanan and Oliver Johnson

Abstract

In statistical inference problems, we wish to obtain lower bounds on the minimax risk, that is, to bound the performance of any possible estimator. A standard technique for doing so is Fano’s inequality. However, recent work in an information-theoretic setting has shown that, for channel coding problems, an argument based on binary hypothesis testing gives tighter converse results (error lower bounds) than Fano’s inequality. We adapt this technique to the statistical setting and argue that Fano’s inequality can always be replaced by this approach to obtain tighter lower bounds that are easily computed and asymptotically sharp. We illustrate the technique in three applications: density estimation, active learning of a binary classifier, and compressed sensing, obtaining tighter risk lower bounds in each case.
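
To make the comparison concrete, the display below contrasts Fano’s inequality with a converse bound of the binary-hypothesis-testing type. This is a standard formulation in the spirit of Hayashi and Nagaoka [16] and Polyanskiy, Poor and Verdú [23], given here as a sketch rather than the exact statement proved in the paper; the notation is ours: $\theta$ is uniform over $M$ hypotheses, $Y$ is the observation, $P_e$ is the error probability of an arbitrary test, and $Q_Y$ is an arbitrary auxiliary distribution on the observation space.

    % Fano's inequality: nontrivial only when I(theta; Y) + log 2 < log M
    P_e \;\ge\; 1 - \frac{I(\theta; Y) + \log 2}{\log M}

    % Binary-testing (strong converse) bound: for any Q_Y and any gamma > 0,
    % with the probability computed under the joint law of (theta, Y)
    1 - P_e \;\le\; \Pr\!\left[ \log \frac{\mathrm{d}P_{Y \mid \theta}}{\mathrm{d}Q_Y}(Y) \ge \gamma \right] + \frac{e^{\gamma}}{M}

Optimizing the second bound over $\gamma$ and $Q_Y$ typically leaves it nontrivial in regimes where the Fano bound degenerates, which is the sense in which the binary-testing argument yields tighter converses.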

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 1126–1149.

Dates
Received: June 2017
First available in Project Euclid: 27 March 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1522116041

Digital Object Identifier
doi:10.1214/18-EJS1419

Mathematical Reviews number (MathSciNet)
MR3780042

Zentralblatt MATH identifier
06864487

Subjects
Primary: 62G05: Estimation; 62B10: Information-theoretic topics [See also 94A17]; 62G07: Density estimation

Keywords
Minimax lower bounds; Fano’s inequality; compressed sensing; density estimation; active learning

Rights
Creative Commons Attribution 4.0 International License.

Citation

Venkataramanan, Ramji; Johnson, Oliver. A strong converse bound for multiple hypothesis testing, with applications to high-dimensional estimation. Electron. J. Statist. 12 (2018), no. 1, 1126–1149. doi:10.1214/18-EJS1419. https://projecteuclid.org/euclid.ejs/1522116041


References

  • [1] Aeron, S., Saligrama, V. and Zhao, M. (2010). Information theoretic bounds for compressed sensing. IEEE Trans. Inform. Theory 56 5111–5130.
  • [2] Assouad, P. (1983). Deux remarques sur l’estimation. Comptes Rendus des Séances de l’Académie des Sciences, Série 1, Mathématique 296 1021–1024.
  • [3] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics 37 1705–1732.
  • [4] Birgé, L. (1986). On estimating a density using Hellinger distance and some other strange facts. Probability Theory and Related Fields 71 271–291.
  • [5] Birgé, L. (2005). A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 1611–1615.
  • [6] Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
  • [7] Candès, E. J. (2008). The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique 346 589–592.
  • [8] Candès, E. J. and Davenport, M. A. (2013). How well can we estimate a sparse vector? Applied and Computational Harmonic Analysis 34 317–323.
  • [9] Candès, E. J., Romberg, J. and Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52 489–509.
  • [10] Castro, R. M. and Nowak, R. D. (2008). Minimax bounds for active learning. IEEE Trans. Inform. Theory 54 2339–2353.
  • [11] Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. John Wiley, New York.
  • [12] Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306.
  • [13] Giné, E. and Nickl, R. (2015). Mathematical Foundations of Infinite-Dimensional Statistical Models 40. Cambridge University Press.
  • [14] Guntuboyina, A. (2011). Lower bounds for the minimax risk using $f$-divergences, and applications. IEEE Trans. Inform. Theory 57 2386–2399.
  • [15] Hayashi, M. (2007). Error exponent in asymmetric quantum hypothesis testing and its application to classical-quantum channel coding. Physical Review A 76 062301.
  • [16] Hayashi, M. and Nagaoka, H. (2003). General formulas for capacity of classical-quantum channels. IEEE Trans. Inform. Theory 49 1753–1768.
  • [17] Ibragimov, I. A. and Khasminskii, R. Z. (1977). Estimation of infinite-dimensional parameter in Gaussian white noise. Doklady Akademii Nauk SSSR 236 1053–1055.
  • [18] Johnson, O. T. (2017). Strong converses for group testing in the finite blocklength regime. IEEE Trans. Inform. Theory 63 5923–5933.
  • [19] Liu, J., Cuff, P. and Verdú, S. (2017). $E_\gamma$-resolvability. IEEE Trans. Inform. Theory 63 2629–2658.
  • [20] Massart, P. (2007). Concentration Inequalities and Model Selection. In École d’Été de Probabilités de Saint-Flour XXXIII – 2003 (J. Picard, ed.). Springer.
  • [21] Nagaoka, H. (2005). Strong converse theorems in quantum information theory. In Asymptotic Theory in Quantum Statistical Inference (M. Hayashi, ed.), Proceedings of the ERATO Workshop on Quantum Information Science 2001, Univ. Tokyo, Japan, September 6–8, 2001. World Scientific.
  • [22] Nakiboğlu, B. (2017). The Augustin center and the sphere packing bound for memoryless channels. In Proc. IEEE Int. Symp. Information Theory 1401–1405.
  • [23] Polyanskiy, Y., Poor, H. V. and Verdú, S. (2010). Channel coding rate in the finite blocklength regime. IEEE Trans. Inform. Theory 56 2307–2359.
  • [24] Polyanskiy, Y. and Wu, Y. Lecture Notes on Information Theory. Online: http://people.lids.mit.edu/yp/homepage/data/itlectures_v4.pdf.
  • [25] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls. IEEE Trans. Inform. Theory 57 6976–6994.
  • [26] Sason, I. and Verdú, S. (2015). Upper bounds on the relative entropy and Rényi divergence as a function of total variation distance for finite alphabets. In Proc. IEEE Inf. Theory Workshop – Fall 214–218.
  • [27] Sason, I. and Verdú, S. (2016). $f$-divergence inequalities. IEEE Trans. Inform. Theory 62 5973–6006.
  • [28] Sason, I. and Verdú, S. (2018). Arimoto–Rényi conditional entropy and Bayesian $M$-ary hypothesis testing. IEEE Trans. Inform. Theory 64 4–25.
  • [29] Sibson, R. (1969). Information radius. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 14 149–160.
  • [30] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Annals of Statistics 32 135–166.
  • [31] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, New York.
  • [32] Vazquez-Vilar, G., Campo, A. T., Guillén i Fàbregas, A. and Martinez, A. (2016). Bayesian $M$-ary hypothesis testing: The meta-converse and Verdú–Han bounds are tight. IEEE Trans. Inform. Theory 62 2324–2333.
  • [33] Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Annals of Statistics 27 1564–1599.
  • [34] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_q$-loss in $\ell_r$-balls. Journal of Machine Learning Research 11 3519–3540.
  • [35] Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam 423–435. Springer.