The Annals of Statistics

Cramér-type moderate deviations for Studentized two-sample $U$-statistics with applications

Jinyuan Chang, Qi-Man Shao, and Wen-Xin Zhou

Full-text: Open access

Abstract

Two-sample $U$-statistics are widely used in a broad range of applications, including those in the fields of biostatistics and econometrics. In this paper, we establish sharp Cramér-type moderate deviation theorems for Studentized two-sample $U$-statistics in a general framework, including the two-sample $t$-statistic and Studentized Mann–Whitney test statistic as prototypical examples. In particular, a refined moderate deviation theorem with second-order accuracy is established for the two-sample $t$-statistic. These results extend the applicability of the existing statistical methodologies from the one-sample $t$-statistic to more general nonlinear statistics. Applications to two-sample large-scale multiple testing problems with false discovery rate control and the regularized bootstrap method are also discussed.
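As an illustration of the objects named in the abstract, the following minimal sketch computes a Studentized two-sample $U$-statistic of degree (1, 1) for an arbitrary kernel. The function names, the variance estimator built from row and column averages of the kernel matrix, and the sample data are assumptions made for this example; they are not taken from the paper. With the kernel $h(x, y) = x - y$ the statistic reduces to the unequal-variance two-sample $t$-statistic, and with $h(x, y) = 1\{x \le y\} - 1/2$ it gives a Studentized Mann–Whitney statistic.

```python
import math
from statistics import mean, variance


def studentized_two_sample_u(xs, ys, kernel):
    """Studentized two-sample U-statistic of degree (1, 1).

    Illustrative sketch: the variance is estimated from the row and
    column averages of the kernel matrix (an assumed, jackknife-style
    estimator), not copied from the paper.
    """
    n1, n2 = len(xs), len(ys)
    # h[i][j] = kernel(X_i, Y_j)
    h = [[kernel(x, y) for y in ys] for x in xs]
    # Average over the second sample for each X_i, and over the first for each Y_j.
    q = [sum(row) / n2 for row in h]
    p = [sum(h[i][j] for i in range(n1)) / n1 for j in range(n2)]
    # The U-statistic itself is the grand mean of the kernel matrix.
    u = sum(q) / n1
    # Sample variances (divisor n - 1) of the two sets of averages.
    s1_sq = variance(q)
    s2_sq = variance(p)
    # Raises ZeroDivisionError if both variance estimates are zero.
    return u / math.sqrt(s1_sq / n1 + s2_sq / n2)


def two_sample_t(xs, ys):
    """Two-sample t-statistic with separate (unpooled) variance estimates."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


def mann_whitney_kernel(x, y):
    """Centered Mann-Whitney kernel: 1{x <= y} - 1/2."""
    return (1.0 if x <= y else 0.0) - 0.5


# Hypothetical data, for illustration only.
xs = [1.2, 0.4, 2.5, 1.9, 0.7]
ys = [0.3, 1.1, -0.5, 0.8]

t_stat = studentized_two_sample_u(xs, ys, lambda x, y: x - y)
mw_stat = studentized_two_sample_u(xs, ys, mann_whitney_kernel)
```

Passing the kernel as an argument keeps the two prototypical examples in a single code path, which mirrors the general framework described in the abstract.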

Article information

Source
Ann. Statist., Volume 44, Number 5 (2016), 1931–1956.

Dates
Received: June 2015
First available in Project Euclid: 12 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1473685264

Digital Object Identifier
doi:10.1214/15-AOS1375

Mathematical Reviews number (MathSciNet)
MR3546439

Zentralblatt MATH identifier
1272.68116

Subjects
Primary: 60F10: Large deviations; 62E17: Approximations to distributions (nonasymptotic)
Secondary: 62E20: Asymptotic distribution theory; 62F40: Bootstrap, jackknife and other resampling methods; 62H15: Hypothesis testing

Keywords
Bootstrap; false discovery rate; Mann–Whitney $U$ test; multiple hypothesis testing; self-normalized moderate deviation; Studentized statistics; two-sample $t$-statistic; two-sample $U$-statistics

Citation

Chang, Jinyuan; Shao, Qi-Man; Zhou, Wen-Xin. Cramér-type moderate deviations for Studentized two-sample $U$-statistics with applications. Ann. Statist. 44 (2016), no. 5, 1931–1956. doi:10.1214/15-AOS1375. https://projecteuclid.org/euclid.aos/1473685264




Supplemental materials

  • Supplement to “Cramér-type moderate deviations for Studentized two-sample $U$-statistics with applications”. This supplemental material contains proofs for all the theoretical results in the main text, including Theorems 2.2, 2.4, 3.1 and 3.4, and additional numerical results.