## Annals of Statistics

### Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data

#### Abstract

In recent years, bootstrap methods have drawn attention for their ability to approximate the laws of “max statistics” in high-dimensional problems. A leading example of such a statistic is the coordinatewise maximum of a sample average of $n$ random vectors in $\mathbb{R}^{p}$. Existing results for this statistic show that the bootstrap can work when $n\ll p$, and rates of approximation (in Kolmogorov distance) have been obtained with only logarithmic dependence on $p$. Nevertheless, one of the challenging aspects of this setting is that established rates tend to scale like $n^{-1/6}$ as a function of $n$.
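
For concreteness, the statistic and the error metric described above can be written in one common form as follows. The notation ($X_{1},\dots,X_{n}$ for the observations, $\mu$ for their mean vector, $M$ for the max statistic, $M^{*}$ for its bootstrap counterpart) is assumed here for illustration and is not taken from the abstract; the paper itself may use a different normalization.

$$
M = \max_{1\le j\le p} \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\bigl(X_{ij}-\mu_{j}\bigr),
\qquad
d_{\mathrm{K}} = \sup_{t\in\mathbb{R}} \Bigl|\, \mathbb{P}(M\le t) - \mathbb{P}\bigl(M^{*}\le t \,\big|\, X_{1},\dots,X_{n}\bigr) \Bigr|.
$$

In this notation, the rates quoted above describe how quickly $d_{\mathrm{K}}$ shrinks as $n$ grows.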

The main purpose of this paper is to demonstrate that improvement in rate is possible when extra model structure is available. Specifically, we show that if the coordinatewise variances of the observations exhibit decay, then a nearly $n^{-1/2}$ rate can be achieved, independent of $p$. Furthermore, a surprising aspect of this dimension-free rate is that it holds even when the decay is very weak. Lastly, we provide examples showing how these ideas can be applied to inference problems dealing with functional and multinomial data.
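
The setting can be explored numerically with a small simulation. The following is a minimal sketch, not the procedure analyzed in the paper: it uses a generic Gaussian multiplier bootstrap, and the function names (`simulate_decaying_variance`, `multiplier_bootstrap_max`), the polynomial decay profile $\sigma_{j} = j^{-\beta}$, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_decaying_variance(n=200, p=1000, beta=0.25, seed=0):
    """Draw n mean-zero Gaussian vectors in R^p with coordinate standard
    deviations sigma_j = j**(-beta) (an assumed, illustrative decay profile)."""
    rng = np.random.default_rng(seed)
    sigma = np.arange(1, p + 1, dtype=float) ** (-beta)
    X = rng.standard_normal((n, p)) * sigma  # scales column j by sigma_j
    return X, sigma


def max_statistic(X, mu):
    """Coordinatewise maximum of sqrt(n) * (sample mean - mu)."""
    n = X.shape[0]
    return np.sqrt(n) * np.max(X.mean(axis=0) - mu)


def multiplier_bootstrap_max(X, n_boot=500, seed=None):
    """Generic Gaussian multiplier bootstrap for the max statistic.

    Each replicate is max_j n**(-1/2) * sum_i xi_i * (X_ij - Xbar_j),
    where xi_1, ..., xi_n are i.i.d. N(0, 1) multipliers."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    xi = rng.standard_normal((n_boot, n))
    replicates = xi @ centered / np.sqrt(n)  # shape (n_boot, p)
    return replicates.max(axis=1)


# Note that p = 1000 exceeds n = 200, matching the n << p regime.
X, sigma = simulate_decaying_variance()
observed = max_statistic(X, mu=np.zeros(X.shape[1]))
boot_max = multiplier_bootstrap_max(X, n_boot=500, seed=1)
q95 = np.quantile(boot_max, 0.95)  # bootstrap estimate of the 95% quantile
print(f"observed max statistic: {observed:.3f}, bootstrap 95% quantile: {q95:.3f}")
```

Repeating the experiment over many simulated data sets and comparing the bootstrap quantiles with the Monte Carlo distribution of the observed statistic gives a rough empirical view of the approximation error for different values of `beta`, `n`, and `p`.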

#### Article information

Source
Ann. Statist., Volume 48, Number 2 (2020), 1214–1229.

Dates
Received: July 2018
Revised: January 2019
First available in Project Euclid: 26 May 2020

Permanent link to this document
https://projecteuclid.org/euclid.aos/1590480052

Digital Object Identifier
doi:10.1214/19-AOS1844

Mathematical Reviews number (MathSciNet)
MR4102694

#### Citation

Lopes, Miles E.; Lin, Zhenhua; Müller, Hans-Georg. Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data. Ann. Statist. 48 (2020), no. 2, 1214–1229. doi:10.1214/19-AOS1844. https://projecteuclid.org/euclid.aos/1590480052

#### Supplemental materials

• Supplement to “Bootstrapping max statistics in high dimensions: Near-parametric rates under weak variance decay and application to functional and multinomial data”. The supplement contains proofs of all theoretical results.