## Electronic Journal of Statistics

### Inference for high-dimensional split-plot-designs: A unified approach for small to large numbers of factor levels

#### Abstract

Statisticians increasingly need to reconsider the adaptability of classical inference techniques. In particular, diverse types of high-dimensional data structures are observed in various research areas, exposing the limits of conventional multivariate data analysis. Such situations occur frequently, e.g., in the life sciences, whenever it is easier or cheaper to repeatedly generate a large number $d$ of observations per subject than to recruit many, say $N$, subjects. In this paper, we discuss inference procedures for such situations in general heteroscedastic split-plot designs with $a$ independent groups of repeated measurements. These procedures can, e.g., answer questions about the presence of certain time, group and interaction effects or about particular profiles.

The test procedures are based on standardized quadratic forms involving suitably symmetrized U-statistic-type estimators, which remain robust as the number of dimensions $d$ and/or the number of groups $a$ increases. We then discuss their limit distributions in a general asymptotic framework and additionally propose improved small sample approximations. Finally, the small sample performance is investigated in simulations, and the applicability is illustrated by a real data analysis.
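To give a feel for what a symmetrized U-statistic-type estimator looks like, the following minimal sketch treats the simplest possible case: estimating the trace of a covariance matrix from pairwise differences of observations. It is an illustration only and is not taken from the paper; the function names `pairwise_trace_estimate` and `trace_of_sample_covariance` and the simulated data are assumptions made for this example. It relies on the standard fact that $E\|X_i - X_j\|^2 = 2\,\mathrm{tr}(\Sigma)$ for independent observations with common covariance matrix $\Sigma$.

```python
import itertools
import random


def pairwise_trace_estimate(sample):
    """Average of ||x_i - x_j||^2 / 2 over all unordered pairs (hypothetical helper).

    Each pair contributes an unbiased estimate of tr(Sigma); averaging over
    all pairs makes the estimator symmetric in the observations.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for x, y in itertools.combinations(sample, 2):
        total += sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0
    return total / (n * (n - 1) / 2)


def trace_of_sample_covariance(sample):
    """Trace of the usual sample covariance matrix (denominator n - 1)."""
    n = len(sample)
    d = len(sample[0])
    means = [sum(x[k] for x in sample) / n for k in range(d)]
    return sum((x[k] - means[k]) ** 2 for x in sample for k in range(d)) / (n - 1)


# Simulated data for illustration: N = 6 subjects, d = 50 repeated measurements.
rng = random.Random(0)
sample = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(6)]

u_estimate = pairwise_trace_estimate(sample)
classical = trace_of_sample_covariance(sample)
print(u_estimate, classical)
```

In this simple case the pairwise form coincides algebraically with the trace of the sample covariance matrix, so the two printed values agree up to rounding. The pairwise construction avoids centering at an estimated mean, and estimators of more involved quantities can be built in the same spirit; how the paper constructs its own estimators is not shown here.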

#### Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 2743-2805.

Dates
First available in Project Euclid: 15 September 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1536976839

Digital Object Identifier
doi:10.1214/18-EJS1465

#### Citation

Sattler, Paavo; Pauly, Markus. Inference for high-dimensional split-plot-designs: A unified approach for small to large numbers of factor levels. Electron. J. Statist. 12 (2018), no. 2, 2743–2805. doi:10.1214/18-EJS1465. https://projecteuclid.org/euclid.ejs/1536976839
