A comprehensive treatment of quadratic-form-based inference in repeated measures designs under diverse asymptotics

Abstract: Split-plot or repeated measures designs with multiple groups occur naturally in the sciences. Their analysis is usually based on the classical repeated measures ANOVA. Roughly speaking, the latter can be shown to be asymptotically valid for large sample sizes n_i, assuming a fixed number of groups a and time points d. However, for high-dimensional settings with d > n_i, this argument breaks down, and statistical tests are then often based on (standardized) quadratic forms. Moreover, the analysis of their limit behaviour usually rests on assumptions on how d converges to ∞ relative to n_i. As such assumptions may be hard to justify in practice, we do not want to impose them. In addition, sometimes the number of groups a may be large compared to d or n_i. To understand the behaviour of (standardized) quadratic forms as test statistics, we analyze their asymptotics under diverse settings on a, d and n_i. In fact, we combine all kinds of combinations in which they diverge or are bounded in a unified framework. To this aim, we assume equal covariance matrices across all groups. Studying the limit distributions in detail, we follow Sattler and Pauly (2018) and propose an approximation to obtain critical values. The resulting test and its approximation approach are investigated in an extensive simulation study focusing on the exceptional asymptotic frameworks that are the main focus of this work.


Motivation and introduction
In many studies, it is possible to conduct and handle a large number of measurements, which makes high-dimensionality an increasingly important topic. In fact, high-dimensional repeated measures designs or split-plot designs with multiple groups are the objective of many analyses in science. This is the case in the life sciences, where test persons are examined multiple times during a study, or in industry, where some parameters are measured on a nearly continuous basis. Here we consider d measurements from N subjects, which are divided into a independent and generally unbalanced groups, where the i-th group contains n_i observations. Moreover, factor levels on the groups or repeated measures are possible. For independent d-dimensional observation vectors X_ik ~ N_d(μ_i, Σ_i), null hypotheses regarding μ = (μ_1, ..., μ_a) are investigated, where popular hypotheses concern the existence of a group effect, a time effect, as well as an interaction effect between time and group. For a classical repeated measures ANOVA design with d ≤ n_i, this was treated, for example, in [6]. But in many cases it is easier, cheaper, or ethically more justifiable to increase the number of repetitions rather than the sample size. Therefore, techniques are needed that can handle the case d > n_i.
In the particular case of just two groups, but in a general distributional setting and without restrictions on the dimension d, this was treated in [7]. For more groups and a more general setting regarding the hypotheses, [9] uses a classical ANOVA F test statistic, which has an exact F-distribution only for very special covariance matrices. Under some conditions on n_i/d, or on the relation between the dimension and certain powers of traces involving the covariance matrix, they therefore developed a reliable approximation for the test statistic.
In [10], several cases with an increasing number of groups are handled under some requirements on the covariance matrices and on the relation between sample sizes and the number of factor levels. In contrast, [17] investigated the case of just one normally distributed group, but with fewer assumptions on the covariance matrix and no specific relation between sample size and dimension.
[18] expanded these results, especially to a larger number of groups, which is also allowed to approach infinity together with the sample sizes and the dimension. In doing so, no restrictions on the respective convergence rates were made. However, this does not cover the small-n-large-a case, which was treated, e.g., by [2] or [3] for fixed dimension d and balanced designs n_i ≡ n.
The importance of such large-a-small-n cases has increased in recent years, for example through growing interest in personalized medicine, as mentioned in [1]. Here, the idea is to develop treatments adapted to the properties of the patients; see for example [11]. A similar idea underlies stratified medicine, where appropriate therapies are developed for groups of patients depending on common biological or other characteristics. It is therefore necessary to divide existing groups into subgroups with smaller numbers of subjects. Also in other areas, like insurance, there is a trend towards more personalized products. Together with the frequent use of high-dimensional data, this creates a demand for more comprehensive asymptotic frameworks.
Therefore, in addition to the large-a-small-n case, we include the large-d-small-n case, further combining both and developing a technique that can be used in each of these settings. To this end, we follow the same approach as [12] and assume homogeneous covariance matrices with Σ_i = Σ > 0, again with no further assumptions on the structure of the covariance matrix Σ. The homoscedastic setting allows some generalizations as well as a smaller number of other requirements on the underlying statistical model. This paper is organized as follows. Section 2 introduces the statistical model, the investigated hypotheses, and the notation used in the remainder of the paper. In Section 3, the test statistic is presented, together with its asymptotic behavior and an alternative small-sample approximation. Section 4 contains simulations regarding the type-I error rate and the power of the tests introduced in the previous sections. The paper closes with a short conclusion. For brevity and readability, all proofs are deferred to the appendix.

Statistical model and hypotheses
We consider a homogeneous split-plot design given by a independent and generally unbalanced groups of d-dimensional random vectors, where each vector represents the measurements of one independent subject. It is assumed that mean vectors E(X_{i,1}) = μ_i = (μ_{i,t})_{t=1}^d ∈ R^d and one positive definite covariance matrix Cov(X_{i,1}) = Σ > 0 exist. As usual, j = 1, ..., n_i denotes the individual subjects or units in group i = 1, ..., a, with a, n_i ∈ N, so we have a total number of N = n_1 + ... + n_a random vectors. This framework allows a factorial structure regarding time, group, or both by splitting up the indices accordingly; see [13] for example.
Within this model, linear hypotheses of the repeated measures ANOVA are formulated as H_0(T): Tμ = 0 for a suitable projection matrix T [17]. Here, (·)^- denotes some generalized inverse of a matrix, and a hypothesis H_0(H): Hμ = 0 given through a hypothesis matrix H can equivalently be written as H_0(T): Tμ = 0 with T = H^⊤(HH^⊤)^- H. As discussed in [18], T has the form T = T_W ⊗ T_S for projection matrices T_W and T_S. Hypotheses of interest are, for example, given by
(a) No group effect: H_0^a : (P_a ⊗ (1/d) J_d) μ = 0,
(b) No time effect: H_0^b : ((1/a) J_a ⊗ P_d) μ = 0,
(c) No interaction effect between time and group: H_0^ab : (P_a ⊗ P_d) μ = 0.
Here, J_d is the d-dimensional matrix only containing 1s and P_d := I_d − (1/d) J_d is the centering matrix. It is often useful to split the expectation vector into its components to simplify the interpretation. With the common side conditions Σ_i α_i = Σ_t β_t = Σ_{i,t} (αβ)_{it} = 0, this can be done by expanding μ_{i,t} = μ + α_i + β_t + (αβ)_{it}. Here, α_i ∈ R describes the i-th group effect, β_t ∈ R the time effect at time point t, and (αβ)_{it} ∈ R the (i, t)-interaction effect between group and time. Thereby, the above hypotheses can alternatively be formulated through these effects, e.g. H_0^a : α_1 = ... = α_a = 0.
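The hypothesis matrices above are straightforward to form numerically. The following sketch builds T = T_W ⊗ T_S for the three hypotheses, assuming the standard definitions J_d = 1_d 1_d^⊤ and P_d = I_d − J_d/d; the function name and interface are illustrative only.

```python
import numpy as np

def J(d):
    """d x d matrix containing only 1s."""
    return np.ones((d, d))

def P(d):
    """Centering projection P_d = I_d - J_d / d."""
    return np.eye(d) - J(d) / d

def hypothesis_matrix(a, d, effect):
    """Projection matrix T = T_W (x) T_S for the split-plot hypotheses."""
    if effect == "group":        # H_0^a : (P_a (x) J_d/d) mu = 0
        return np.kron(P(a), J(d) / d)
    if effect == "time":         # H_0^b : (J_a/a (x) P_d) mu = 0
        return np.kron(J(a) / a, P(d))
    if effect == "interaction":  # H_0^ab : (P_a (x) P_d) mu = 0
        return np.kron(P(a), P(d))
    raise ValueError(effect)
```

Since T_W and T_S are projections, T is symmetric and idempotent, which can be checked numerically; moreover, a mean vector without group effect (e.g. μ = 1_a ⊗ β) is annihilated by the group-effect matrix.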

Test statistics and their asymptotics
In this work, we consider five different asymptotic frameworks, (I)-(V). This great diversity is exceptional and distinguishes the present proposal from nearly all other approaches. Most of the existing procedures consider only special cases of one of these frameworks (for example, [7] treats (IV) with a = 1 and [17] treats (IV) with a = 2). Others allow for only one, such as [9] for (IV) or [2] for (I).
In contrast, our framework allows the combination of any of these assumptions. However, d → ∞ alone is not included as this would not allow the construction of consistent trace estimators of covariances which are later needed for inference. Moreover, the case n max = max(n 1 , ..., n a ) → ∞ with fixed a and d has already been studied in detail in the literature and is thus excluded here, see, e.g., [8] or [4] and the references cited therein.
It is apparent that, in contrast to [18] and other papers, the common condition n_i/N → κ_i ∈ (0, 1) is missing. This is relevant because it allows an appreciably larger range of settings, especially for a → ∞. But it also clearly generalizes the model in the case of fixed a, e.g. in unbalanced settings where we let only some group sample sizes converge to ∞.
To examine the validity of the null hypothesis H_0(T): Tμ = 0 independently of the asymptotic framework, we use the quadratic form Q_N = N · X̄^⊤ T X̄. Here, X̄ = (X̄_1^⊤, ..., X̄_a^⊤)^⊤ with X̄_i = n_i^{-1} Σ_{j=1}^{n_i} X_{ij}, i = 1, ..., a, denotes the vector of pooled group means. Unfortunately, for many covariance matrices Σ, the random variable Q_N tends to infinity for d → ∞ or a → ∞. To avoid this behaviour, the standardized quadratic form W_N = (Q_N − E_{H_0}(Q_N)) / √(Var_{H_0}(Q_N)) is used, which also enables us to evaluate all limit distributions in detail. For normally distributed observations, the expectation and variance of the quadratic form are known, and with V_N = Cov(√N X̄) = ⊕_{i=1}^a (N/n_i) Σ and D_N = diag(N/n_1, ..., N/n_a) it follows that E_{H_0}(Q_N) = tr(T_W D_N) · tr(T_S Σ) and Var_{H_0}(Q_N) = 2 tr((T_W D_N)²) · tr((T_S Σ)²). Observe that in both quantities only the first factor, tr(T_S Σ) resp. tr((T_S Σ)²), depends on the unknown covariance matrix, while all other quantities are known from the test setting.
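The standardization above can be sketched in a few lines. In this illustration Σ is taken as known purely for demonstration (in practice the Σ-dependent traces are replaced by estimators), and the function name is an assumption, not the paper's notation.

```python
import numpy as np

def standardized_quadratic_form(X_groups, T_W, T_S, Sigma):
    """Sketch of W_N = (Q_N - E_{H0}(Q_N)) / sqrt(Var_{H0}(Q_N)),
    assuming normal data and a known, shared covariance matrix Sigma."""
    n = np.array([len(X) for X in X_groups], dtype=float)
    N = n.sum()
    # pooled vector of group means, Xbar = (Xbar_1', ..., Xbar_a')'
    Xbar = np.concatenate([X.mean(axis=0) for X in X_groups])
    T = np.kron(T_W, T_S)                 # projection matrix T = T_W (x) T_S
    Q_N = N * Xbar @ T @ Xbar             # quadratic form Q_N = N Xbar' T Xbar
    V_N = np.kron(np.diag(N / n), Sigma)  # Cov(sqrt(N) Xbar), block diagonal
    TV = T @ V_N
    mean_Q = np.trace(TV)                 # E_{H0}(Q_N) = tr(T V_N)
    var_Q = 2.0 * np.trace(TV @ TV)       # Var_{H0}(Q_N) = 2 tr((T V_N)^2)
    return (Q_N - mean_Q) / np.sqrt(var_Q)
```

Under the null hypothesis W_N fluctuates around zero, while a violated hypothesis inflates Q_N far beyond its centering term.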
Applying the representation theorem for quadratic forms in normally distributed random vectors from [16], we can rewrite the standardized statistic W_N as W_N = Σ_{s≥1} λ_s (C_s − 1) / √(2 Σ_{s≥1} λ_s²). Here, the λ_s are the eigenvalues of T V_N T in decreasing order, with V_N = ⊕_{i=1}^a (N/n_i) Σ, and (C_s)_s is a sequence of independent χ²_1-distributed random variables. As a consequence, the asymptotic behaviour of the eigenvalues determines the asymptotic limit distribution of W_N. In fact, in generalization of [17] and [18], we obtain limit distributions depending on the limits b_s of the standardized eigenvalues β_s = λ_s / √(Σ_t λ_t²). Putting the results into context: [7] only considered case a) with r = 0, and [18] at least found asymptotic results in case b), but for case a) they need r ∈ {0, 1}. So this theorem is not only distinct from other results through the variety of asymptotic settings; it also considerably enhances the continuum of limit distributions through mixtures of a normal distribution and finite sums of weighted standardized χ²_1-distributed random variables. Furthermore, the if-and-only-if relation shows the importance of the conditions on the standardized eigenvalues and that they cannot be relaxed.
To use this test statistic, it is necessary to construct estimators with suitable properties. One of these is ratio-consistency: we call an estimator θ̂_{n,d} of θ ratio-consistent if θ̂_{n,d}/θ → 1 in probability. To get such estimators, we define trace estimators based on the within-group observations (see Lemma 1 and Lemma 2 in the appendix for their construction).
Below we prove that they are unbiased and ratio-consistent estimators for tr(T_S Σ) and tr((T_S Σ)²), respectively, under both the null hypothesis and the alternative. This allows us to define the estimated version W̃_N of our test statistic by replacing the unknown traces with these estimators. The following lemma justifies the usage of the estimated version instead of the exact one. Unfortunately, the calculation of the standardized eigenvalues β_s is generally not simplified through homogeneity. Therefore, it is nearly impossible to find an appropriate estimator which can be used in all our frameworks. Moreover, simulations showed that large sample sizes, dimensions, or numbers of groups are necessary for a good approximation, which makes quantiles based on Theorem 1 a) difficult to apply. For similar reasons, [17] and [18] used the quantiles of a standardized random variable of the kind K_f = (χ²_f − f)/√(2f), where matching third moments in the choice of the degrees of freedom f = f_P leads to a third-moment approximation. In our homoscedastic model, the usage of this random variable K_{f_P} is based on the following theorem.
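The paper's concrete trace estimators are stated in the appendix lemmas; as the displayed formulas did not survive extraction, the sketch below shows one standard construction with the stated properties (unbiasedness and use of within-group information only) — the pooled residual estimator for tr(T_S Σ) and an order-4 U-statistic for tr((T_S Σ)²). Both function names and the equal-weight combination across groups are assumptions for illustration.

```python
import numpy as np
from itertools import permutations

def trace_TS_Sigma(X_groups, T_S):
    """Pooled, unbiased estimator of tr(T_S Sigma) from within-group residuals."""
    N = sum(len(X) for X in X_groups)
    a = len(X_groups)
    total = 0.0
    for X in X_groups:
        R = X - X.mean(axis=0)                   # center each group separately
        total += np.einsum('jk,kl,jl->', R, T_S, R)
    return total / (N - a)

def trace_TS_Sigma_sq(X_groups, T_S):
    """U-statistic estimator of tr((T_S Sigma)^2): kernel of order 4,
    averaged over all ordered 4-tuples of pairwise different indices,
    then combined over groups with equal weights (a simplification)."""
    per_group = []
    for X in X_groups:
        vals = [((X[j1] - X[j2]) @ T_S @ (X[j3] - X[j4])) ** 2 / 4.0
                for j1, j2, j3, j4 in permutations(range(len(X)), 4)]
        per_group.append(np.mean(vals))
    return float(np.mean(per_group))
```

The kernel is unbiased because, for independent centered differences Y_1, Y_2 with covariance 2Σ, E[(Y_1^⊤ T_S Y_2)²] = 4 tr((T_S Σ)²).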

Theorem 3. Under the conditions of Theorem 1 and one of the frameworks
With the well-known rules for the Kronecker product and traces, we can decompose the parameter f_P as f_P = [tr((T_S Σ)²)]³ / [tr((T_S Σ)³)]² · η_{N,a}. The connection between f_P and β_1 in the two extreme cases, i.e. β_1 → 0 if and only if f_P → ∞ and β_1 → 1 if and only if f_P → 1, has been investigated in [17] for the case a = 1, but it also translates to the present framework.
Here, we only have to estimate the first factor, while the second one, η_{N,a}, just depends on the asymptotic setting and is therefore known. This allows us to use the same estimated traces for different hypotheses which differ only in T_W.
Moreover, for η_{N,a} → ∞, we also have f_P → ∞ without any estimation. Otherwise, however, the behaviour of f_P is unclear, and we have to find consistent estimators for tr((T_S Σ)³) in all our different frameworks. This is achieved by considering a class of group-wise estimators Ĉ_{i,1} based on suitable symmetrized U-statistics, where summation is restricted to indices ℓ_1, ..., ℓ_6 which are pairwise different. Afterwards, these estimators for the individual groups are combined to get an estimator which uses the observations of each group, denoted by Ĉ_1. Together with the estimators from above, we can construct a consistent estimator for f_P by f̂_P := Â_2³/Ĉ_1² · η_{N,a}, where Â_2 denotes the estimator of tr((T_S Σ)²). Theorem 4. In all our frameworks (I)-(V), it holds that i) Ĉ_1 is an unbiased estimator for tr((T_S Σ)³) and ii) Ĉ_1 / tr((T_S Σ)³) →^P 1, where →^P denotes convergence in probability.
Through the usage of U-statistics with a kernel of order 6, a large number of summations has to be done for each group-wise estimator. In contrast, estimators based on observations from all groups would require much higher numbers; for example, in [18], Σ_{i=1}^a 6!·(n_i choose 6) summations are necessary for each of their estimators. Due to homogeneity, we do not need this kind of estimator, but Ĉ_1 also requires 6!·Σ_{j=1}^a (n_j choose 6) summations, which is already very high, even for comparatively small sample sizes or numbers of groups. Thus, as in [18], the usage of subsampling versions of our estimators is reasonable to make them applicable in practice. Instead of summing over all possible index combinations of one group, the underlying idea is to do this only for a randomly chosen subset of combinations.
To define the subsampling version, it is first necessary to introduce some definitions and notation. A parameter υ ∈ (0, ∞) is chosen and used to define the number of subsampling draws w_i = ⌈υ · (n_i choose 6)⌉ for group i. Combining the resulting subsampled group-wise parts C̃_{1,i} allows us to define the subsampling version C̃_1 of Ĉ_1. This way of defining the number of subsampling repetitions w_i guarantees that the relation between the subsampled parts C̃_{1,i} resembles the relation between the original Ĉ_{1,i}. Although this can lead to great differences between the subsampling sizes of the different groups, it ensures that no single group's influence becomes too large.
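The subsampling idea itself is generic: instead of enumerating all index tuples of a U-statistic, only w_i randomly drawn tuples of pairwise different indices are averaged per group. Since the paper's order-6 kernel is not reproduced here, the sketch uses a generic kernel argument; the interface is an assumption for illustration.

```python
import numpy as np
from math import ceil, comb

def subsampled_ustat(X, kernel, order, upsilon, rng):
    """Average a U-statistic kernel over a random subset of index tuples
    instead of all 6! * C(n, 6)-style ordered combinations."""
    n = len(X)
    w = ceil(upsilon * comb(n, order))   # number of subsampling draws
    vals = [kernel(X[rng.choice(n, size=order, replace=False)])
            for _ in range(w)]
    return float(np.mean(vals))
```

For instance, with the order-2 kernel h(x_1, x_2) = (x_1 − x_2)²/2, the subsampled statistic approximates the sample variance at a fraction υ of the full enumeration cost.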
These results allow us to formulate a more usable version of K_{f_P} through the following theorem.

Theorem 6. The results of Theorem 3 remain valid if f_P is replaced by the estimator f̂_P.
For the estimation of the unknown traces, it would also be possible to construct estimators that use observations from different groups. This is feasible and seems reasonable, but in practice we would again need subsampling versions of these estimators, which take the dataset's structure into account. This is quite complicated and therefore not usable in practice. So we avoid these difficulties by using estimators for the separate groups and combining them afterwards.
Relaxing the assumption of homogeneous covariance matrices to a weaker condition, which is essentially easier to fulfill, would not change the validity of the previous results. From a theoretical point of view, it would even be sufficient to assume tr((T_S Σ_1)^j) = tr((T_S Σ_2)^j) = ... = tr((T_S Σ_a)^j) for j ∈ {1, 2, 3}, but this is nearly impossible to justify in practice.

Remark 1.
a) The equality of the covariance matrices is a central condition of our approach. Otherwise, the structure of E(Q_N) and Var(Q_N) changes considerably, and the properties of all estimators no longer hold. The consequences of a violation strongly depend on the setting and are difficult to assess. For instance, if there exists an i ∈ {1, ..., a} with Σ_i ≠ Σ, then tr((T_S Σ_i)^j)/tr((T_S Σ)^j) can be close to one for j = 1, 2 but still strongly influence Q_N, depending on the interplay, the sample size, and the asymptotic framework. b) Therefore, in the frameworks (III)-(V), it is preferable to use the approach from [18] if the homogeneity condition seems less plausible. c) All our introduced estimators are composed of estimators for the single groups. This allows us to recognize groups whose traces vary widely from the others, and thereby to detect groups with a different covariance matrix and assess their influence.

Simulation
To evaluate the finite sample behavior of the introduced method, we conducted extensive simulations regarding (i) its ability to keep the nominal significance level and (ii) its power to detect certain alternatives in various scenarios.
Here, we focus on the frameworks (I) and (II), which are the most interesting ones because they do not require the usual condition of increasing sample sizes. They are therefore a strict extension of the settings considered in [18].

Type-I error
To check the type-I error rate for α = 5%, we consider small (d = 5, d = 50), moderate (d = 200), and large (d = 600) dimensions and increase the number of groups from 2 to 12. The sample sizes are fixed in a quite unbalanced setting given by n = (n_1, ..., n_12) = (15, 15, 20, 35, 25, 20, 30, 30, 35, 20, 15, 25). We used 10,000 simulation runs and chose υ = 0.05 for our subsampling-type estimators. Thereby, the numbers of subsampling draws lie between 251 and 81,158, a consequence of the quite unbalanced setting. Higher values of υ would increase the accuracy but noticeably extend the computation time.
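The covariance model and subsampling counts of this setup are easy to reproduce. The sketch below builds the autoregressive covariance (Σ)_{i,j} = 0.6^{|i−j|} used in the simulations and, assuming the draw count w_i = ⌈υ · (n_i choose 6)⌉, recovers the stated range of 251 to 81,158 subsampling draws.

```python
import numpy as np
from math import ceil, comb

def ar1_cov(d, rho=0.6):
    """Autoregressive covariance matrix (Sigma)_{ij} = rho^{|i-j|}."""
    idx = np.arange(d)
    return rho ** np.abs(idx[:, None] - idx[None, :])

n = [15, 15, 20, 35, 25, 20, 30, 30, 35, 20, 15, 25]
upsilon = 0.05
draws = [ceil(upsilon * comb(ni, 6)) for ni in n]  # subsampling draws per group
```

Here min(draws) is 251 (for n_i = 15) and max(draws) is 81,158 (for n_i = 35), matching the numbers in the text.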
Two different null hypotheses are investigated in order to cover a situation with β_1 → 0 as well as one with β_1 → 1. These hypotheses are H_0^a and H_0^b from Section 2. For both hypotheses, the same distributional setting is chosen to achieve better comparability, with Σ an autoregressive covariance matrix with parameter 0.6, i.e. (Σ)_{i,j} = 0.6^{|i−j|}, and μ_i = 0_d for i = 1, ..., a. For H_0^b it holds that τ_P ≡ 1, while the values for H_0^a can be seen in Table 1. The tests ψ_z = 1{W_N > z_{1−α}}, ψ_χ = 1{W_N > χ²_{1;1−α}}, and φ_N = 1{W_N > K_{f̂_P;1−α}} are used, where χ²_{1;1−α} denotes the (1 − α) quantile of the χ²_1 distribution and K_{f̂_P;1−α} the (1 − α) quantile of K_{f̂_P}. It must be noted that in the following figures, we use different axes for each setting to make them as detailed as possible.
In Figure 1, it can be seen that for β_1 → 0, the usage of ψ_χ results in too conservative test decisions, especially for larger dimensions: the rejection rate under the null hypothesis is in most cases lower than 0.04. In contrast, ψ_z has too high type-I error rates, especially in the case of small d = 5, although this improves for higher dimensions as well as larger numbers of groups. For all dimensions, φ_N shows by far the best type-I error control and performs well even with comparatively low dimensions or just a few groups. It can be seen that the error rates fluctuate less for higher numbers of groups. The reason for this is that, for fixed, comparatively small sample sizes, an increasing number of groups not only improves the approximation but is also necessary to get reliable estimators.
In contrast, there is nearly no difference between ψ_χ and φ_N in Figure 2. This similarity is not surprising because, as noted above, f_P here always has the value one. Furthermore, the small difference between both curves shows once more the good performance of the used estimators. Apart from that, the performance of φ_N is again quite good, particularly for a higher number of groups.
Using the test ψ_z, which is based on the wrong limit distribution under H_0^b, results in considerably larger type-I error rates between 0.065 and 0.085.
To sum up, φ_N shows really good type-I error rates over all settings, dimensions, and group numbers, even for substantially unbalanced sample sizes containing groups with just a few observations.
From the simulation results given in [18], it directly follows that it is challenging to detect the one-point alternative for d = 50, depending on the hypothesis. For this reason, we here consider a much larger value of δ.
For the trend alternative (Figure 3), φ_N has high power under both null hypotheses, where the power is considerably higher for H_0^b. Increasing the number of groups also increases the power under both hypotheses. It is noticeable that for H_0^a, increasing the number of groups from 8 to 10 has substantially more effect than from 2 to 4, while for H_0^b it is vice versa. As expected, detecting the one-point alternative (Figure 4) is challenging under both hypotheses, so the power is low in both cases, even for larger δ-values, in particular for H_0^a. This observation coincides with the power calculations from [18]. But it can be seen that an increasing number of groups increases the power substantially.
Finally, we considered a shift alternative (Figure 5), but just for H_0^b. As in other cases ([17], [18]), this alternative is comparatively easy to detect. This holds in particular for an increasing number of groups.

Conclusion
The present paper investigated a procedure for homoscedastic split-plot designs under various settings containing different kinds of potential high-dimensionality. Under equal covariance matrices or similar conditions (as mentioned in Section 2), results for settings with, for example, a large number of small independent groups were found. These kinds of data sets are becoming more important because of the trend to divide data sets further, e.g., in the context of personalized medicine or personalized insurance. In contrast to existing approaches, we take this development into account by considering a variety of different frameworks.
We were able to expand the central theorem of [18] to also cover this case, at the price of the additional assumption of equal covariance matrices. Moreover, we generalized it to some more cases, in some sense completing the scope of the theorem. For all settings, we approximate the critical value of the test statistic by a standardized χ²_f distribution with appropriate f. To apply these results, we developed estimators that can be used regardless of the asymptotic framework.
We conducted simulations to investigate the level of the resulting test as well as its power. The outcomes were convincing, especially for a larger number of groups.
Unfortunately, it is not easy to verify the assumption of equal covariance matrices, or just of equal powers of traces. The most popular test under normality, Box's M-test [5], yields quite good results but does not cover our asymptotic frameworks. High-dimensional tests for equal covariance matrices are a field of great interest, investigated for example in [14] and [15]. We plan to combine their techniques with the results obtained in [19] in the near future.
Finally, various adjustments of estimators are planned to improve their performance when the homogeneity is violated.

Appendix
Proof of Theorem 1. For this proof, it is helpful to present the theorem in a more detailed way. The first two parts as well as the last one were proved in [18]. For part c), it is well known from Cramér's theorem that an infinite number of summands is needed to obtain a normal limit distribution, so there exists an infinite index set M ⊂ N. The proof of part a) shows that β_ℓ → 0 for all ℓ ∈ M, and because of the decreasing order there exists an r̃ ∈ N with b̃_{r̃} > 0 and b̃_{r̃+1} = 0. Assume now that β_ℓ → b̃_ℓ for ℓ = 1, ..., r̃; otherwise, consider a subsequence along which this holds. It remains to show that r̃ = r as well as b̃_ℓ = b_ℓ. To this aim, we consider the moment-generating functions; applying the continuous mapping theorem, we obtain equality of the limiting moment-generating functions for all arguments in a neighbourhood of zero. Now we consider the zeros of both sides, which stem from the polynomial parts and can be written as 1/(√2 b_ℓ) resp. 1/(√2 b̃_ℓ). It can be directly inferred from this that both polynomials have the same degree and therefore r̃ = r. Moreover, both of them have the same zeros with the same multiplicities. So the coefficients are the same on both sides, and because of the decreasing order, it follows that b̃_ℓ = b_ℓ for ℓ = 1, ..., r̃. Therefore, the result follows.
Given the fact that framework (III) is not really high-dimensional, and (I) only partly, it would be possible to use other, more classical estimators for the unknown traces. Nevertheless, our focus was to develop preferably general estimators that can be used in various settings.

Lemma 1. With
we can define an estimator which is unbiased and ratio-consistent for tr(T_S Σ) in all of our frameworks.
Proof. It is obvious that this is an unbiased estimator of tr(T_S Σ). With well-known rules and analogously to [18], we calculate that the variance is bounded by a factor, vanishing in both of the cases below, times O(tr²(T_S Σ)).
Now we need a case analysis of the kind that will recur in some of the following proofs; the first one is given in detail, while the later proofs are kept shorter. At first, we consider the case n_max → ∞. In the other case, n_max is bounded and a → ∞. So dividing by tr²(T_S Σ) and then using Chebyshev's inequality leads to the result in both cases.
For the estimated version of the standardized quadratic form, one more estimator is needed.

Lemma 2. The estimator, given by
is an unbiased and ratio-consistent estimator of tr (T S Σ) 2 in all our asymptotic frameworks.
Proof. Again, the unbiasedness is clear, and we consider the variance, which we calculate with Y_{i,k_1,k_2} := T_S(X_{i,k_1} − X_{i,k_2}). Similarly as before, we treat the case n_max → ∞ and the case of n_max bounded with a → ∞. Again, the result follows by using Chebyshev's inequality.
With these theorems, the usage of the estimated standardized quadratic form can be justified.
Proof of Theorem 2. The result follows directly from Theorem 3.2 in [18].
For the proof of Theorem 4, we need to show different properties that combined lead to the result.
Proof of Theorem 4. We conduct this proof in several steps. The results from [18] directly yield E(Ĉ_1) = tr((T_S Σ)³) and Var(Ĉ_{i,1}), which proves a) and b). Together with Chebyshev's inequality, this leads to an unbiased, ratio-consistent estimator for tr((T_S Σ)³).
For part c) we calculate

P. Sattler
Again, this term is of order O(n_max^{-1}) for n_max → ∞ and of order O(a^{-1}) for a → ∞. So in both cases, the result follows with Chebyshev's inequality.
Finally, the proof of part d) is done using the above results. A similar proof is part of [18], but we repeat it for better understanding.
With the last lemma, the assertion of part d) follows in both cases. It is obvious that this estimator needs a sufficiently large number of groups with at least six observations, and similarly for the other estimators introduced earlier. From a theoretical point of view, a scenario with n_max ≤ 5 is part of our model; in practice, however, this setting is rarely examined. In this case, it would be possible to define estimators which combine observations from different groups, but these would be much more complicated than our estimators.
With Theorems A.9, A.10, and A.16 from [18], we get for the variance