Test for homogeneity with unordered paired observations

In some applications, an experimental unit is composed of two distinct but related subunits. The response from such a unit is $(X_{1}, X_{2})$ but we observe only $Y_1 = \min\{X_{1},X_{2}\}$ and $Y_2 = \max\{X_{1},X_{2}\}$, i.e., the subunit identities are not observed. We call $(Y_1, Y_2)$ unordered paired observations. Based on unordered paired observations $\{(Y_{1i}, Y_{2i})\}_{i=1}^n$, we are interested in whether the marginal distributions of $X_1$ and $X_2$ are identical. Testing methods are available in the literature under the assumptions that $Var(X_1) = Var(X_2)$ and $Cov(X_1, X_2) = 0$. However, extensive simulation studies show that when one or both assumptions are violated, these methods suffer from inflated type I errors or substantially reduced power. In this paper, we study the likelihood ratio test statistics for various scenarios and explore their limiting distributions without these restrictive assumptions. Furthermore, we develop Bartlett correction formulae for these statistics to enhance their precision when the sample size is not large. Simulation studies and real-data examples are used to illustrate the efficacy of the proposed methods.


Introduction
In some applications, an experimental unit is made of two distinct but related subunits. The response from such a unit is $(X_1, X_2)$ but we observe only $Y_1 = \min\{X_1, X_2\}$ and $Y_2 = \max\{X_1, X_2\}$; that is, the subunit identities are not observed or unobservable. We call $(Y_1, Y_2)$ unordered paired observations. We assume that $(X_{1i}, X_{2i})^\tau$, for $i = 1, \ldots, n$, are independent and identically distributed (i.i.d.) normal random vectors:
$$(X_{1i}, X_{2i})^\tau \sim N\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right). \quad (1.1)$$
We say that $\{(Y_{1i}, Y_{2i})\}_{i=1}^n$ are uncorrelated when $\rho = 0$ and correlated when $\rho \neq 0$. This paper studies the homogeneity testing of the marginal distributions of $X_{1i}$ and $X_{2i}$:
$$H_0: (\mu_1, \sigma^2_1) = (\mu_2, \sigma^2_2) \quad \text{versus} \quad H_a: (\mu_1, \sigma^2_1) \neq (\mu_2, \sigma^2_2). \quad (1.2)$$
Unordered paired data occur in many applications, and there is a long research history. For instance, Hinkley (1973) analyzed such a data set from human genetics. The genetic blueprint of an individual is contained in 23 pairs of chromosomes. Each member of the pair is inherited from the corresponding chromosome pair of a parent. If we do not know the chromosome correspondences between the offspring and the parents, we lose the parental identities and end up with unordered paired observations. Olkin and Viana (1995) provide more examples. In visual acuity studies, we may record only a subject's extreme acuities (the "best" and "worst" acuities) without recording the corresponding eyes. In twin experiments, we obtain unordered paired observations without a label for each member of a twin pair; see Ernst et al. (1996) and Shekar et al. (2006) and the references therein. Furthermore, unordered data of a higher dimension are collected in various scientific disciplines. For example, Davies and Phillips (1988) provided an example of unordered data of dimension $k$.
In the interim analysis of a double-blinded clinical trial of k treatments, we get the k order statistics without knowledge of the corresponding treatments; see also van der Meulen (2005) and Miller et al. (2009). In diffusion tensor (DT) brain imaging, the eigenvalues of the DT estimates for each brain voxel are viewed as unordered triples; see Yu et al. (2013) and the references therein.
With unordered paired observations, a fundamental question is whether or not $X_{1i}$ and $X_{2i}$ have the same distribution. Under Model (1.1), this is equivalent to testing the hypothesis specified in (1.2). Hinkley (1973) proposed a likelihood ratio test (LRT) procedure under the assumption that $\rho = 0$ and $\sigma^2_1 = \sigma^2_2$. Li and Qin (2011) investigated this problem in a semiparametric setup. Other approaches can be found in Moore II (1973), Lauder (1977), Moore II et al. (1979), Carothers (1981), Efron et al. (1971), and Qin and Zhang (2005), among others. All these works assume that $X_{1i}$ and $X_{2i}$ are independent with equal variance. These assumptions may not hold in applications, and they can be severely violated, as evidenced by the examples in Section 6. Ignoring the dependence structure and/or imposing an incorrect equal-variance assumption can lead to unreliable inference conclusions: the type I error may be severely inflated or the power markedly decreased. This paper focuses on tests for (1.2). In particular, we study the LRT in four scenarios: (1) $\rho = 0$ and $\sigma^2_1 = \sigma^2_2$; (2) $\rho = 0$; (3) $\sigma^2_1 = \sigma^2_2$; and (4) no assumption on $\rho$, $\sigma^2_1$, and $\sigma^2_2$. Investigating the asymptotic behavior of these LRT statistics is technically challenging. The well-developed theory (Wilks, 1938; Chernoff, 1954; Self and Liang, 1987; Drton, 2009) is not applicable because of the undesirable mathematical properties (see (2.3) in Section 2) of the log-likelihood function. In addition, an important byproduct of the theory for the corresponding LRT statistics is the asymptotic behavior of the maximum likelihood estimators (MLEs) for $(\mu_1, \mu_2, \sigma^2_1, \sigma^2_2)$. Interestingly, we have shown that the asymptotic behavior depends on whether $\rho = 0$ is known or $\rho$ is unknown. The convergence rates of these parameter estimates depend on the scenario.
We observe that the limiting distributions of the LRT statistics under H 0 are not sufficiently accurate approximations to their finite-sample distributions when n is not large. To enhance the approximation precision of the limiting distributions, we adjust the statistics based on the Bartlett correction (Bartlett, 1937;Lawley, 1956). Simulation results confirm the efficacy of the adjustment.
We organize the rest of the paper as follows. Section 2 introduces the LRT statistics for (1.2) and studies their asymptotic behavior under H 0 . Section 3 provides the sketch of the proofs. Section 4 presents the adjusted limiting distributions of our statistics for data of limited sample size. Section 5 contains simulation studies, and Section 6 gives real-data examples. Some discussion is provided in Section 7, and the technical details are relegated to Section 8.

Main results
The LRT is an essential tool in statistical inference, especially under the parametric model assumption; see Wilks (1938), Chernoff (1954), Self and Liang (1987), Drton (2009), and the references therein. In this section, we present LRT statistics and study their properties for testing (1.2) under model assumptions on $\rho$ and on whether or not $\sigma^2_1 = \sigma^2_2$. We first derive the log-likelihood function with unordered paired observations. For any $y_1 < y_2$, the pair $(Y_1, Y_2) = (y_1, y_2)$ arises from either $(X_1, X_2) = (y_1, y_2)$ or $(X_1, X_2) = (y_2, y_1)$. Therefore, the joint density function of $(Y_1, Y_2)$ is given by
$$f(y_1, y_2; \theta) = \phi(y_1, y_2; \theta) + \phi(y_2, y_1; \theta), \quad y_1 < y_2,$$
where $\phi(x_1, x_2; \theta)$ denotes the bivariate normal density function with parameters $\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)^\tau$ specified in (1.1). The log-likelihood function based on $\{(Y_{1i}, Y_{2i})\}_{i=1}^n$ and Model (1.1) is
$$\ell_n(\theta) = \sum_{i=1}^n \log\{\phi(Y_{1i}, Y_{2i}; \theta) + \phi(Y_{2i}, Y_{1i}; \theta)\}. \quad (2.1)$$
This likelihood function is the basis for our subsequent development.
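As a concrete illustration of this density, the following sketch (the function name is ours, not from the paper) evaluates the unordered-pair log-likelihood by summing the bivariate normal density over the two possible labelings:

```python
import numpy as np
from scipy.stats import multivariate_normal

def unordered_loglik(y1, y2, mu1, mu2, s1, s2, rho):
    """Log-likelihood of unordered pairs: the density at (y1, y2) adds
    the bivariate normal density over both possible labelings."""
    cov = np.array([[s1 ** 2, rho * s1 * s2],
                    [rho * s1 * s2, s2 ** 2]])
    f12 = multivariate_normal.pdf(np.column_stack([y1, y2]),
                                  mean=[mu1, mu2], cov=cov)
    f21 = multivariate_normal.pdf(np.column_stack([y2, y1]),
                                  mean=[mu1, mu2], cov=cov)
    return np.sum(np.log(f12 + f21))
```

A quick sanity check: when $\rho = 0$ and the parameters are standard, the density factorizes and the unordered density reduces to $2\phi(y_1)\phi(y_2)$.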

Unordered uncorrelated paired data
In this section, we assume that $\rho = 0$ is known; problem (1.2) then reduces to testing $H_0$ against $H_a$ with $\rho = 0$ held fixed. We use the notational convention that the entries of an accented $\theta$ carry the same accent; e.g., the entries of $\hat\theta$ are $\hat\mu_1$, $\hat\mu_2$, and so on. The accented versions of $\theta$ are MLEs of $\theta$ under various constraints. The LRT statistics for testing the null hypothesis (1.2) against two alternatives, specified by $\sigma_1 = \sigma_2$ and $\sigma_1 \neq \sigma_2$ respectively, are denoted $R_{n,1}$ and $R_{n,2}$. Theorem 1 below establishes the asymptotic distributions of $R_{n,1}$ and $R_{n,2}$ as well as the convergence rates of the constrained MLEs under $H_0$. For presentational continuity, we relegate its proof to Section 8. Let $\xrightarrow{D}$ denote "convergence in distribution." We use $0.5\chi^2_0 + 0.5\chi^2_1$ for an equal mixture of $\chi^2_0$ and $\chi^2_1$, with $\chi^2_0$ being the distribution with a point mass at zero.
Deriving the asymptotic null distributions of $R_{n,1}$ and $R_{n,2}$ is technically challenging. We make the following comments. Let $\mu = (\mu_1 + \mu_2)/2$ and $\Delta = (\mu_1 - \mu_2)/2$, so that $\mu_1 = \mu + \Delta$ and $\mu_2 = \mu - \Delta$; we then have the identity (2.3). This fact implies that the Fisher information matrix of $\theta$ under the null hypothesis degenerates, which undermines the basis for the elegant classical results (Wilks, 1938; Chernoff, 1954; Self and Liang, 1987; Drton, 2009). The crucial step in obtaining the asymptotic null distribution of the LRT is a quadratic approximation in $\hat\theta - \theta$ to the log-likelihood ratio function. Following this path, we need to consider a fourth-order Taylor expansion to obtain a quadratic approximation in $(\hat\theta - \theta)^2$ and so on. Fortunately, we find that a sandwich technique from the literature overcomes the technical obstacles caused by (2.3).
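To make the construction of the LRT statistics concrete, here is a numerical sketch of $R_{n,2}$ for the known-$\rho = 0$ case (all helper names are ours; the paper's own implementation is an R package). The null fit pools all $2n$ values into one normal sample, and the full fit maximizes the unordered-pair likelihood over $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ numerically:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def loglik0(y1, y2, mu1, mu2, s1, s2):
    # With rho = 0, the unordered-pair density is
    # phi(y1; mu1, s1) phi(y2; mu2, s2) + phi(y1; mu2, s2) phi(y2; mu1, s1).
    a = norm.pdf(y1, mu1, s1) * norm.pdf(y2, mu2, s2)
    b = norm.pdf(y1, mu2, s2) * norm.pdf(y2, mu1, s1)
    return np.sum(np.log(a + b))

def lrt_R2(y1, y2):
    """2 x (full-model max loglik - null max loglik), rho = 0 known."""
    z = np.concatenate([y1, y2])
    mu0, s0 = z.mean(), z.std()  # under H0, the 2n values pool into one sample
    l0 = loglik0(y1, y2, mu0, mu0, s0, s0)
    # Full alternative: (mu1, mu2, log s1, log s2) free, started at the null fit.
    obj = lambda p: -loglik0(y1, y2, p[0], p[1], np.exp(p[2]), np.exp(p[3]))
    fit = minimize(obj, [mu0, mu0, np.log(s0), np.log(s0)], method="Nelder-Mead")
    return 2.0 * (-fit.fun - l0)
```

Because the null model is nested in the alternative and the optimizer starts from the null fit, the returned statistic is nonnegative up to numerical error.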

Unordered correlated paired data
In this section, we study the LRTs for (1.2) with ρ being an unknown parameter.
Similarly to the strategy for (2.2), we define the LRT statistics $R^*_{n,1}$ and $R^*_{n,2}$ for (1.2) with $\rho$ being an unknown parameter. Theorem 2 below establishes the asymptotic distributions of $R^*_{n,1}$ and $R^*_{n,2}$ as well as the convergence rates of the corresponding constrained MLEs under their respective $H_0$. The proof is given in Section 8.
The limiting cumulative distribution function (c.d.f.) of $R^*_{n,2}$ is given by the expression above, where $\Phi(\cdot)$ denotes the c.d.f. of the standard normal distribution. We use this expression to evaluate the asymptotic quantile and the p-value for the corresponding test.
• When $\rho$ is unknown, $\sigma^2_+/\eta^2 = (1+\rho)/(1-\rho)$ is unknown; estimating $\sigma^2_+$ no longer determines an estimate of $\eta^2$. The MLEs of $(\mu, \sigma^2_+)$ based on (3.1) continue to have the $n^{-1/2}$ convergence rates. However, $\beta_0$ and $\eta^2$ need to be estimated from (3.2), which is the likelihood of a normal mixture model. When $\eta^2$ is unknown, this model is not strongly identifiable (Chen and Chen, 2003) or not identifiable in the second order (Ho and Nguyen, 2016). The MLEs of $\beta_0$ and $\eta^2$ can achieve convergence rates of only $n^{-1/8}$ and $n^{-1/4}$, respectively; see Theorem 1 of Chen and Li (2009) and Proposition 3.16 of Ho and Nguyen (2019). These rates are in line with those stated in Part (a) of Theorem 2.
We first show that the MLEs for $\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)^\tau$ are consistent. This is achieved by showing that for any estimator $\tilde\theta$ of $\theta$, if it leads to a sufficiently large value of the likelihood function, namely $\ell_n(\tilde\theta) - \ell_n(\theta_0) \geq C$ for some constant $C > -\infty$, then under the null model, $\tilde\theta$ is a consistent estimator of $\theta$. We show that this conclusion is valid whether or not the assumption(s) $\sigma_1 = \sigma_2$ and/or $\rho = 0$ are imposed. Therefore, this ensures the consistency of the MLEs of $\theta$ derived under the different assumptions.
With some algebra, we are able to obtain the asymptotic quadratic forms of the LRTs; these together with (3.6) lead to the asymptotic distribution of the corresponding LRTs claimed in Theorems 1 and 2.
• If $\rho = 0$ is known and $\sigma^2_1 = \sigma^2_2$, we have the expansion displayed above, where $D_i[k]$ denotes the $k$th element of the vector $D_i$; this convention applies to all vectors hereafter.

Adjusted limiting distributions
One drawback of general asymptotic results is that they may offer poor approximations to the corresponding finite-sample distributions. The convergence rates of the parameter estimators given in Theorems 1 and 2 are much lower than those of the MLEs from regular parametric models. This adversely affects the approximation accuracy of the asymptotic distributions to the finite-sample distributions of the LRT statistics. To improve the approximation precision when $n$ is not very large, we use the Bartlett correction. The Bartlett correction was originally proposed by Bartlett (1937) to improve the approximation of the limiting distribution of an LRT under a regular model, and it was generalized by Lawley (1956). Let $T_n$ be the LRT statistic for a hypothesis testing problem, which asymptotically follows a $\chi^2_d$ distribution under $H_0$. Define $b_1$ according to the expansion
$$E(T_n) = d\{1 + b_1/n + O(n^{-2})\},$$
and let $T^*_n = T_n/(1 + \hat b_1/n)$ for a suitable estimator $\hat b_1$ of $b_1$. Barndorff-Nielsen and Hall (1988) showed that under appropriate regularity conditions, the Bartlett correction reduces the approximation error rate from $n^{-1}$ to $n^{-2}$. Hence, $\chi^2_d$ better approximates the distribution of $T^*_n$ than that of $T_n$; in other words, $(1 + \hat b_1/n)\chi^2_d$ provides a better approximation to the distribution of $T_n$ than $\chi^2_d$ does. In short, the idea of the Bartlett correction is to adjust the limiting distribution so that its first moment matches that of the LRT up to order $O(n^{-1})$; this improves the approximation accuracy of the limiting distribution.
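A minimal sketch of the correction step described above, assuming the statistic $T_n$, its degrees of freedom, and an estimate of $b_1$ are already in hand (the function name is ours):

```python
from scipy.stats import chi2

def bartlett_corrected_pvalue(T_n, n, b1, df):
    """p-value of the Bartlett-corrected statistic T_n / (1 + b1/n),
    calibrated against its chi-square(df) limiting distribution."""
    T_star = T_n / (1.0 + b1 / n)
    return chi2.sf(T_star, df)
```

Dividing $T_n$ by $1 + b_1/n$ is equivalent to calibrating $T_n$ against the scaled limit $(1 + b_1/n)\chi^2_d$.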
In this spirit, we search for accurate approximate distributions for $R_{n,1}$, $R_{n,2}$, $R^*_{n,1}$, and $R^*_{n,2}$ as follows. Recall that $R$ and $R^*$ denote the limiting distributions of $R_{n,2}$ and $R^*_{n,2}$, respectively, and let $F_{n1}$, $F_{n2}$, $F^*_{n1}$, and $F^*_{n2}$ denote the candidate adjusted distributions, indexed by $p_n$, $r_n$, $p^*_n$, and $r^*_n$. We aim to find $p_n$, $r_n$, $p^*_n$, and $r^*_n$ so that these distributions have first moments very close to those of the corresponding test statistics for a wide range of $n$ values. High-order asymptotic techniques can be used, but they may involve complicated analytical tools with little assurance of the quality of the end products. The computer-experiment approach of Chen and Li (2011) is more effective and practical, and it matches the spirit of data science.
The experiment works as follows. We consider a sufficiently wide range of values for n. For each n, we simulate a large number of data sets, with each data set composed of n i.i.d. unordered paired observations. Due to the invariance property of the LRT statistics, each data set is generated from the standard bivariate normal distribution. Based on these data sets, we obtain the simulated first moments of R n,1 , R n,2 , R * n,1 , and R * n,2 . We choose p n so that the simulated first moment of R n,1 matches the first moment of F n1 . We then look for a regression model for p n versus n. Similar procedures are applied to obtain regression models for r n , p * n , and r * n .
Step 2. Obtain $N$ values of $R_{n,1}$ and hence its simulated first moment. Match this simulated first moment with the first moment of $F_{n1}$ to solve for $p_n$.
Step 3. Fit a regression model to (n, p n ) with p n being the response and n being the covariate.
We postulate the following nonlinear but parametric regression models, with $a$ and $b$ being regression parameters and $\epsilon_n$ accounting for imperfect fit. Applying the steps outlined above leads to the $p_n$, $r_n$, $p^*_n$, and $r^*_n$ values in Table 1. Fitting the nonlinear regression models (4.1)-(4.4) to the data in Table 1 gives us the fitted values of $a$ and $b$. With these values, we calculate approximate p-values using the resulting adjusted limiting distributions. For each nonlinear regression model in (4.1)-(4.4), we calculate the squared correlation between the predicted values and the responses. These are 98.9%, 99.4%, 99.7%, and 99.5%, indicating that models (4.1)-(4.4) fit the data in Table 1 very well. We have implemented the four LRT statistics with the proposed adjusted limiting distributions in an R package; it is available upon request.
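The regression-fitting step can be sketched as follows; the functional form and the numbers here are placeholders of our own, not the paper's models (4.1)-(4.4) or the Table 1 values:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical calibration model p_n ~ a + b/n; the paper's actual models
# (4.1)-(4.4) may take different forms.  The p_sim values below are
# placeholders standing in for the simulated first-moment matches.
def model(n, a, b):
    return a + b / n

n_grid = np.array([25.0, 50.0, 100.0, 200.0, 400.0, 800.0])
p_sim = np.array([0.90, 0.55, 0.38, 0.29, 0.25, 0.23])

(a_hat, b_hat), _ = curve_fit(model, n_grid, p_sim)
p_pred = model(n_grid, a_hat, b_hat)  # fitted calibration values
```

The squared correlation between `p_pred` and `p_sim` plays the role of the fit diagnostics reported for models (4.1)-(4.4).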

Data generation
Because of the invariance property, we need only study the LRTs based on data generated from distributions with standardized parameter values.
In each case, we generate (X 1 , X 2 ) from model (1.1) with one of the above parameter settings. Then, we obtain Y 1 = min{X 1 , X 2 } and Y 2 = max{X 1 , X 2 }. We repeat the process to obtain n unordered pairs (Y 1 , Y 2 ).
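A minimal sketch of this generation scheme (the function name is ours):

```python
import numpy as np

def generate_unordered_pairs(n, mu1, mu2, s1, s2, rho, rng):
    """Draw n pairs from model (1.1), then discard the subunit labels."""
    cov = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
    x = rng.multivariate_normal([mu1, mu2], cov, size=n)
    return x.min(axis=1), x.max(axis=1)  # (Y1, Y2) with Y1 <= Y2
```

Sorting within each row is exactly the step that loses the subunit identities.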
Based on each set of n unordered pairs, we compute the values of R n,1 , R n,2 , R * n,1 , and R * n,2 and carry out the tests for H 0 without checking that the model for generating the data satisfies the conditions for the tests. We record the rejection rates based on 50, 000 repetitions; the results are presented in the next section.

Results
We calculate the rejection rate of each test at the significance levels α = 10%, 5%, and 1%. The rejection percentages under the null models are summarized in Table 2.
When ρ = 0, X 1 and X 2 are independent. The assumptions for all the LRTs, R n,1 , R n,2 , R * n,1 , and R * n,2 , are satisfied. However, as shown in the first section of Table 2, if their limiting distributions are applied without adjustment, the resulting tests are inaccurate: their type I errors markedly exceed the nominal significance levels. The adjustment proposed in Section 4 is very helpful. After the adjustment, the type I errors of all the tests are close to the nominal levels. The precision is impressive since the adjustment works well even when n is as small as 25.
When ρ = ±0.25 or ±0.5, the model assumptions for R n,1 and R n,2 are violated. When we apply the tests, the type I errors are either near zero when ρ = 0.25 or 0.5 or seriously inflated when ρ = −0.25 or −0.5. In contrast, because of their invariance property, R * n,1 and R * n,2 continue to perform well: with their limiting distributions adjusted, they have satisfactory precision in the type I errors.
To further illustrate the effects of the adjustment on the limiting distributions, Figure 1 presents the type I errors (%) of our LRTs at the 5% significance level when $100 \leq n \leq 2500$ and $\rho = 0$. The trends for the 10% and 1% significance levels are similar and are omitted. The plots show that the type I errors of $R_{n,1}$ and $R_{n,2}$ after the adjustment are within a 0.2% band of the nominal level for large $n$ and a 0.4% band otherwise; similar results are observed for $R^*_{n,1}$. For $R^*_{n,2}$, the approximation accuracy shows no clear improvement as $n$ increases, but the type I errors are between 5% and 5.4%, which is sufficiently accurate for typical applications. We have also run the simulation for $n = 3000, 3500, \ldots, 5000$. The trends are similar to the results for $n = 2500$ and are omitted.
Next, we compare the powers of $R_{n,1}$, $R_{n,2}$, $R^*_{n,1}$, and $R^*_{n,2}$ under the alternatives. All combinations of $n$, $\rho$, $\mu$, and $\sigma$ are incorporated, as described in Section 5.1. Their powers, summarized in Table 3, are computed at the 5% significance level based on the adjusted limiting distributions. We observe that when $\rho = 0$, $R_{n,1}$ and $R_{n,2}$ have higher powers than $R^*_{n,1}$ and $R^*_{n,2}$; when $\rho = 0.25$, $R_{n,1}$ and $R_{n,2}$ have higher powers in most cases; when $\rho$ is increased to 0.5, $R^*_{n,1}$ and $R^*_{n,2}$ are much more powerful; when $\rho = -0.25$ and $-0.5$, $R_{n,1}$ and $R_{n,2}$ are more powerful, but at the cost of the inflated type I errors reported in Table 2, and a test with a markedly inflated type I error is generally not recommended. From Tables 2 and 3, we also observe that both the type I errors and the powers of $R_{n,1}$ and $R_{n,2}$ decrease as $\rho$ increases, whereas the powers of $R^*_{n,1}$ and $R^*_{n,2}$ increase as $\rho$ increases. We use the reparameterization in Section 3.1 to provide some insight into these results.
For simplicity, we consider the case where σ 2 1 = σ 2 2 = σ 2 , under which testing (1.2) is equivalent to testing H 0 : β 0 = 0 based on observations from the mixture model 0.5N (β 0 , η 2 ) + 0.5N (−β 0 , η 2 ).
The tests based on $R_{n,1}$ and $R_{n,2}$ are constructed assuming $\rho = 0$ is known. In view of (3.3), this is equivalent to assuming that $\sigma^2_+ = 0.5\sigma^2 + 0.5\rho\sigma^2$ and $\eta^2 = 0.5\sigma^2 - 0.5\rho\sigma^2$ are equal. Thus, when the true value of $\rho$ is positive, the approach behind these two tests overestimates $\eta^2$. The estimator based on the likelihood (3.2), on the other hand, is asymptotically unbiased for the variance of the mixture distribution $0.5N(\beta_0, \eta^2) + 0.5N(-\beta_0, \eta^2)$, which equals $\beta^2_0 + \eta^2$. With $\eta^2$ overestimated and $\beta^2_0 + \eta^2$ unbiasedly estimated, we obtain an underestimated $\beta^2_0$, asymptotically. As $\rho$ gets larger, the overestimation of $\eta^2$ and the underestimation of $\beta^2_0$ become more severe. This explains the decrease in the type I errors and powers of $R_{n,1}$ and $R_{n,2}$. Similar arguments apply to the case where $\rho < 0$.
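The argument above relies on the fact that the equal mixture $0.5N(\beta_0, \eta^2) + 0.5N(-\beta_0, \eta^2)$ has mean zero and variance $\beta_0^2 + \eta^2$. A quick Monte Carlo check of this identity (our own, with arbitrary illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(7)
beta0, eta = 1.2, 0.8  # arbitrary illustrative values

# Equal mixture of N(beta0, eta^2) and N(-beta0, eta^2):
signs = rng.choice([-1.0, 1.0], size=200_000)
z = signs * beta0 + rng.normal(0.0, eta, size=200_000)

# The sample mean should be near 0 and the sample variance near
# beta0**2 + eta**2, matching the identity used in the text.
```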
Tests based on $R^*_{n,1}$ and $R^*_{n,2}$ are constructed without a known-$\rho$ assumption. The testing problem becomes testing $H_0: \beta_0 = 0$ based on observations from the mixture model $0.5N(\beta_0, \eta^2) + 0.5N(-\beta_0, \eta^2)$ with $\eta^2$ unknown. The powers of these tests are determined by the value of $|\beta_0|/\eta = |\mu_1 - \mu_2|/\sqrt{2(1-\rho)\sigma^2}$. Therefore, if the true value of $\rho$ increases, the powers of $R^*_{n,1}$ and $R^*_{n,2}$ increase. This is in line with the trend observed in Table 3.

Data from karyotype analysis
This example considers 40 unordered pairs of the lengths of the longer and shorter arms of chromosome II of Larix decidua from 40 specimens, so $n = 40$. The data are available in Table 1 of Matérn and Simak (1968). The test results from $R_{n,1}$, $R_{n,2}$, $R^*_{n,1}$, and $R^*_{n,2}$ for (1.2) are as follows:
• $R_{n,1} = 14.91$ and $R_{n,2} = 17.71$. Calibrated by the adjusted limiting distributions, the asymptotic p-values of $R_{n,1}$ and $R_{n,2}$ are $7 \times 10^{-5}$ and $2 \times 10^{-4}$.
• $R^*_{n,1} = 1.08$ and $R^*_{n,2} = 16.69$. Calibrated by the adjusted limiting distributions, the asymptotic p-values of $R^*_{n,1}$ and $R^*_{n,2}$ are 0.21 and $4 \times 10^{-4}$.
Note that the maximum likelihood estimate $\hat\rho^* = -0.73$ suggests a strong negative correlation between $X_{1i}$ and $X_{2i}$. As revealed in the simulation studies reported in the bottom section of Table 2, $R_{n,1}$ and $R_{n,2}$ are therefore not reliable because they are designed for $\rho = 0$. Moreover, the fitted values $\hat\mu^*_1$ and $\hat\mu^*_2$ are very close, but $\hat\sigma^*_1$ and $\hat\sigma^*_2$ are markedly different. Hence, $R^*_{n,1}$ is unsuitable because it is designed for the case where $\sigma_1 = \sigma_2$. We recommend $R^*_{n,2}$, which is designed to detect departures from either the equal-mean or the equal-variance hypothesis. Matérn and Simak (1968) proposed a method for testing $H_0: \mu_1 = \mu_2$, and the outcome is not significant at the 5% level. Their test is different from ours, since their $H_0$ does not require $\sigma^2_1 = \sigma^2_2$. Our test based on $R^*_{n,2}$ rejects $(\mu_1, \sigma^2_1) = (\mu_2, \sigma^2_2)$ with high significance. In fact, we can use the classical LRT for their hypothesis: let $R_n = 2\{\ell_n(\hat\theta^*) - \ell_n(\check\theta)\}$, where $\ell_n(\theta)$ is as in (2.1), $\hat\theta^*$ is given in Section 2.2, and $\check\theta$ is the MLE under $H_0: \mu_1 = \mu_2$. We observe (proof omitted) that the model under $H_0: \mu_1 = \mu_2$ is regular when $(\mu_1, \sigma^2_1) \neq (\mu_2, \sigma^2_2)$. Therefore, $R_n$ has a $\chi^2_1$ limiting distribution. The test results are $R_n = 3.21$ with p-value 0.07, which is in line with the conclusion of Matérn and Simak (1968).
In summary, both R n and Matérn and Simak (1968) indicate that μ 1 and μ 2 are not empirically distinguishable based on the given data; however, R * n,2 suggests that (μ 1 , σ 2 1 ) = (μ 2 , σ 2 2 ) should be rejected with high significance. The difference in the two distributions comes from the difference in their variances.

C-band area of human chromosome data
This example consists of normalized measurements of the C-band area on the No. 9 chromosome pair (Mason et al., 1975). The measurements come from three groups: fathers, mothers, and offspring, with 40, 18, and 31 unordered pairs of normalized measurements, respectively. The data are available in Table 1 of Lauder (1977). We analyze the group of fathers as an example; the analysis of the other groups is similar. We computed $R_{n,1}$, $R_{n,2}$, $R^*_{n,1}$, and $R^*_{n,2}$ and the corresponding p-values from the adjusted limiting distributions. The results are as follows:
• $R_{n,1} = 6.51$ and $R_{n,2} = 9.47$ with $n = 40$. Calibrated by the adjusted limiting distributions, the asymptotic p-values of $R_{n,1}$ and $R_{n,2}$ are $6.6 \times 10^{-3}$ and $8.9 \times 10^{-3}$.
• $R^*_{n,1} = 10.74$ and $R^*_{n,2} = 13.48$ with $n = 40$. Calibrated by the adjusted limiting distributions, the asymptotic p-values of $R^*_{n,1}$ and $R^*_{n,2}$ are $7.5 \times 10^{-4}$ and $1.9 \times 10^{-3}$.
Note that $\hat\rho^* = 0.46$ suggests a strong positive correlation between $X_{1i}$ and $X_{2i}$. Moreover, $\hat\mu^*_1$ and $\hat\mu^*_2$ are quite different, whereas $\hat\sigma^*_1 \approx \hat\sigma^*_2$. These findings suggest that $R^*_{n,1}$ is the most suitable test, while $R^*_{n,2}$ is also a possibility. Note that $R^*_{n,1}$ is sharper than $R^*_{n,2}$, with a smaller p-value. Lauder (1977) used these data to study the transmission of heritable properties on the chromosome from parents to children. The validity of that analysis hinges on $(\mu_1, \sigma_1) \neq (\mu_2, \sigma_2)$ in the model for the unordered paired data of the father or the mother, but the paper did not verify this assumption. Our work fills this gap and provides numerical evidence to support the analysis.

Discussion
We have proposed methods for testing the homogeneity of unordered paired data. With the LRT approach based on the likelihood of the bivariate normal random vectors, we considered the testing problem (1.2) in four scenarios: (1) $\rho = 0$ and $\sigma^2_1 = \sigma^2_2$; (2) $\rho = 0$; (3) $\sigma^2_1 = \sigma^2_2$; and (4) no assumption on these parameters. We derived both the limiting distributions of the LRTs and the convergence rates of the MLEs of the unknown parameters. Furthermore, in the spirit of the Bartlett correction, we proposed an adjusted limiting distribution for each LRT. By simulation, we demonstrated that the adjusted limiting distributions provide accurate approximations to the finite-sample distributions of the corresponding LRTs even when the sample size is as small as $n = 25$.
Throughout this paper, we have focused on unordered paired observations from normal random vectors. This leads to many interesting but open-ended research topics. For example, the problem becomes more challenging when the data are n unordered k-tuples. Our methods and theory serve as a useful starting point for the general problem. One obstacle is that the joint density function of an unordered k-tuple involves k! terms. This makes the asymptotic expansion of the log-likelihood extremely complicated and therefore introduces tremendous difficulty in the technical development. Another obstacle is that the reparameterization in Section 3.1 plays an important role in the theoretical development. For unordered k-tuples with k > 2, we have yet to find a comparable reparameterization. Because of these difficulties, we leave this theoretical development to future research. It would also be interesting to consider unordered k-tuples that follow the general exponential family distributions. In addition to the aforementioned technical difficulties, we may need to model the correlation structure among the k-tuples. We expect that copula models (Nelsen, 2006) will be useful for this problem.

Some useful lemmas
In this section, we present three lemmas. Lemma 1 is a technical preparation that provides an upper bound on the number of observations in a set. Lemma 2 indicates that under the null model any estimator of θ with a large likelihood value is consistent for θ. Lemma 3 strengthens Lemma 2 by providing specific convergence rates. Lemma 1. As n → ∞, we have, almost surely, where I(·) is the indicator function.
Proof. Note that $n^{-1}\sum_{i=1}^n I(|Z_{2i} - \beta_0 - \beta_1 Z_{1i}| \leq 1/4)$ is the empirical measure of the two-dimensional stripe formed by the inequality $|z_2 - \beta_0 - \beta_1 z_1| \leq 1/4$. This class of stripes can divide $n$ points in two-dimensional space into at most a polynomial number of different subsets. By Pollard (1990), this property implies the uniform strong law of large numbers. The distribution of $Z_2 - \beta_0 - \beta_1 Z_1$ is normal with variance at least 1. Based on this, we have $P(|Z_2 - \beta_0 - \beta_1 Z_1| \leq 1/4) \leq 0.2$ for any $\beta_0, \beta_1$. Hence, the claimed bound holds almost surely. This completes the proof.
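The bound $P(|Z_2 - \beta_0 - \beta_1 Z_1| \leq 1/4) \leq 0.2$ can be checked numerically: over all normal variables with variance at least 1, the probability is maximized at mean 0 and variance exactly 1, giving $2\Phi(1/4) - 1 \approx 0.197$:

```python
from scipy.stats import norm

# For W ~ N(m, v) with v >= 1, P(|W| <= 1/4) is largest at m = 0, v = 1:
p_max = norm.cdf(0.25) - norm.cdf(-0.25)  # just below the 0.2 bound
```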
Proof. The classical consistency proof of the MLE, such as that in Wald (1949), is essentially for models with a compact parameter space. If necessary, one may first compactify the parameter space or show that, with probability one, the MLE lies inside a compact subspace. We adopt the second approach by showing that, as $n \to \infty$, $\epsilon_0 \leq \hat\eta \leq M_0$ and $\epsilon_0 \leq \hat\sigma_+ \leq M_0$ almost surely for some positive $\epsilon_0$ and $M_0$. For convenience, we specifically choose $M_0 = \exp(4)$ and $\epsilon_0$ such that when $\eta < \epsilon_0$, $\log(\eta) + (1/64)/\eta^2 \geq 2$; the existence of such an $\epsilon_0$ is obvious. We start with $\hat\eta$. Note that we have decomposed $\ell_n(\theta) - \ell_n(\theta_0)$ into a sum of two terms. For the first term, according to the classical result for the LRT under regular models (Serfling, 2000), it is clear that this term is bounded in probability. When the variance parameter in the second term satisfies $\eta > M_0 = \exp(4)$, combining the resulting bound with the law of large numbers yields, almost surely, $\ell^*_{n,2}(\beta_0, \beta_1, \eta) - \ell^*_{n,2}(0, 0, 1) \leq -2n$ (8.4) and subsequently, uniformly for $\eta$ in this range, $\ell^*_{n,2}(\beta_0, \beta_1, \eta) - \ell^*_{n,2}(0, 0, 1) \to -\infty$.
In conclusion, the $\hat\eta$ value satisfying the lemma condition must almost surely fall within the interval $[\epsilon_0, M_0]$. That is, we may reduce the parameter space of $(\beta_0, \beta_1, \eta)$ to $\mathbb{R}^2 \times [\epsilon_0, M_0]$ for asymptotic considerations.
Next, we strengthen the results of Lemma 2. We first define some notation. Let $A_i$ and $B_i$ be as displayed above. It can be seen that $E(A_i) = 0$, $E(B_i) = 0$, and $A_i$ and $B_i$ are uncorrelated. Further, we introduce two parameter vectors, $s_1$ and $s_2$, of lengths 2 and 4. In the following, we use $|x|$ and $\|x\|$ to denote the $L_1$ and $L_2$ norms of the vector $x$, respectively.
(8.15) By straightforward algebra, we find that the unwanted term $B_i[4]$, the fourth element of the vector $B_i$, has coefficient $\{\beta_0^2 + (\eta^2 - 1)\}^2 = o_p(|s_2|)$, as is easily verified. This allows us to obtain a neater expression by absorbing it into the higher-order term. In short, we have shown the quadratic approximation displayed above. These algebraic manipulations are typical of the techniques employed in the literature. The same techniques, which are tedious but not sophisticated, give (8.20) and (8.21). We provide some details for (8.20); the result in (8.21) can be obtained similarly. Repeating the procedures for assessing the order of the relevant sums, and recalling that $(\bar\beta_0, \bar\beta_1, \bar\eta) = (0, 0, 1) + o_p(1)$, the above upper bound is applicable to $\ell^*_{n,2}(\bar\beta_0, \bar\beta_1, \bar\eta) - \ell^*_{n,2}(0, 0, 1)$. This completes the proof of (b). Finally, we come to (c). Combining (a) and (b) with the conditions in Lemma 2, we find that the combination is possible only if both $\bar s_1 = O_p(n^{-1/2})$ and $\bar s_2 = O_p(n^{-1/2})$. This leads to the order assessments in (c) and completes the proof of the entire lemma.

Remark:
The MLE of θ always satisfies the condition (8.2) with C = 0. Hence, Lemma 2 implies the consistency of the MLE of θ, and Lemma 3 establishes the asymptotic order of the MLE.

Proof of Theorem 1
The difference between Theorems 1 and 2 is that in the former we consider ρ 0 = 0 to be known when formulating the test statistic. This makes it helpful to reorganize the entries of A i and B i and the corresponding entries of s 1 and s 2 . After that, we will refine the results in Lemma 3 and apply the refined results to prove (a) and (b).
When $\rho_0 = 0$ is known, we have $\sigma_+ = \sigma_-$. Let $t$ be defined as above. Every entry of $s_1$ and $s_2$ is a linear combination of the entries of $t$, possibly with an $O_p(\|t\|^2)$ difference when these parameter values approach their default null values. Because every entry of $s_1$ and $s_2$ is virtually a linear combination of the entries of $t$, we can reorganize the entries of $A_i$ and $B_i$ into $D_i$. Recall that $D_i$ is defined in (3.5), with $E(D_i) = 0$ and $\mathrm{var}(D_i) = \Sigma_D = \mathrm{diag}(1, 1, 1/4, 1, 2)$. Using the central limit theorem, we have $n^{-1/2}\sum_{i=1}^n D_i \xrightarrow{D} N(0, \Sigma_D)$. With the above preparation, we refine the results in Lemma 3 to the following lemma.
Proof. Combining the results of (a) and (b) in Lemma 3, we obtain (8.32). With (8.29) and (8.30), after some algebra, (8.32) can be simplified as claimed in part (a) of the lemma. For (b), with the lemma assumption $\ell_n(\bar\theta) - \ell_n(\theta_0) \geq C > -\infty$, (8.33) implies that $\bar t$ has the order $O_p(n^{-1/2})$. The order assessment in (b) then follows from the definition of $t$ in (8.28).
We are now ready for Theorem 1. Note that the MLEs in both Theorem 1(a) and 1(b) satisfy the conditions of Lemma 4 with C = 0. Hence, the order conclusions of the MLEs in both Theorem 1(a) and 1(b) have been established in Lemma 4. We next derive the limiting distributions.
We rewrite $R_{n,1}$ defined in (2.2), with $\tilde\theta$ being the maximum point under the reduced model where $(\mu_1, \sigma_1) = (\mu_2, \sigma_2)$. Since the reduced model is regular, standard techniques such as those in Serfling (2000) apply. Next, note that $\bar\theta$ is the maximum point under the reduced model where $\sigma_1 = \sigma_2 = \sigma$. This makes $\beta_1 = \xi = 0$ and subsequently, for $t$ under the reduced model, $t = (\mu, \beta_0^2/2 + (\sigma_+^2 - 1), \beta_0^2, 0, 0)^\tau$. Nevertheless, Lemma 4 is applicable to this form of $t$ as long as it is close to its counterpart in the null model. Applying Lemma 4(a) to $\bar\theta$, we obtain the upper bound in (8.35). Note that the range of the supremum conforms to the form of $t$ in the reduced model and to the fact that $t[3] = \beta_0^2 \geq 0$. The specific coefficient values are due to the value of $\Sigma_D$.
To derive the limiting distribution of $R_{n,1}$, we need to show that the upper bound in (8.35) can be attained. We construct an estimator of $\theta$ whose corresponding value of $t$ attains the bound; with some straightforward algebra, such parameter values exist and satisfy the stated relations. Applying the Taylor expansion, the order assessment in (8.36) leads to the approximation in (8.37). Since $\hat\theta$ is the maximum point of $\ell_n(\theta)$, $2\{\ell_n(\hat\theta) - \ell_n(\theta_0)\}$ is not smaller than the value in (8.37). The sandwich technique or the squeeze theorem can then be applied, and the relevant normalized quantity converges in distribution to $N(0, 1)$. Hence, $R_{n,1}$ has the limiting distribution $0.5\chi^2_0 + 0.5\chi^2_1$. This completes the proof of part (a). We now prove conclusion (b). In this case, the range of $t$ has only the intrinsic restrictions seen in the expression $t = (\mu, \beta_0^2/2 + (\sigma_+^2 - 1), \beta_0^2, \beta_1^2, \beta_0\beta_1)^\tau$.

Proof of Theorem 2
The test problem in Theorem 2 is different from that of Theorem 1 because we do not assume knowledge of the ρ 0 value. The parameter vector is now θ = (μ 1 , μ 2 , σ 1 , σ 2 , ρ) τ including the correlation coefficient ρ.
Since the MLEs in Theorem 2(a) and (b) both satisfy the conditions of Lemma 3 with $C = 0$, Lemma 3(c) can be used to establish the order conclusions of the MLEs in Theorem 2. Next, we focus on the limiting distribution results. We start with some preparation.