Identifiability in linear mixed effects models

In a mixed effects model, observations are a function of fixed and random effects and an error term. This structure imposes a very specific form on the variances and covariances of the observations. Unfortunately, the parameters of this variance-covariance structure might not be identifiable. Nonidentifiability can lead to complications in numerical estimation algorithms or, worse, to incorrect or ambiguous inference. We study the identifiability of normal linear mixed effects models. We derive necessary and sufficient conditions for identifiability and we study identifiability in some commonly used variance-covariance structures. The results are particularly timely, given the recent interest in linear mixed effects models within the longitudinal and functional data analysis literature. With that in mind, we extend our discussion to identifiability in models for scalar responses depending on function-valued covariates.


Introduction
Mixed effects models have moved beyond the original simple models, becoming more complicated and containing more parameters. However, two different sets of parameters producing the same covariance matrix for the observations may cause problems for parameter estimation and for inference. In principle, parameter identifiability is the first thing that should be verified when building a model. As Demidenko (2004, p. 118) states, "identifiability may be viewed as a necessary property for the adequacy of a statistical model".
We study identifiability in normal linear mixed effects models. In the next section, we define the classical mixed effects model and show that, in an unrestricted form and in a specific restricted form, the model is not identifiable. Sections 3 and 4 give sufficient conditions to identify parameters in various models. In Section 5 we discuss identifiability of extended models.

Motivation
In this section, we give the definition of nonidentifiability. Then we introduce the general unrestricted classical linear mixed effects model and give two models that are not identifiable.

Definition 2.1 Let y be the vector of observable random variables with distribution function P_θ, where θ is in the parameter space Θ. This probability model for y is not identifiable if and only if there exist θ, θ* ∈ Θ with θ ≠ θ* and P_θ = P_θ*.
Throughout the paper, we assume all random variables follow normal distributions. As the normal distribution is uniquely characterized by its first two moments, identifiability of a normal distribution function then reduces to the identifiability of its mean vector and covariance matrix (Demidenko, 2004, Proposition 10, p. 118).
In the standard linear mixed effects model, y is the observable random vector of length n, and X and Z are known, non-random design matrices with dimensions n × p, n > p, and n × q, n > q, respectively. We assume throughout that both X and Z have full column rank. Then

    y = Xβ + Zu + ε,    (1)

where the random effects vector u and the error vector ε are unobservable and independent of each other, with u ~ N(0, Σ_u) and ε ~ N(0, Σ_ε). This model has been studied and applied by, for instance, McCulloch and Searle (2001).
Under model (1), E(y) = Xβ and Cov(y) = ZΣ_uZ′ + Σ_ε. Since X is of full column rank, β is identifiable, and so identifiability of the model reduces to the nonexistence of unequal parameter pairs (Σ_u, Σ_ε) and (Σ*_u, Σ*_ε) in the parameter space satisfying

    ZΣ_uZ′ + Σ_ε = ZΣ*_uZ′ + Σ*_ε.    (2)

We also assume that Σ_ε and Σ_u do not have common elements. That is, we sometimes assume that Θ = Θ_ε ⊗ Θ_u, where Θ_ε ⊆ Θ̄_ε and Θ_u ⊆ Θ̄_u, Θ̄_ε contains all n × n positive definite symmetric matrices and Θ̄_u contains all q × q positive definite symmetric matrices. We take as our operating definition of identifiability the ability to identify the parameters β, Σ_ε and Σ_u.

The linear mixed effects model has become popular in the analysis of longitudinal and functional data. For instance, the response of individual i at time t can be modelled as ∑_j α_j φ_j(t) + ∑_k u_ik ψ_k(t) plus error, where the α_j's are fixed population effects and the u_ik's are individual-specific random effects.
Example 2.1 Suppose the parameter space is B ⊗ Θ̄_ε ⊗ Θ̄_u, the unrestricted space. Fix (Σ_u, Σ_ε) and let 0 < a < 1. Define Σ*_u = (1 − a)Σ_u and Σ*_ε = Σ_ε + aZΣ_uZ′. Then (Σ*_u, Σ*_ε) ≠ (Σ_u, Σ_ε), both pairs lie in Θ̄_u ⊗ Θ̄_ε, and (2) holds. So the unrestricted model is not identifiable.

Example 2.2 Suppose that Θ = Θ̄_ε ⊗ Θ_u, where Θ_u contains all matrices of the form σ²R, with R positive definite and known and σ² > 0. To see that this model is not identifiable, use the same argument as in Example 2.1 and note that the constructed Σ*_u = (1 − a)σ²R is still in Θ_u.

In practice, one usually assumes a more specific structure for Σ_ε, such as a scalar multiple of a known matrix or a structure driven by a stationary process. Restrictions may lead to identifiability, and such restrictions and their effects on identifiability will be discussed in the next two sections.
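For readers who want to see the nonidentifiability of Example 2.1 concretely, here is a minimal numerical sketch (ours, not from the paper; the dimensions, seed and a are arbitrary). It confirms that the two distinct parameter pairs produce exactly the same covariance matrix for y.

```python
# Example 2.1 numerically: with 0 < a < 1, the pair
# (Sigma_u*, Sigma_eps*) = ((1 - a) Sigma_u, Sigma_eps + a Z Sigma_u Z')
# differs from (Sigma_u, Sigma_eps) but yields the same Cov(y).
import numpy as np

rng = np.random.default_rng(0)
n, q, a = 6, 2, 0.3
Z = rng.standard_normal((n, q))

A = rng.standard_normal((q, q))
Sigma_u = A @ A.T + q * np.eye(q)          # positive definite q x q
B = rng.standard_normal((n, n))
Sigma_eps = B @ B.T + n * np.eye(n)        # positive definite n x n

Sigma_u_star = (1 - a) * Sigma_u
Sigma_eps_star = Sigma_eps + a * Z @ Sigma_u @ Z.T

cov = Z @ Sigma_u @ Z.T + Sigma_eps
cov_star = Z @ Sigma_u_star @ Z.T + Sigma_eps_star
print(np.allclose(cov, cov_star))          # True: distinct parameters, same law
```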

Simple sufficient conditions of identifiability
In this section, we find sufficient conditions for identifiability of model (1), assuming Θ_ε = Θ̄_ε.
A further examination of (2) gives us the following sufficient conditions.
Clearly, if Σ_u is known, then ZΣ_uZ′ is known, and so Σ_ε is completely determined.
If Σ_ε is known, then the model is identifiable. To see this, consider (2): if Σ_ε = Σ*_ε, then ZΣ_uZ′ = ZΣ*_uZ′, and premultiplying by (Z′Z)^{-1}Z′ and postmultiplying by Z(Z′Z)^{-1} gives Σ_u = Σ*_u.

Finally, suppose that ZΣ_uZ′ = KΣ_ε, where K is known and K + I is of full rank. Then the model is identifiable. Suppose, by way of contradiction, the model is not identifiable. Then (2) holds for (Σ_u, Σ_ε) ≠ (Σ*_u, Σ*_ε), both in Θ, with, by assumption, ZΣ_uZ′ = KΣ_ε and ZΣ*_uZ′ = KΣ*_ε. Substituting these expressions into (2) yields

    (K + I)Σ_ε = (K + I)Σ*_ε.

Since K + I is of full rank, we must have Σ_ε = Σ*_ε. But, as shown in the previous paragraph, this implies that Σ_u = Σ*_u.

The last condition is similar to a common condition for identifiability in simple linear regression models with measurement errors. There, the model assumes a linear regression of y_i on a covariate x_i, where x_i is observed with error having variance σ_u² and the regression error has variance σ_ε², so that the variability in the observed data involves σ_u² + σ_ε². One of the common conditions for model identifiability is to assume that the ratio σ_u²/σ_ε² is known. The inverse Σ_ε^{-1} appearing in our last condition, written as K = ZΣ_uZ′Σ_ε^{-1}, could be viewed as a multivariate version of the "denominator".
If there are any supplementary data, we may then be able to find an estimate of Σ_u, Σ_ε or K and treat this estimate as the true value. The sufficient conditions for identifiability can then be satisfied.
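The third condition translates directly into computation. The following sketch (ours; names and dimensions are illustrative) treats K as known, recovers Σ_ε from Cov(y) by solving (K + I)Σ_ε = Cov(y), and then recovers Σ_u as in the argument above.

```python
# If Z Sigma_u Z' = K Sigma_eps with K known and K + I of full rank, then
# Cov(y) = (K + I) Sigma_eps pins down Sigma_eps, and Sigma_u follows.
import numpy as np

rng = np.random.default_rng(1)
n, q = 6, 2
Z = rng.standard_normal((n, q))
A = rng.standard_normal((q, q))
Sigma_u = A @ A.T + np.eye(q)
B = rng.standard_normal((n, n))
Sigma_eps = B @ B.T + np.eye(n)

K = Z @ Sigma_u @ Z.T @ np.linalg.inv(Sigma_eps)   # treated as known
cov_y = Z @ Sigma_u @ Z.T + Sigma_eps              # = (K + I) Sigma_eps

Sigma_eps_hat = np.linalg.solve(K + np.eye(n), cov_y)
print(np.allclose(Sigma_eps_hat, Sigma_eps))       # True

# With Sigma_eps in hand, Sigma_u is determined:
ZtZinv_Zt = np.linalg.solve(Z.T @ Z, Z.T)          # (Z'Z)^{-1} Z'
Sigma_u_hat = ZtZinv_Zt @ (cov_y - Sigma_eps_hat) @ ZtZinv_Zt.T
print(np.allclose(Sigma_u_hat, Sigma_u))           # True
```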

Sufficient conditions of identifiability for a structured Σ_ε
As we observed from Examples 2.1 and 2.2, the model is not identifiable even if we restrict Σ_u to be a scalar multiple of a known matrix. In this section, we study the effect of putting restrictions on Σ_ε. In Theorem 4.1 below, we give a necessary and sufficient condition for nonidentifiability, a condition that relies mainly on the design matrix Z via H_Z = Z(Z′Z)^{-1}Z′. The theorem leads to four corollaries: Corollaries 4.1 and 4.2 give necessary and sufficient conditions for identifiability when Σ_ε arises from an exchangeable covariance structure or is diagonal. Corollary 4.3 states an easily checked condition on Θ_ε that guarantees identifiability of the model. That corollary is then applied to two commonly used error structures. Using Corollary 4.4, we can generalize a known identifiability result, giving a shorter proof under weaker conditions.

Theorem 4.1 Let Θ_ε ⊆ Θ̄_ε and define H_Z = Z(Z′Z)^{-1}Z′. Then model (1) with parameter space B ⊗ Θ_ε ⊗ Θ_u is nonidentifiable if and only if there exist Σ_ε ≠ Σ*_ε in Θ_ε and Σ_u, Σ*_u in Θ_u satisfying

    H_Z(Σ_ε − Σ*_ε) = Σ_ε − Σ*_ε    (3)

and

    Σ*_u = Σ_u + (Z′Z)^{-1}Z′(Σ_ε − Σ*_ε)Z(Z′Z)^{-1}.    (4)

Proof: Nonidentifiability of the model is equivalent to the existence of (Σ_ε, Σ_u) and (Σ*_ε, Σ*_u) in Θ, not equal, satisfying (2). Note that this is equivalent to having (Σ_ε, Σ_u) and (Σ*_ε, Σ*_u) in Θ with Σ*_ε ≠ Σ_ε satisfying

    Z(Σ*_u − Σ_u)Z′ = Σ_ε − Σ*_ε,    (5)

since, if Σ*_ε = Σ_ε, then (5) forces Σ*_u = Σ_u, as shown in Section 3. Suppose the model is nonidentifiable. We premultiply (5) by Z′, postmultiply it by Z, and then pre- and postmultiply by (Z′Z)^{-1} to get

    Σ*_u − Σ_u = (Z′Z)^{-1}Z′(Σ_ε − Σ*_ε)Z(Z′Z)^{-1}.    (6)

This gives (4). To derive (3), premultiply (6) by Z and postmultiply (6) by Z′ to get

    Z(Σ*_u − Σ_u)Z′ = H_Z(Σ_ε − Σ*_ε)H_Z,    (7)

which, by (5), is the same as

    Σ_ε − Σ*_ε = H_Z(Σ_ε − Σ*_ε)H_Z.    (8)

Premultiplying (8) by the idempotent matrix H_Z gives H_Z(Σ_ε − Σ*_ε) = H_Z(Σ_ε − Σ*_ε)H_Z. Substituting (8) into the right side of the above yields (3).

To prove the converse, we want to show that (3) and (4) lead to (5). It is clear from (4) that (7) holds. If we can show that (8) holds, then we are done, since substituting (8) into the right side of (7) yields (5). To show (8), from (3) and the symmetry of Σ_ε − Σ*_ε, we see that

    Σ_ε − Σ*_ε = (Σ_ε − Σ*_ε)H_Z.

Premultiplying the above identity by the idempotent matrix H_Z and using (3) for the left side of the equation, we see that (8) holds. □

The proofs of the next two corollaries are in the appendix, Section 6.
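The "if" direction of Theorem 4.1 is easy to check numerically. In the sketch below (ours; any symmetric S works, provided the resulting Σ*_ε stays positive definite), the difference D = H_Z S H_Z satisfies (3) by construction, Σ*_u is built via (4), and (2) then holds to machine precision.

```python
# Theorem 4.1, "if" direction: a symmetric D with H_Z D = D, combined with
# condition (4), yields two distinct parameter pairs with identical Cov(y).
import numpy as np

rng = np.random.default_rng(2)
n, q = 6, 2
Z = rng.standard_normal((n, q))
H = Z @ np.linalg.solve(Z.T @ Z, Z.T)        # H_Z, projection onto col(Z)

S = rng.standard_normal((n, n)); S = 0.1 * (S + S.T)
D = H @ S @ H                                # symmetric, and H_Z D = D

B = rng.standard_normal((n, n))
Sigma_eps = B @ B.T + n * np.eye(n)
Sigma_eps_star = Sigma_eps - D               # still positive definite (D small)

A = rng.standard_normal((q, q))
Sigma_u = A @ A.T + q * np.eye(q)
ZtZinv_Zt = np.linalg.solve(Z.T @ Z, Z.T)
Sigma_u_star = Sigma_u + ZtZinv_Zt @ D @ ZtZinv_Zt.T   # condition (4)

lhs = Z @ Sigma_u @ Z.T + Sigma_eps
rhs = Z @ Sigma_u_star @ Z.T + Sigma_eps_star
print(np.allclose(lhs, rhs))                 # True: condition (2) holds
```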
Corollary 4.1 Let 1 be an n-vector with each element being one and let J = 11′. Suppose that the distribution of ε_1, . . ., ε_n is exchangeable, that is, the covariance matrix of ε is of the form Σ_ε = σ²[(1 − ρ)I + ρJ] with σ² > 0 and −1/(n − 1) < ρ < 1. Suppose the parameter space is B ⊗ Θ_ε ⊗ Θ_u. Then model (1) is identifiable if and only if H_Z J ≠ J.
Comments. The condition H_Z J = J means the sum of the elements of each row of H_Z is equal to one, and this is an easy condition to check. For the case q = 1, i.e., Z is a column vector (z_1, . . ., z_n)′, not identically zero,

    H_Z = ZZ′/∑_j z_j²,

whose (i, k) element is z_i z_k/∑_j z_j². The model is identifiable if and only if Z is not a constant vector.
When q = 2, suppose we have the usual simple linear regression model with centered covariate: the ith row of Z is (1, z_i), with

    ∑_i z_i = 0.    (9)

Then

    H_Z = J/n + zz′/∑_j z_j², where z = (z_1, . . ., z_n)′,

and each row of H_Z sums to one. Thus, unfortunately, the model is not identifiable under this Z combined with the exchangeable covariance structure.

Corollary 4.2 Suppose that Θ_ε equals the collection of all diagonal positive definite n × n matrices. Then model (1) with parameter space B ⊗ Θ_ε ⊗ Θ_u is identifiable if and only if none of the diagonal elements of H_Z is equal to one.

Comments. Again, the condition on H_Z is easy to check. Consider the case q = 1. As we have seen, the diagonal elements of H_Z equal z_i²/∑_j z_j², i = 1, . . ., n. Therefore, the model is identifiable if and only if Z does not have n − 1 zero elements. Consider q = 2 with Z as in (9). The model is identifiable provided, for all i, (1/n) + z_i²/∑_j z_j² does not equal 1. So, typically, the model is identifiable.
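Both corollaries reduce to simple matrix computations on H_Z. The following helpers (ours; the function names are illustrative, and the design below is the centered Z of (9)) make the two checks explicit.

```python
# Checks for Corollaries 4.1 and 4.2: whether H_Z J = J (exchangeable
# errors) and whether any diagonal element of H_Z equals one (diagonal errors).
import numpy as np

def hat_matrix(Z):
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def exchangeable_identifiable(Z):
    n = Z.shape[0]
    J = np.ones((n, n))
    return not np.allclose(hat_matrix(Z) @ J, J)   # identifiable iff H_Z J != J

def diagonal_identifiable(Z):
    return not np.any(np.isclose(np.diag(hat_matrix(Z)), 1.0))

n = 10
z = np.arange(1, n + 1, dtype=float)
Z = np.column_stack([np.ones(n), z - z.mean()])    # the centered design (9)
print(exchangeable_identifiable(Z))  # False: each row of H_Z sums to one
print(diagonal_identifiable(Z))      # True: no diagonal element equals one
```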
The following corollary provides a sufficient condition for identifiability, a condition that can sometimes be easily checked. Consider (3). Note that the rank of H_Z(Σ_ε − Σ*_ε) is at most q, since the rank of H_Z is q. Thus, for (3) to hold with Σ_ε ≠ Σ*_ε, we must be able to find some Σ_ε and Σ*_ε with the rank of Σ_ε − Σ*_ε nonzero and less than or equal to q. This proves the following.

Corollary 4.3 Suppose that Θ_ε ⊆ Θ̄_ε is such that, for all Σ_ε ≠ Σ*_ε in Θ_ε, the rank of Σ_ε − Σ*_ε is greater than q. Then model (1) with parameter space B ⊗ Θ_ε ⊗ Θ_u is identifiable.

Now we apply Corollary 4.3 to show model identifiability under the "multiple of a known positive definite matrix" and the "MA(1)" covariance structures respectively in the next two examples.

Example 4.1 Multiple of a known positive definite matrix. Fix R, symmetric and positive definite, and suppose Θ_ε = {σ²R : σ² > 0}.
Clearly, for σ² ≠ σ*², Σ_ε − Σ*_ε = (σ² − σ*²)R is invertible, and so is of rank n, which we have assumed is greater than q. Thus, the model is identifiable.
To show the model in Example 4.2 below is identifiable, we need the following lemma, which is a result in Graybill (1983, p. 285).

Lemma 4.1 Let T be the n × n Toeplitz matrix with ones on the two parallel subdiagonals and zeroes elsewhere. Given two scalars a_0 and a_1, the eigenvalues of the n × n matrix C = a_0 I + a_1 T are

    a_0 + 2a_1 cos(kπ/(n + 1)), k = 1, . . ., n.

Example 4.2 MA(1) covariance structure. Suppose that n − 1 > q. Let the components of ε have the MA(1) covariance structure, i.e., Σ_ε of the form σ²(I + ρT). For two distinct such matrices, Σ_ε − Σ*_ε = (σ² − σ*²)I + (σ²ρ − σ*²ρ*)T = a_0 I + a_1 T, say. If a_1 ≠ 0, the eigenvalues a_0 + 2a_1 cos(kπ/(n + 1)) of Lemma 4.1 are distinct, so at most one of them is zero; if a_1 = 0, then a_0 ≠ 0 and none of them is zero. In either case the rank of Σ_ε − Σ*_ε is at least n − 1 > q, and the model is identifiable by Corollary 4.3.
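As a sanity check, the sketch below (ours; the scalars and dimension are arbitrary) verifies the eigenvalue formula of Lemma 4.1 numerically and confirms the rank bound used in Example 4.2.

```python
# Lemma 4.1: eigenvalues of a0*I + a1*T are a0 + 2*a1*cos(k*pi/(n+1)),
# k = 1, ..., n; and the MA(1) rank argument of Example 4.2.
import numpy as np

n, a0, a1 = 8, 2.0, 0.7
T = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
C = a0 * np.eye(n) + a1 * T

eig = np.sort(np.linalg.eigvalsh(C))
formula = np.sort(a0 + 2 * a1 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1)))
print(np.allclose(eig, formula))            # True

# For distinct MA(1) parameters, Sigma_eps - Sigma_eps* = c0*I + c1*T has
# at most one zero eigenvalue, so its rank is at least n - 1.
sigma2, rho, sigma2s, rhos = 1.0, 0.3, 1.5, 0.1
D = (sigma2 - sigma2s) * np.eye(n) + (sigma2 * rho - sigma2s * rhos) * T
print(np.linalg.matrix_rank(D) >= n - 1)    # True
```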
In longitudinal or functional data analysis, usually there are N individuals, with the ith individual modelled as in (1):

    y_i = X_i β + Z_i u_i + ε_i, i = 1, . . ., N,    (10)

where Cov(u_i) = Σ_u is common to all individuals and Cov(ε_i) = Σ_εi. Statistical inference is normally based on the joint model, the model of these N individuals. The following corollary gives sufficient conditions for identifiability of the joint model. The intuition behind the result is that, if we can identify Σ_u from one individual, then we can identify all of the Σ_εi's.

Corollary 4.4 If at least one individual model (10) is identifiable, then the joint model is identifiable.

Proof: We notice each individual model (10) shares a common parameter, the covariance matrix Σ_u. If one individual model uniquely determines Σ_u and its Σ_εi, the identified Σ_u will then yield identifiability of all the individual Σ_εi's since, if

    Z_i Σ_u Z_i′ + Σ_εi = Z_i Σ_u Z_i′ + Σ*_εi,

then Σ_εi = Σ*_εi. □

Demidenko (2004) shows that the joint model is identifiable if at least one matrix Z_i is of full column rank and ∑_{i=1}^N (n_i − q) > 0. Using our argument in the previous paragraph, the condition ∑_{i=1}^N (n_i − q) > 0 can be dropped. Furthermore, our result can be applied to more general Σ_εi's. Corollary 4.4 reduces the verification of a joint model's identifiability to the individuals'. For instance, if the ith individual model has Z_i of full column rank and Σ_εi = σ²I_{n_i}, where n_i is the length of y_i, then this individual model is identifiable by Example 4.1, and thus so is the joint model.
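The proof's intuition translates directly into computation. In the sketch below (ours; all dimensions are illustrative), once Σ_u is known from one identifiable individual model, each remaining Σ_εi is recovered by subtraction.

```python
# Corollary 4.4 intuition: once one individual model pins down Sigma_u,
# each remaining Sigma_eps_i equals Cov(y_i) - Z_i Sigma_u Z_i'.
import numpy as np

rng = np.random.default_rng(3)
q, N = 2, 3
A = rng.standard_normal((q, q))
Sigma_u = A @ A.T + np.eye(q)                      # shared across individuals

covs, Zs, Sigmas = [], [], []
for i in range(N):
    n_i = 4 + i
    Z_i = rng.standard_normal((n_i, q))
    B = rng.standard_normal((n_i, n_i))
    Sigma_i = B @ B.T + np.eye(n_i)
    Zs.append(Z_i); Sigmas.append(Sigma_i)
    covs.append(Z_i @ Sigma_u @ Z_i.T + Sigma_i)   # Cov(y_i)

# Suppose individual 0 is identifiable, so Sigma_u is known; then:
for i in range(1, N):
    Sigma_i_hat = covs[i] - Zs[i] @ Sigma_u @ Zs[i].T
    print(np.allclose(Sigma_i_hat, Sigmas[i]))     # True for each i
```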

Extensions
In this section, we discuss identifiability of a model in functional regression with a functional predictor y(·) and a scalar response w. We derive a necessary and sufficient condition for nonidentifiability of this model.
We model y(t) as ∑_j α_j φ_j(t) + ∑_k u_k ψ_k(t) plus error, with the φ_j's and ψ_k's known, the α_j's unknown and the u_k's unknown and random. The dependence of the response w on y is modelled through an unknown functional coefficient β:

    w = β_0 + ∫ β(t)[y(t) − E(y(t))] dt + η,

where η is mean 0 normal noise. Thus, for appropriately defined ρ (with kth element ρ_k = ∫ β(t)ψ_k(t) dt) and with u = (u_1, . . ., u_q)′, we can write

    w = β_0 + ρ′u + η.    (11)

The predictor y is observed at a sequence of discretized points and the observed values are contained in the vector y, which then follows model (1). James (2002), Müller (2005) and Heckman and Wang (2007) consider this modelling of y and w and propose different approaches to estimate the functional coefficient β.

We assume models (1) and (11). For our purpose of identifiability discussion here, we consider the unknown parameters to be (β_0, β) and θ = (Σ_ε, Σ_u, σ_η², ρ). We suppose that (β_0, β) ∈ B ⊆ ℝ ⊗ ℝ^p and that θ ∈ Θ ⊆ Θ̄ = Θ̄_ε ⊗ Θ̄_u ⊗ ℝ⁺ ⊗ ℝ^q, where Θ̄_ε and Θ̄_u are as before and ℝ⁺ is the set of positive real numbers.
To study identifiability, we must study the distribution of the random vector (y′, w)′. We see that E(y) = Xβ, E(w) = β_0, and (y′, w)′ has covariance matrix

    [ ZΣ_uZ′ + Σ_ε    ZΣ_uρ          ]
    [ ρ′Σ_uZ′         ρ′Σ_uρ + σ_η²  ].

We know the parameter β is identifiable if the matrix X is of full column rank. The identifiability of β_0 is also clear. So we focus on identifying the covariance parameters θ.
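For concreteness, here is a small Python construction (ours; ρ and the variance components are arbitrary placeholders) of the joint covariance matrix displayed above.

```python
# Joint covariance of (y', w) under models (1) and (11).
import numpy as np

rng = np.random.default_rng(4)
n, q = 6, 2
Z = rng.standard_normal((n, q))
A = rng.standard_normal((q, q))
Sigma_u = A @ A.T + np.eye(q)
B = rng.standard_normal((n, n))
Sigma_eps = B @ B.T + np.eye(n)
rho = rng.standard_normal(q)           # rho_k = integral of beta(t) psi_k(t) dt
sigma2_eta = 0.5

top_left = Z @ Sigma_u @ Z.T + Sigma_eps          # Cov(y)
top_right = (Z @ Sigma_u @ rho)[:, None]          # Cov(y, w)
bottom_right = rho @ Sigma_u @ rho + sigma2_eta   # Var(w)

joint_cov = np.block([[top_left, top_right],
                      [top_right.T, np.array([[bottom_right]])]])
print(joint_cov.shape)   # (n + 1, n + 1)
```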
Our discussion in Section 2 suggests the unrestricted model won't be identifiable. In fact, we can construct an example following Example 2.1 to show the existence of nonidentical θ and θ*, both in Θ̄, yielding the same distribution for (y′, w)′; see Example 5.1 below. The following theorem gives a necessary and sufficient condition for nonidentifiability.

Theorem 5.1 Models (1) and (11) with parameter space B ⊗ Θ are nonidentifiable if and only if there exist θ = (Σ_ε, Σ_u, σ_η², ρ) ≠ θ* = (Σ*_ε, Σ*_u, σ*_η², ρ*), both in Θ, satisfying (2) together with

    ZΣ_uρ = ZΣ*_uρ*    (12)

and

    ρ′Σ_uρ + σ_η² = ρ*′Σ*_uρ* + σ*_η².    (13)

Example 5.1 (Example 2.1 continued) Let 0 < a < 1, let Σ*_u and Σ*_ε be as in Example 2.1, and let ρ* = ρ/(1 − a) and σ*_η² = σ_η² − aρ′Σ_uρ/(1 − a). It is not hard to see that (12) and (13) are satisfied, and (2) holds as in Example 2.1. If, in addition, we restrict a < σ_η²/(σ_η² + ρ′Σ_uρ), we see that σ*_η² is positive. Hence the unrestricted model for (y′, w)′ is not identifiable.
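The construction in Example 5.1 can also be verified numerically. The sketch below (ours; σ_η² is chosen large enough that the constraint on a holds for this seed) confirms that (2), (12) and (13) all hold for the starred parameters.

```python
# Example 5.1: the starred parameters reproduce the same joint covariance
# of (y', w), so conditions (2), (12) and (13) all hold.
import numpy as np

rng = np.random.default_rng(5)
n, q, a = 6, 2, 0.1
Z = rng.standard_normal((n, q))
A = rng.standard_normal((q, q))
Sigma_u = A @ A.T + np.eye(q)
B = rng.standard_normal((n, n))
Sigma_eps = B @ B.T + np.eye(n)
rho = rng.standard_normal(q)
sigma2_eta = 10.0                      # large enough that sigma2_eta_s > 0

Sigma_u_s = (1 - a) * Sigma_u          # the Example 2.1 construction
Sigma_eps_s = Sigma_eps + a * Z @ Sigma_u @ Z.T
rho_s = rho / (1 - a)
sigma2_eta_s = sigma2_eta - a * (rho @ Sigma_u @ rho) / (1 - a)

print(np.allclose(Z @ Sigma_u @ Z.T + Sigma_eps,
                  Z @ Sigma_u_s @ Z.T + Sigma_eps_s))         # (2)
print(np.allclose(Z @ Sigma_u @ rho, Z @ Sigma_u_s @ rho_s))  # (12)
print(np.isclose(rho @ Sigma_u @ rho + sigma2_eta,
                 rho_s @ Sigma_u_s @ rho_s + sigma2_eta_s))   # (13)
print(sigma2_eta_s > 0)
```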
Appendix

The following lemma, used in the proof of Corollary 4.1, gives the eigenvalues of matrices of the exchangeable form.

Lemma 6.1 Given two scalars a and b, the characteristic equation of the n × n matrix C = (a − b)I + bJ in λ is (a + (n − 1)b − λ)(a − b − λ)^{n−1}, and hence n − 1 characteristic roots are equal to a − b and one root is equal to a + (n − 1)b.

Proof of Corollary 4.1: To prove the corollary, we use Theorem 4.1 and a proof by contradiction. First suppose that the model is identifiable and suppose, by way of contradiction, H_Z J = J. Fix Σ_ε = σ²[(1 − ρ)I + ρJ] ∈ Θ_ε. Let s > 1, σ*² = sσ², and ρ* = (ρ − 1)/s + 1. It is not hard to check that −1/(n − 1) < ρ* < 1. Define Σ*_ε = σ*²[(1 − ρ*)I + ρ*J]; by Lemma 6.1, its eigenvalues σ*²(1 − ρ*) and σ*²(1 + (n − 1)ρ*) are positive, so Σ*_ε ∈ Θ_ε. Then Σ_ε − Σ*_ε = (σ² − σ*²)J and, since H_Z J = J, it is clear that (3) is satisfied. We now show that, for any Σ_u ∈ Θ_u, there exists s* > 1 so that Σ*_u defined as in (4) is positive definite whenever 1 < s < s*. This will show that the model is not identifiable, which contradicts our assumption.

Plugging Σ_ε − Σ*_ε = (σ² − σ*²)J into (4) yields

    Σ*_u = Σ_u + (σ² − σ*²)(Z′Z)^{-1}Z′JZ(Z′Z)^{-1}.

Since H_Z J = J implies H_Z 1 = 1, we have 1′Z ≠ 0 and, as Z is of full column rank, the matrix (Z′Z)^{-1}Z′JZ(Z′Z)^{-1} is non-negative definite and of rank one since J = 11′. Let λ be its nonzero, and thus largest, eigenvalue. Let λ_m be the smallest eigenvalue of the matrix Σ_u, and let s* = λ_m/(λσ²) + 1. For any x ∈ ℝ^q, x ≠ 0, x′Σ*_ux > 0 by the following argument:

    x′Σ*_ux ≥ λ_m x′x − (s − 1)σ²λ x′x = [λ_m − (s − 1)σ²λ] x′x > 0

whenever 1 < s < s*. Thus Σ*_u ∈ Θ_u and, by Theorem 4.1, the model is not identifiable, which contradicts our assumption that the model is identifiable.