Affine-invariant rank tests for multivariate independence in independent component models

Abstract: We consider the problem of testing for multivariate independence in independent component (IC) models. Under a symmetry assumption, we develop parametric and nonparametric (signed-rank) tests. Unlike in independent component analysis (ICA), we allow for the singular cases involving more than one Gaussian independent component. The proposed rank tests are based on componentwise signed ranks, à la Puri and Sen. Unlike the Puri and Sen tests, however, our tests (i) are affine-invariant and (ii) are, for adequately chosen scores, locally and asymptotically optimal (in the Le Cam sense) at prespecified densities. Asymptotic local powers and asymptotic relative efficiencies with respect to Wilks' LRT are derived. Finite-sample properties are investigated through a Monte-Carlo study.


Introduction
In many sampling and experimental designs, multiple measurements are obtained on each observational unit, resulting in multivariate observation vectors. It is often of interest to explore whether two or several subvectors are interrelated. This typically requires a test of independence between two vectors. To be more specific, let $X_1, \ldots, X_n$ be a sample of i.i.d. observations from a $p$-variate distribution with cumulative distribution function (cdf) $F$ and write $X_i = (X_i^{(1)\prime}, X_i^{(2)\prime})'$, $i = 1, \ldots, n$, where $X_i^{(1)}$ and $X_i^{(2)}$ are $p_1$-variate and $p_2$-variate subvectors, respectively ($p = p_1 + p_2$). We wish to test the null hypothesis $H_0$: $X_i^{(1)}$ and $X_i^{(2)}$ are independent.
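If the observations $X_1, \ldots, X_n$ come from a multinormal distribution with mean $\mu$ and covariance matrix
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \tag{1.1}$$
the null hypothesis of independence says that the $p_1 \times p_2$ matrix $\Sigma_{12}$ is zero. The likelihood ratio test for $H_0 : \Sigma_{12} = 0$ is due to Wilks (1935) and rejects the null at asymptotic level $\alpha$ if and only if
$$W := -n \log \frac{|S|}{|S_{11}|\,|S_{22}|} > \chi^2_{p_1 p_2; 1-\alpha}, \tag{1.2}$$
where the sample covariance matrix $S$ is partitioned as $\Sigma$ in (1.1) and where $\chi^2_{d;1-\alpha}$ stands for the $\alpha$-upper quantile of the $\chi^2_d$ distribution. Another classical test of independence based on the sample covariance matrix is the Pillai (1955) trace test, which rejects the null (still at asymptotic level $\alpha$) if and only if
$$P := n\, \mathrm{tr}\big(S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}\big) = n\, \big\| S_{11}^{-1/2} S_{12} S_{22}^{-1/2} \big\|^2 > \chi^2_{p_1 p_2; 1-\alpha}, \tag{1.3}$$
where $S_{11}^{-1/2} S_{12} S_{22}^{-1/2}$ is the sample cross-covariance matrix of the standardized subvectors $S_{11}^{-1/2} X_i^{(1)}$ and $S_{22}^{-1/2} X_i^{(2)}$, $i = 1, \ldots, n$ (throughout, $\|A\| = (\mathrm{tr}(AA'))^{1/2}$ is the Frobenius norm of the matrix $A$).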
It is important to note that Wilks' test and Pillai's test may be used for testing independence even when the normality assumption is not met. This is due to the fact that the covariance matrix functional $\Sigma = \Sigma(F)$ has the so-called independence property: if $X_i^{(1)}$ and $X_i^{(2)}$ are independent, then $\Sigma_{12}(F) = 0$. Under $H_0$ with finite second-order moments, $W$ and $P$ are asymptotically equivalent (that is, $W = P + o_P(1)$ as $n \to \infty$; see (5.1)), so that they admit the same asymptotic null ($\chi^2_{p_1 p_2}$) distribution and share the same asymptotic powers under any sequence of contiguous alternatives; see Section 5.
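As a rough numerical illustration of this asymptotic equivalence, the sketch below computes both statistics from the sample covariance matrix, assuming the standard forms $W = -n \log(|S| / (|S_{11}||S_{22}|))$ and $P = n\,\mathrm{tr}(S_{11}^{-1} S_{12} S_{22}^{-1} S_{21})$. The function name `wilks_pillai`, the sample size, and the seed are illustrative choices, not part of the paper.

```python
import numpy as np

def wilks_pillai(X, p1):
    """Return (W, P) for the split of the columns of X into the first p1 and the rest."""
    n = X.shape[0]
    S = np.cov(X, rowvar=False, bias=True)  # sample covariance matrix (divisor n)
    S11, S12 = S[:p1, :p1], S[:p1, p1:]
    S21, S22 = S[p1:, :p1], S[p1:, p1:]
    # Wilks: W = -n log( det S / (det S11 * det S22) ), via log-determinants
    logdet = np.linalg.slogdet(S)[1]
    logdet11 = np.linalg.slogdet(S11)[1]
    logdet22 = np.linalg.slogdet(S22)[1]
    W = -n * (logdet - logdet11 - logdet22)
    # Pillai: P = n tr( S11^{-1} S12 S22^{-1} S21 )
    P = n * np.trace(np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21))
    return W, P

# Simulated data under the null: all p1 + p2 coordinates independent.
rng = np.random.default_rng(0)
n, p1, p2 = 2000, 2, 2
X = rng.standard_normal((n, p1 + p2))
W, P = wilks_pillai(X, p1)

# Block-affine transformation x -> A x + b with A = diag(A11, A22).
A = np.zeros((p1 + p2, p1 + p2))
A[:p1, :p1] = rng.standard_normal((p1, p1))
A[p1:, p1:] = rng.standard_normal((p2, p2))
b = rng.standard_normal(p1 + p2)
W_aff, P_aff = wilks_pillai(X @ A.T + b, p1)
print(W, P, W_aff, P_aff)
```

In terms of the sample canonical correlations $\hat\rho_j$, $W = -n \sum_j \log(1 - \hat\rho_j^2)$ and $P = n \sum_j \hat\rho_j^2$, which is why the two values are close under the null and why both are unchanged by the block-affine transformation.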
Note also that both $W$ and $P$ are invariant under the group of block-affine transformations $x \mapsto Ax + b$, associated with any $p$-vector $b$ and any invertible matrix $A$ of the form $A = \mathrm{diag}(A_{11}, A_{22})$, where $A_{\ell\ell}$ is $p_\ell \times p_\ell$ ($\ell = 1, 2$); throughout, $\mathrm{diag}(B_1, B_2, \ldots, B_m)$ stands for the block-diagonal matrix with diagonal blocks $B_1, B_2, \ldots, B_m$. This block-affine-invariance property (in the sequel, we will simply write affine-invariance) is natural in cases where the components of $X_i^{(1)}$ and $X_i^{(2)}$ do not have any fixed specified meaning or label, so that the observations (subvectors) could have been taken in another coordinate system as well. Affine-invariance also ensures distribution-freeness of $W$ and $P$ (under the null) with respect to the variance-covariance structures of both $X_i^{(1)}$ and $X_i^{(2)}$.

Aiming at invariance with respect to componentwise monotone increasing transformations as well (hence, at distribution-freeness and validity under broad conditions, without any moment assumption), Puri and Sen (1971) proposed a class of nonparametric tests based on componentwise rankings and componentwise score functions $K_1, \ldots, K_p$ defined over $(0, 1)$ (the latter are normalized so that $E[K_r(U)] = 0$ and $E[K_r^2(U)] = 1$ for all $r$, where $U$ is uniformly distributed over $(0, 1)$). Like Wilks' statistic in (1.2), the proposed test statistic is
$$W_K := -n \log \frac{|\tilde S|}{|\tilde S_{11}|\,|\tilde S_{22}|},$$
where the $p \times p$ rank-based covariance matrix $\tilde S$ (still partitioned as $\Sigma$ in (1.1) above) has entry $(r, s)$ given by
$$\tilde S_{rs} := \frac{1}{n} \sum_{i=1}^n K_r\Big(\frac{R_{ir}}{n+1}\Big) K_s\Big(\frac{R_{is}}{n+1}\Big),$$
where $R_{ir}$ denotes the rank of $X_{ir}$ among $X_{1r}, \ldots, X_{nr}$ (here, $X_{ir}$ stands for the $r$th component of $X_i$). The classical sign covariance matrix and Spearman's rho matrix are obtained as special cases (through sign and Wilcoxon score functions, respectively). At the null of independence, $W_K$, under general assumptions, is asymptotically $\chi^2_{p_1 p_2}$.
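The rank-based construction can be sketched as follows, assuming the usual Puri and Sen form in which entry $(r, s)$ of the rank-based covariance matrix is $n^{-1} \sum_i K_r(R_{ir}/(n+1)) K_s(R_{is}/(n+1))$ and the statistic has the same determinant-ratio structure as Wilks' $W$. The helper names (`ranks`, `rank_covariance`, `puri_sen_statistic`) and the centered Wilcoxon score $K(u) = \sqrt{12}(u - 1/2)$ are illustrative assumptions.

```python
import numpy as np

def ranks(x):
    """Ranks 1..n of the entries of a 1-d array (no ties assumed)."""
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def wilcoxon_score(u):
    """Centered Wilcoxon score: E[K(U)] = 0 and E[K(U)^2] = 1 for U uniform on (0, 1)."""
    return np.sqrt(12.0) * (u - 0.5)

def rank_covariance(X, score):
    """p x p matrix with entry (r, s) = mean_i K(R_ir / (n + 1)) K(R_is / (n + 1))."""
    n, p = X.shape
    scored = np.column_stack([score(ranks(X[:, r]) / (n + 1)) for r in range(p)])
    return scored.T @ scored / n

def puri_sen_statistic(X, p1, score=wilcoxon_score):
    """Wilks-type statistic computed from the rank-based covariance matrix."""
    n = X.shape[0]
    S = rank_covariance(X, score)
    logdet = lambda M: np.linalg.slogdet(M)[1]
    return -n * (logdet(S) - logdet(S[:p1, :p1]) - logdet(S[p1:, p1:]))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
WK = puri_sen_statistic(X, p1=2)

# Componentwise monotone increasing transformations leave all ranks unchanged.
Y = X.copy()
Y[:, 0] = np.exp(Y[:, 0])
Y[:, 2] = Y[:, 2] ** 3
WK_transformed = puri_sen_statistic(Y, p1=2)
print(WK, WK_transformed)
```

The two printed values agree because the statistic depends on the data only through the componentwise ranks; a generic block-affine transformation, by contrast, changes those ranks and hence the statistic.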
The resulting tests extend to any dimension the univariate ($p_1 = p_2 = 1$) quadrant test of Blomqvist (1950) (sign score function) and the popular univariate test due to Spearman (1904) (Wilcoxon score function). The test statistic $W_K$ is invariant under monotone transformations of the components of the $X_i$'s, but is not affine-invariant (in the sense described above). Most importantly, using componentwise ranks of the standardized subvectors

Affine-invariant nonparametric tests for independence have been developed as well. Gieser and Randles (1997) proposed a simple nonparametric test that generalizes the quadrant test and is based on the Randles (1989) concept of interdirection counts. Taskinen, Kankainen and Oja (2003) proposed a related but more practical affine-invariant extension of the quadrant test based on spatial signs. Later, Taskinen, Oja and Randles (2005) developed invariant tests which are multivariate extensions of the univariate tests due to Kendall (1938) and Spearman (1904). Their tests are based on interdirection counts, spatial signs and spatial ranks, and provide intuitive, practical and robust alternatives to multivariate normal theory methods. To make the test statistics affine-invariant, both subvectors are standardized before spatial signs and ranks are formed. Taskinen, Kankainen and Oja (2004) developed rank score tests based on the spatial signs and the ranks of the lengths of standardized marginal vectors. All these affine-invariant nonparametric tests avoid any moment assumption.
in general do not have an elliptical distribution under the alternative.
In this paper, we consider a more flexible and natural model, namely the independent component (IC) model, where the $p$ components of $Z_i = (Z_i^{(1)\prime}, Z_i^{(2)\prime})'$ are assumed to be symmetric mutually independent random variables. This model is widely used by engineers in blind source separation problems, and is related to independent component analysis (ICA). After some possible permutation of the components of $Z_i$, the null of independence still is $H_0 : \Lambda_{12} = \Lambda_{21} = 0$ in this IC model. But we argue that the latter (i) is more natural (since alternatives to independence belong to the IC model, whereas dependent marginals in the "elliptical" model in (1.5) are not elliptical) and (ii) is also richer (since parametric IC submodels are obtained by fixing $p$ univariate densities, namely those of the marginals of $Z_i^{(1)}$ and $Z_i^{(2)}$, which allows, e.g., for heterogeneous tail weights across marginals). Note that, although the central symmetry assumption on $Z_i$ may seem strong at first sight, it is of course much weaker than the elliptical symmetry assumption required by the affine-invariant nonparametric tests above.
In those IC models, we adopt the same methodology as in Ilmonen and Paindaveine (2011) and define classes of affine-invariant parametric and nonparametric tests of multivariate independence. The nonparametric procedures are based on the componentwise signed ranks of the estimated (in the null model) independent components. Like the affine-invariant nonparametric procedures designed for the elliptical model, the tests we propose do not require any moment assumption. Our tests however have two important advantages over their affine-invariant nonparametric competitors: (i) as explained above, they are defined in a model of dependence that is much more satisfactory than the elliptical one, and (ii) they allow for local and asymptotic optimality (in the Le Cam sense) at prespecified densities (provided that they are based on adequate score functions $K_r$).
An important issue in the paper will be the singularity that arises in IC models when the assumption that at most one independent component is Gaussian is violated. In ICA (that is, in a point estimation context), this standard assumption is made throughout since it essentially guarantees identifiability of $\Lambda$; see Section 2 or Ilmonen and Paindaveine (2011). However, in the problem of testing for multivariate independence considered here, this assumption is much too strong, as it would, e.g., rule out the multinormal case. We therefore must study carefully the resulting possible singularity of IC models, which, as we will see, has a deep impact on the asymptotic distributions of our optimal tests. To the best of our knowledge, the nature of this singularity has not been investigated in the literature.
The paper is organized as follows. Section 2 describes the IC model under consideration and states its uniform local asymptotic normality (ULAN) property. Section 3 exploits this ULAN structure to define optimal parametric tests of independence in IC models. Nonparametric (signed-rank) versions of these tests are proposed in Section 4. The properties of the classical Gaussian tests (the Wilks and Pillai tests) in IC models are investigated in Section 5.1, whereas Section 5.2 derives the asymptotic relative efficiencies of our nonparametric tests with respect to those Gaussian competitors. Section 6 discusses the practical implementation of the proposed tests and Section 7 investigates their finite-sample properties through a Monte-Carlo study. Finally, the Appendix collects technical proofs.

IC models, ULAN, and multivariate independence
In this section, we define the IC models in which we will test for multivariate independence, and state the ULAN property on which the construction of the proposed optimal tests will be based. We then introduce the problem of testing for multivariate independence in such models.

IC models and ULAN
Denote by $\mathcal{G}_p$ a subset of the collection of invertible $p \times p$ real matrices $\Lambda$ obtained by fixing the order and "signs" of columns in some prespecified way, in the sense that, if $\Lambda \in \mathcal{G}_p$, then the only matrix $\Lambda P S$ that also belongs to $\mathcal{G}_p$ is $\Lambda$ itself, where $P$ and $S$ are any permutation and sign-change matrices, respectively (i.e., matrices respectively obtained by permuting the columns of the $p$-dimensional identity matrix $I_p$ or by changing signs of the entries of the same). For instance, one can let the entry with largest absolute value in each column be positive, and then order columns in such a way that those largest absolute values form an increasing sequence (in case of ties, one can base the ordering and signs on the second largest absolute values, etc.).

Now, further denote by $\mathcal{F}$ the collection of probability density functions (pdf's) $g$ (with respect to the Lebesgue measure on $\mathbb{R}^p$) of absolutely continuous random vectors $Z = (Z_1, \ldots, Z_p)'$ whose marginals are (i) mutually independent, (ii) symmetric about the origin ($-Z_r \overset{D}{=} Z_r$ for all $r$), and (iii) standardized so that $\mathrm{Med}[Z_r^2] = \chi^2_{1;.5}$ for all $r$. For $g \in \mathcal{F}$, we will often decompose $g$ into $g(z) =: \prod_{r=1}^p g_r(e_r' z)$ in the sequel, where $e_r$ denotes the $r$th vector of the canonical basis of $\mathbb{R}^p$.
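The column-normalization rule given as an example above can be sketched as follows. The function name `canonicalize` is hypothetical, and ties among largest absolute values (which the text resolves through second largest values) are ignored in this minimal version.

```python
import numpy as np

def canonicalize(Lam):
    """Fix column signs and order: the largest-|.| entry of each column is made
    positive, then columns are sorted by increasing largest absolute value.
    Ties are not handled in this sketch."""
    Lam = np.asarray(Lam, dtype=float)
    p = Lam.shape[1]
    top = np.argmax(np.abs(Lam), axis=0)       # row index of the largest |entry| per column
    signs = np.sign(Lam[top, np.arange(p)])    # sign of that entry, column by column
    Lam = Lam * signs                          # flip columns whose top entry is negative
    order = np.argsort(np.max(np.abs(Lam), axis=0))
    return Lam[:, order]

rng = np.random.default_rng(0)
Lam = rng.standard_normal((3, 3))
P = np.eye(3)[:, [2, 0, 1]]                    # a permutation matrix
S = np.diag([1.0, -1.0, -1.0])                 # a sign-change matrix
same = np.allclose(canonicalize(Lam), canonicalize(Lam @ P @ S))
print(same)
```

Every matrix of the form $\Lambda P S$ is mapped to the same representative, which is the property required of $\mathcal{G}_p$.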
We then throughout assume that the following independent component (IC) model holds.
Assumption (A). For some $\mu \in \mathbb{R}^p$, $\Lambda \in \mathcal{G}_p$, and $g \in \mathcal{F}$, the $p$-variate observations $X_1, \ldots, X_n$ are generated by
$$X_i = \Lambda Z_i + \mu, \quad i = 1, \ldots, n,$$
where the $Z_i$'s are i.i.d. with pdf $g$.

In the sequel, we denote the corresponding hypothesis by $P^n_{\mu,\Lambda,g}$ or $P^n_{\vartheta,g}$, with $\vartheta = (\mu', (\mathrm{vec}\,\Lambda)')'$, and the marginals of $Z_i$ are called the independent components (ICs). The location $\mu$ is of course a well-defined parameter since it is the unique center of symmetry of the common distribution of the $X_i$'s. Also, it follows from Theis (2004) that, in cases where at most one IC is Gaussian (cases we do not want to restrict to in the sequel; see the comments at the end of this section for more details), the parameters $\Lambda$ and $g$ are identifiable. Note indeed that, since $\Lambda \in \mathcal{G}_p$ and $g \in \mathcal{F}$, the order, scales, and signs of the ICs are fixed. For the sake of illustration, we will later consider Gaussian and $t$-distributed ICs. We say the $r$th IC is Gaussian (resp., is $t_\nu$, $\nu > 0$) if and only if $g_r(z) = (2\pi)^{-1/2} \exp(-z^2/2)$ (resp., $g_r(z) = c_\nu \sigma_\nu^{-1} (1 + \nu^{-1} \sigma_\nu^{-2} z^2)^{-(\nu+1)/2}$, where $c_\nu$ is a normalization constant). Here, $\sigma_\nu$ is such that $\mathrm{Med}[Z_r^2] = \chi^2_{1;.5}$ if $Z_r$ has pdf $g_r$ (note that $\lim_{\nu \to \infty} \sigma_\nu = 1$). When compared to variance-based standardizations, the median-based one we use throughout has the advantage of avoiding any moment assumption (other such standardizations might be adopted, though; see, e.g., Chen and Bickel (2006)).
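To make the model and its median-based standardization concrete, here is a minimal simulation sketch, assuming the model equation $X_i = \Lambda Z_i + \mu$ with one Gaussian IC and Cauchy ($t_1$) ICs for the others. The helpers `median_standardize` and `simulate_ic` are hypothetical, and the rescaling uses the empirical median as a stand-in for the population constant $\sigma_\nu$.

```python
import numpy as np
from statistics import NormalDist

# chi^2_{1;.5}, the median of a chi-square with 1 d.f., equals (Phi^{-1}(0.75))^2.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def median_standardize(Z):
    """Rescale each column so that the empirical median of Z_r^2 equals chi^2_{1;.5}."""
    scale = np.sqrt(CHI2_1_MEDIAN / np.median(Z ** 2, axis=0))
    return Z * scale

def simulate_ic(n, Lam, mu, rng):
    """Draw X_i = Lam Z_i + mu with one Gaussian IC and p - 1 Cauchy ICs."""
    p = Lam.shape[0]
    cols = [rng.standard_normal(n)] + [rng.standard_cauchy(n) for _ in range(p - 1)]
    Z = median_standardize(np.column_stack(cols))
    X = Z @ Lam.T + mu          # row i of X is (Lam Z_i + mu)'
    return X, Z

rng = np.random.default_rng(0)
Lam = np.array([[1.0, 0.5, 0.0],
                [0.0, 2.0, 0.3],
                [0.2, 0.0, 3.0]])
mu = np.array([1.0, -1.0, 0.0])
X, Z = simulate_ic(501, Lam, mu, rng)
print(CHI2_1_MEDIAN, np.median(Z ** 2, axis=0))
```

For the Cauchy case the population constant is available in closed form: the 0.75-quantile of the standard Cauchy distribution is 1, so $\sigma_1 = \Phi^{-1}(0.75) \approx 0.6745$. No moment of the ICs is used anywhere in this standardization.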
As the density $g$ remains unspecified in practice, the semiparametric IC model $\mathcal{P}^n = \{P^n_{\vartheta,g} : \vartheta \in \Theta, g \in \mathcal{F}\}$ is to be considered. Proposition 2.1 below, which is crucial for the construction of optimal tests in Section 3, states that most fixed-$g$ parametric submodels $\mathcal{P}^n_g = \{P^n_{\vartheta,g} : \vartheta \in \Theta\}$ are ULAN. More precisely, ULAN requires that the noise density $g$ belongs to the collection $\mathcal{F}_{ULAN}$ of densities in $\mathcal{F}$ that (i) are absolutely continuous with respect to the Lebesgue measure (in the sequel, we let $\varphi_{g_r} := -\dot{g}_r / g_r$, where $\dot{g}_r$ stands for the a.e.-derivative of $g_r$) and (ii) have finite second-order moments ($\sigma^2_{g_r} := \int_{-\infty}^{\infty} z^2 g_r(z)\, dz < \infty$ for all $r$), finite Fisher information for location ($I_{g_r} := \int_{-\infty}^{\infty} \varphi^2_{g_r}(z) g_r(z)\, dz < \infty$ for all $r$), and finite Fisher information for scale ($J_{g_r} := \int_{-\infty}^{\infty} z^2 \varphi^2_{g_r}(z) g_r(z)\, dz < \infty$ for all $r$).
If $g \in \mathcal{F}_{ULAN}$, then $J_{g_r} > 1$ for all $r$ (see, e.g., Hallin and Paindaveine (2006)). As for the other factors, $I_{g_r} \sigma^2_{g_r} \geq 1$, where the equality holds if and only if $g_r$ is Gaussian. Hence, the information matrix $\Gamma_{\vartheta,g}$ is nonsingular if and only if at most one IC is Gaussian. Actually, it can be shown that if exactly $q$ ICs are Gaussian, then the rank of $\Gamma_{\vartheta,g}$ is $p(p + 1) - q(q - 1)/2$ (an explicit expression of the Moore-Penrose pseudo-inverse of $\Gamma_{\vartheta,g}$ is readily obtained from (A.3) in the Appendix).
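A small numerical check of these two facts may help. The sketch below evaluates the scale-invariant product $I_g \sigma_g^2$ for Student $t_\nu$ densities with $\nu > 2$, using the known closed forms $I = (\nu + 1)/(\nu + 3)$ and $\sigma^2 = \nu/(\nu - 2)$ for the standard $t_\nu$, and evaluates the rank formula quoted above. Both function names are illustrative.

```python
def info_times_variance_t(nu):
    """I_g * sigma_g^2 for a Student t_nu density, nu > 2 (the product is scale-invariant)."""
    fisher_location = (nu + 1.0) / (nu + 3.0)   # Fisher information for location, standard t_nu
    variance = nu / (nu - 2.0)                  # variance of the standard t_nu
    return fisher_location * variance

def rank_of_information(p, q):
    """Rank p(p + 1) - q(q - 1)/2 of the information matrix when exactly q ICs are Gaussian."""
    return p * (p + 1) - q * (q - 1) // 2

for nu in (3, 5, 10, 100):
    print(nu, info_times_variance_t(nu))        # always > 1, decreasing to 1 as nu grows

for q in range(4):
    print(q, rank_of_information(3, q))         # full rank 12 for q = 0 or 1 only
```

The product exceeds 1 for every finite $\nu$ and approaches the Gaussian value 1 as $\nu \to \infty$, while the rank drops below $p(p+1)$ as soon as $q \geq 2$.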
This possible singularity of IC models is well-known. In standard ICA, i.e., in a point estimation context, it is actually assumed that at most one IC is Gaussian, which guarantees identifiability (up to postmultiplication by some permutation, sign-change, and scale matrices) of the mixing matrix Λ to be estimated; see, e.g., Theis (2004). The ULAN result above sheds some light on the nature of this singularity in an asymptotic sense, in terms of Fisher information matrices. In the problem of testing for multivariate independence however (see Section 2.2 below), the parameter Λ need not be fully identified, and having several Gaussian ICs does not hurt (for instance, testing for multivariate independence at the multinormal model, where all ICs are Gaussian, clearly makes sense). Still, a careful treatment of the possible singularity of Γ ϑ,g will be required when studying the asymptotic properties of the proposed tests.

Multivariate independence in IC models
Assume that the true model is the semiparametric IC model $\mathcal{P}^n$ above. As in the Introduction, we partition the observations into $X_i = (X_i^{(1)\prime}, X_i^{(2)\prime})'$, where $X_i^{(\ell)}$ is a $p_\ell$-variate vector ($\ell = 1, 2$), and partition accordingly $Z_i$, $\mu$, and $\Lambda$ into $(Z_i^{(1)\prime}, Z_i^{(2)\prime})'$, $(\mu_1', \mu_2')'$, and
$$\Lambda = \begin{pmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{pmatrix},$$
respectively. We consider the problem of testing $H_0^{IC} : \Lambda_{12} = \Lambda_{21} = 0$ (under which the subvectors $X_i^{(1)}$ and $X_i^{(2)}$ are independent) against the alternative that at least one entry in $\Lambda_{12}$ or $\Lambda_{21}$ is non-zero. Clearly, the location vector $\mu$, the marginal null mixing matrices $\Lambda_{\ell\ell}$ ($\ell = 1, 2$), and the noise density $g$ are nuisance parameters.
Denoting by $\mathcal{M}(\Omega)$ the vector space that is spanned by the columns of the full-rank $(p + p^2) \times (p + p_1^2 + p_2^2)$ matrix $\Omega$ of (2.1), the null can be written as $H_0^{IC} : \vartheta \in \mathcal{M}(\Omega)\,(\cap\, \Theta)$, hence imposes a set of linear constraints on $\vartheta$. This plays an important role in the sequel, as the form of optimal (in the Le Cam sense) tests for linear constraints on the parameters of ULAN models is well-known; see Section 3.1.
Clearly, $\vartheta$ is not specified under the null. As we will see in Section 3.1, our parametric tests will be based on a sequence of estimators $\hat\vartheta^{(n)} = (\hat\mu^{(n)\prime}, (\mathrm{vec}\,\hat\Lambda^{(n)})')'$ satisfying Assumption (B) below (the nonparametric ones actually require slightly different estimators, satisfying Assumption (B'); see Section 4.2).
(iv) locally asymptotically discrete: for all $\vartheta \in \mathcal{M}(\Omega)$ and all $c > 0$, there exists an $M = M(c) > 0$ such that the number of possible values of $\hat\vartheta^{(n)}$ in balls of the form $\{t \in \mathbb{R}^{p(p+1)} : n^{1/2} \|t - \vartheta\| \leq c\}$ is bounded by $M$, uniformly as $n \to \infty$.
Assumption (B) is extremely mild; (B(i)) stresses that $\hat\vartheta^{(n)}$ should be obtained by fitting the null model (typically by running two separate ICAs on the subvectors $X_i^{(1)}$ and $X_i^{(2)}$, $i = 1, \ldots, n$). The rate required in (B(ii)) is the regular one in ICA (see, e.g., Chen and Bickel (2006)). The (natural) affine-equivariance of $\hat\mu^{(n)}$ and $\hat\Lambda^{(n)}$ in Assumption (B(iii)) will guarantee the affine-invariance of the proposed tests. Finally, Assumption (B(iv)), which is needed for the parametric versions of our tests only (compare with Assumption (B') in Section 4.2), is a purely technical requirement, with few practical implications (for fixed sample size, any estimator indeed can be considered part of a locally asymptotically discrete sequence). Most importantly, as far as the asymptotic properties of our tests are concerned, it turns out that no best choice of $\hat\vartheta^{(n)}$ exists in the class of estimators satisfying Assumption (B) (or (B')); we will indeed show that the asymptotic behavior of our tests is not affected by this choice (see however Section 6.1 for a discussion of finite-sample issues and a recommended practical solution).
We end this section with the following important remark: the problem of testing the null H 0 , under which X (1) i and X (2) i are independent, only imperfectly translates, in the semiparametric IC model P n , into that of testing H IC 0 : Λ 12 = Λ 21 = 0. Indeed, whereas H IC 0 ⊂ H 0 always holds, H 0 ⊂ H IC 0 may fail to hold at some noise densities g ∈ F; an extreme example is the following: if for any orthogonal matrix O and any p-vector μ. More generally, the number of Gaussian ICs in each subvector plays a crucial role in the non-equivalence between H 0 and H IC 0 , which is thus clearly related to the possible singularity of IC models.
Although the primary objective of this paper is to test the null H 0 of multivariate independence, the main focus in the next sections will be on the null H IC 0 , mainly because, as already mentioned, the form of optimal tests for the latter is known in this ULAN setup. This implies, however, that the tests we will derive for H IC 0 will have to be reevaluated when considered as tests for H 0 . As we will see, our tests will also appear as excellent procedures for the null H 0 of interest.

Optimal parametric tests
In this section, we build optimal tests for H IC 0 under the assumption that the underlying noise density g is known to be some fixed f ∈ F ULAN (this highly unrealistic assumption will be relaxed later). We start with a definition of the optimality concept that is considered in the paper and a reference to Le Cam (1986) explaining how to define optimal tests in the present context. We then provide an explicit expression for the optimal test statistics and derive their asymptotic distributions, both under the null and under sequences of contiguous alternatives.

Local and asymptotic most stringency in ULAN models
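Let $\mathcal{C}_\alpha$ be the collection of $\alpha$-level tests for some generic testing problem $H_0$ versus $H_1$. We say that the test $\phi$ is most stringent in $\mathcal{C}_\alpha$ if and only if
$$\sup_{P \in H_1} r_\phi(P) \leq \sup_{P \in H_1} r_{\phi_0}(P) \quad \text{for all } \phi_0 \in \mathcal{C}_\alpha,$$
where $r_{\phi_0}(P) := \big[\sup_{\phi' \in \mathcal{C}_\alpha} E_P[\phi']\big] - E_P[\phi_0]$ stands for the regret of the test $\phi_0$ at $P$ (of course we regard here $\phi$, $\phi_0$, and $\phi'$ as test functions taking values in $[0, 1]$). In other words, a test is most stringent at level $\alpha$ if and only if it minimizes the maximum (in $P \in H_1$) lack of power (at $P$) with respect to the maximal power that can be achieved (at $P$) by an $\alpha$-level test.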
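Lucien Le Cam showed that when a ULAN result such as that of Proposition 2.1 holds at $g = f$, a locally and asymptotically most stringent test (see Le Cam (1986), Section 10.9, for a precise definition of what is meant here by "locally and asymptotically") for a linear null hypothesis of the form $\vartheta \in \mathcal{M}(\Omega)$ rejects the null for large values of
$$Q_f := \Delta^{(n)\prime}_{\hat\vartheta,f} \big[\Gamma^-_{\hat\vartheta,f} - \Omega (\Omega' \Gamma_{\hat\vartheta,f} \Omega)^- \Omega'\big] \Delta^{(n)}_{\hat\vartheta,f}, \tag{3.1}$$
where $\hat\vartheta$ is an estimator satisfying Assumption (B(i)-(ii), (iv)) and where $B^-$ denotes the Moore-Penrose pseudoinverse of $B$ (that is, the unique matrix $C$ such that $BCB = B$, $CBC = C$, $(BC)' = BC$, and $(CB)' = CB$).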
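The four conditions characterizing the Moore-Penrose pseudoinverse ($BCB = B$, $CBC = C$, and symmetry of both $BC$ and $CB$) can be checked numerically on a singular matrix. The matrix below is an arbitrary illustration and is not the information matrix of the paper; `numpy.linalg.pinv` is used to compute the pseudoinverse.

```python
import numpy as np

# A singular symmetric positive semidefinite matrix (rank 2), standing in for
# an information matrix that is singular because of Gaussian components.
B = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])

C = np.linalg.pinv(B)  # Moore-Penrose pseudoinverse

conditions = {
    "BCB = B": np.allclose(B @ C @ B, B),
    "CBC = C": np.allclose(C @ B @ C, C),
    "(BC)' = BC": np.allclose((B @ C).T, B @ C),
    "(CB)' = CB": np.allclose((C @ B).T, C @ B),
}
print(np.linalg.matrix_rank(B), conditions)
```

Using a pseudoinverse rather than an ordinary inverse is what allows quadratic forms of this kind to remain well defined when several ICs are Gaussian.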

Optimal parametric tests for multivariate independence
It follows from the previous sections that a locally and asymptotically most stringent test for $H_0^{IC}$ in the parametric IC model $\mathcal{P}^n_f = \{P^n_{\vartheta,f} : \vartheta \in \Theta\}$ rejects the null for large values of the test statistic in (3.1), where $\Delta^{(n)}_{\vartheta,f}$ and $\Gamma_{\vartheta,f}$ are respectively the central sequence and information matrix in Proposition 2.1 and where $\Omega$ is the matrix in (2.1).
We now show that Q f can be rewritten under a simple explicit form, which makes clear why Q f might detect some possible dependence between X (1) i and X (2) i . First note that, since both Ω and Γ ϑ,f are block-diagonal and since Ω 1 = I p , For obvious reasons, the following result is crucial to obtain explicit expressions for Q f .
if both f r and f s are Gaussian, and otherwise.
If f r and f s are Gaussian (with variance one, since f ∈ F), I fr = 1/σ 2 fr = 1 = 1/σ 2 fs = I fs , which explains that α r,s (f ) and β r,s (f ) then cannot be defined through (3.3). In all other cases, γ r,s (f )γ s,r (f ) < 1 (see the comments after Proposition 2.1), so that the quantities in (3.3) are thus well-defined. Applying Lemma 3.1 to (3.2) and writingẐ i for Z i (θ) straightforwardly yields Now, since the rows (hence also the columns) ( − 1)p + 1, . . . , ( − 1)p + p 1 ( ∈ S 1 ) and ( − 1)p + p 1 + 1, . . . , ( − 1)p + p ( ∈ S 2 ) of the symmetric matrix M f contain zeros only, we obtain that where the vec d operator is defined by As M f is symmetric and positive semidefinite, Q f can be interpreted as a squared norm of the generalized cross-covariances n −1/2 Tθ ,f between the estimated residuals Z (1) i (θ) and Z (2) i (θ). The word "generalized" stresses that the residuals Z i (θ) are weighted by ϕ f , which allows for achieving (local and asymptotic) optimality at f . Note that, at the multinormal, those generalized cross-covariances boil down to the standard ones (see (3.6) below). This intuitive interpretation for Q f makes clear why This decomposition of Q f into a sum of (asymptotically independent) quadra- 2.1); incidently, note that these pairs would not be asymptotically independent in asymmetric IC models, which would lead to much more complicated test statistics. If both f r and f s are Gaussian, then which, as we will show, is asymptotically χ 2 1 under the null. In all other cases, Q f ;r,s is asymptotically χ 2 2 under the null. This explains that the number of degrees of freedom in the asymptotic null distribution of More precisely, we have the following result, which summarizes the asymptotic properties of the test based on Q f (below, χ 2 d (δ) denotes the noncentral chi-square distribution with d degrees of freedom and noncentrality parameter δ).

Theorem 3.1. Let Assumptions (A) and (B) hold at
is locally and asymptotically most stringent, at asymptotic level α, It is easy to check that, ifθ is affine-equivariant (in the sense of Assumption (B(iii)), then Q f is affine-invariant.

Two particular cases
Consider the case for which all ICs are Gaussian, that is, the multinormal case (f = φ, say). Then q (φ) = p ( = 1, 2), α r,s (φ) = β r,s (φ) = 1/4 for all r, s, ϕ φ (z) = z, and so that the corresponding test rejects H IC 0 at asymptotic level α as soon as This test is valid at the multinormal only. Under finite second-order moments, however, it can be robustified via some standard "studentization", which actually yields-as we show in Section 5-a test that is asymptotically equivalent, under the null (hence also under sequences of contiguous alternatives), to the classical Wilks and Pillai tests of multivariate independence described in the Introduction.

Optimal signed-rank tests
The main drawback of the parametric tests $\phi_f$ above is their lack of robustness: in general, they do not meet the asymptotic $\alpha$-level constraint if the noise density is misspecified. In this section, we robustify those tests by defining asymptotically distribution-free (signed-rank-based) counterparts, and investigate the properties of the resulting nonparametric tests.

Signed ranks and invariance
The signed ranks of the residuals for the indicator function of set A). When no ambiguity is possible, we will not stress the dependence in ϑ.
Restricting to signed-rank tests (i.e., to tests that are measurable with respect to the signed ranks of the residuals) is justified by standard invariance arguments, which, in IC models, take the following form. Denote by $\mathcal{H}$ the collection of transformations $h$ of $\mathbb{R}^p$ defined by $h((z_1, \ldots, z_p)') = (h_1(z_1), \ldots, h_p(z_p))'$, where the functions $h_r$, $r = 1, \ldots, p$, are continuous, odd, and monotone increasing functions that fix $(\chi^2_{1;.5})^{1/2}$ and $+\infty$. For each $\vartheta = (\mu', (\mathrm{vec}\,\Lambda)')' \in \Theta$, consider then the group of componentwise monotone increasing transformations (of $(\mathbb{R}^p)^n$) $\mathcal{G}^\vartheta = \{g_h^\vartheta : h \in \mathcal{H}\}$, defined by $g_h^\vartheta(X_1, \ldots, X_n) = (\Lambda h(Z_1(\vartheta)) + \mu, \ldots, \Lambda h(Z_n(\vartheta)) + \mu)$. It is easy to check that the corresponding maximal invariant is the collection of signed ranks $(S_i(\vartheta), R_i(\vartheta), i = 1, \ldots, n)$. Now, the null submodel with value $\vartheta$ of the parameter (that is, the family $\mathcal{P}^n_\vartheta := \cup_{g \in \mathcal{F}} \{P^n_{\vartheta,g}\}$) is invariant under $\mathcal{G}^\vartheta$. The invariance principle therefore suggests restricting to tests that are measurable with respect to the corresponding maximal invariants, i.e., restricting to signed-rank tests. Since, moreover, $\mathcal{P}^n_\vartheta$ is generated by $\mathcal{G}^\vartheta$ for each $\vartheta$, signed-rank tests are strictly distribution-free with respect to the noise density $g$ (actually, since $\vartheta$ is to be estimated, only asymptotic invariance, hence also asymptotic distribution-freeness, will be achieved).
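The invariance argument can be illustrated numerically. The sketch below assumes the usual definition of componentwise signed ranks (the sign of each residual, together with the rank of its absolute value among the absolute values of the same component) and applies odd, continuous, increasing maps to each component. The function `signed_ranks` is a hypothetical helper, and the maps $z \mapsto z^3$ and $z \mapsto \sinh z$ are chosen for simplicity: they do not fix $(\chi^2_{1;.5})^{1/2}$ as required of elements of $\mathcal{H}$, but the signed ranks are unaffected by that normalization.

```python
import numpy as np

def signed_ranks(Z):
    """Return (S, R): componentwise signs, and ranks of |Z_ir| within each column r."""
    Z = np.asarray(Z, dtype=float)
    n, p = Z.shape
    S = np.sign(Z)
    A = np.abs(Z)
    R = np.empty((n, p), dtype=int)
    for r in range(p):
        R[np.argsort(A[:, r]), r] = np.arange(1, n + 1)
    return S, R

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))
S, R = signed_ranks(Z)

# Componentwise odd, continuous, monotone increasing transformations.
hZ = np.column_stack([Z[:, 0] ** 3, np.sinh(Z[:, 1])])
S_h, R_h = signed_ranks(hZ)
print(np.array_equal(S, S_h), np.array_equal(R, R_h))
```

Any statistic that depends on the residuals only through `(S, R)` therefore takes the same value before and after such a transformation.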
In the discussion above, distribution-freeness is with respect to $g \in \mathcal{F}$. Note that distribution-freeness with respect to $\vartheta\,(\in \mathcal{M}(\Omega))$ will also follow from invariance arguments; the relevant group of transformations, in this case, is the group of block-affine transformations defined in the Introduction.

Componentwise signed-rank statistics
The proposed signed-rank statistics will involve score functions that must satisfy the following assumption.
Assumption (C). The score functions K, L : (0, 1) p → R p are of the form where U is uniformly distributed over (0, 1), and (ii) can be expressed as the difference of two continuous monotonically increasing functions Assumption (C) is needed for Hájek's classical projection result for linear signed-rank statistics, which actually only requires square-integrability of the scores (see, e.g., Chapter 3 of Puri and Sen (1985)). As we will see in the proof of Lemma 4.2 below, controlling the replacement of ϑ with an estimate however requires the reinforcement of square-integrability into Assumption (C(ii)).
Lemma 4.1. Let Assumptions (A) and (C) hold. Then, for any ϑ ∈ Θ and g ∈ F, E[ T ϑ,K,L − T ϑ,K,L;g 2 ] = o(1) as n → ∞, under P n ϑ,g , where, writing Again, to obtain proper test statistics, appropriate estimators need be substituted for nuisance parameters. Actually, we will need to consider specific centerings for each set of scores, or, more precisely, statistics of the form where the estimatorsθ K :=θ L , (vecΛ (n) ) ) , n ∈ N) satisfy Assumptions (B(i)-(iii)) and are such that L , andΛ (n) are invariant under permutations of the observations. Assumption (B (ii)) is extremely mild, but Assumption (B (i)) may appear quite peculiar. While estimators satisfying Assumptions (B ) are described in Section 6, we point out that the latter assumption could actually be replaced with the same Assumption (B) as for parametric tests, but at the expense of second-order moment assumptions-the replacement of ϑ with a single estimatorθ (n) = (μ (n) , (vecΛ (n) ) ) could indeed then be controlled through appropriate asymptotic linearity results, in the same way as in Lemma A.4(ii) (see the Appendix) for the parametric tests, but this would require ULAN, hence finite second-order moments. Since we want to avoid any moment assumption, we rather adopt Assumption (B ), but the considerations above imply that, if finite moments are not an issue, any estimatorθ (n) satisfying Assumption (B) can then be used in our signed-rank tests.
Jointly with Lemma 4.1, the following result then provides the key for defining distribution-free counterparts to the parametric test φ f introduced in Section 3.

Definition of the proposed tests
It directly follows from the representation result in Lemma 4.1 above that, for any f ∈ F ULAN , the signed-rank statisticT ϑ,f : Hence, the test rejecting H IC 0 at asymptotic level α as soon asQ ϑ,f =T ϑ,f M fTϑ,f > χ 2 d(f );1−α , will inherit, under noise density f , the optimality properties of φ f . However, unlike φ f , this signed-rank test is distribution-free under H IC 0 , hence has asymptotic level α under any noise density g ∈ F. Of course, the actual test (φ f , say) is based onQ f :=T f M fTf , withT f :=Tθ K f ,θ L f ,K f ,L f , but we will show (see Theorem 4.1) that this estimation of ϑ (i) does not affect optimality at f and (ii) actually only weakens (strict) distribution-freeness into asymptotic distribution-freeness (which is sufficient to ensure asymptotic level α under any noise density g ∈ F).
More generally, the (K, L)-score version of the proposed signed-rank tests is the testφ K,L that rejects H IC 0 for large values of for all u ∈ (0, 1)). Sinceφ K f ,L f =φ f , the testsφ K,L extend the f -score ones defined above.
The nonparametric testsφ K,L are to be interpreted in the same way as the parametric ones from Section 3: they reject the null of multivariate independence when the norm of some cross-covariance matrix -in this case, the signed-rank one n −1/2Tθ K ,θ L ,K,L -is too large.

Some particular cases
Before stating the asymptotic properties ofφ K,L , we first consider some particular cases. We start with the important particular case for which there exist λ r , r = 1, . . . , p such that L r (u) = λ r K r (u) for all u ∈ (0, 1) (note that Assumption (C(i)) then implies that λ r = 1/E[K 2 r (U )]). One can then takeθ where (with obvious notation) where the p-variate score functionK is obtained from K by replacing K r with . , p; this shows thatQ K,L then has the simple structure of the Puri and Sen (1971) test statistics. However, we point out that the tests based onQ K,L , unlike the Puri and Sen ones, are affine-invariant.
Three classical score functions and corresponding tests are of this type. (i) Sign tests are obtained for constant score functions (K r (u) = 1 for all r). The resulting test statistic is (ii) Wilcoxon-type tests are associated with linear score functions (K r (u) = √ 3u for all r) and reject H IC 0 for large values of for all r, where Φ stands for the cdf of the standard normal distribution-yield the van der Waerden test statistic Q vdW := n C vdW 2 , (4.5) It is easy to check thatQ vdW coincides with the signed-rank test statisticQ φ achieving optimality at the multinormal (that is, that based on the score func- The resulting van der Waerden test is therefore the distribution-free counterpart to the Gaussian test based on (3.7).
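The three score functions can be compared in a short sketch. It assumes the van der Waerden score for signed ranks is $K_r(u) = \Phi^{-1}((u + 1)/2)$ and that, for scores of this type, the statistic is $n$ times the squared Frobenius norm of a signed-rank cross-covariance matrix, in line with the form $n\|C_{vdW}\|^2$ quoted above. The function names are hypothetical, the Wilcoxon score is taken exactly as stated in the text, and the input `Z` stands in for the estimated residuals, so this is not a full implementation of the proposed tests.

```python
import numpy as np
from statistics import NormalDist

_norm_ppf = np.vectorize(NormalDist().inv_cdf)

def sign_score(u):
    return np.ones_like(u, dtype=float)

def wilcoxon_score(u):
    return np.sqrt(3.0) * u

def vdw_score(u):
    # Assumed form of the van der Waerden score for signed ranks.
    return _norm_ppf((u + 1.0) / 2.0)

def abs_ranks(Z):
    """Ranks of |Z_ir| within each column r (no ties assumed)."""
    A = np.abs(Z)
    n = A.shape[0]
    R = np.empty(A.shape, dtype=float)
    for r in range(A.shape[1]):
        R[np.argsort(A[:, r]), r] = np.arange(1, n + 1)
    return R

def signed_rank_statistic(Z, p1, score):
    """n * ||C||^2, with C the p1 x p2 cross-covariance of the signed-rank scores."""
    n = Z.shape[0]
    T = np.sign(Z) * score(abs_ranks(Z) / (n + 1.0))
    C = T[:, :p1].T @ T[:, p1:] / n
    return n * np.sum(C ** 2)

scores = {"sign": sign_score, "wilcoxon": wilcoxon_score, "vdw": vdw_score}

# Midpoint-rule check that each score satisfies E[K(U)^2] = 1.
u = (np.arange(1, 20001) - 0.5) / 20000
second_moments = {name: float(np.mean(K(u) ** 2)) for name, K in scores.items()}

rng = np.random.default_rng(1)
Z = rng.standard_normal((300, 4))          # stand-in for estimated residuals
Q = {name: float(signed_rank_statistic(Z, 2, K)) for name, K in scores.items()}
Q_cubed = {name: float(signed_rank_statistic(Z ** 3, 2, K)) for name, K in scores.items()}
print(second_moments, Q)
```

Each statistic is unchanged when the residuals are replaced by their cubes, since cubing is odd and increasing and so preserves all signed ranks.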
As shown in Theorem 4.1 below,Q vdW =Q φ -asQ S andQ W -is asymptotically χ 2 p1p2 under the null. For any other noise density f , however, the nonparametric test statistic achieving optimality at f , namelyQ f , gives rise to a larger number of degrees of freedom (and to a more complicated structure than that of (4.1)-(4.2)). As an example, we consider the nonparametric counterpart Q φν to Q φν in (3.8), that is, the signed-rank test statistic designed to achieve optimality when the rth IC is t νr , r = 1, . . . , p. Letting where F 1,ν stands for the cdf of the Fisher-Snedecor distribution with 1 and ν degrees of freedom, it is easy to show that which (see again Theorem 4.1 below) is asymptotically χ 2 2p1p2 under H IC 0 (irrespective of the underlying noise density g ∈ F).

Asymptotic properties of the proposed signed-rank tests
Lemma 4.1 implies that, at noise density g,T ϑ,K,L and T ϑ,K,L;g have the same asymptotic behavior under the null, hence also under sequences of contiguous alternatives. UnlikeT ϑ,K,L , the random variable T ϑ,K,L;g is a sum of i.i.d. terms, hence can be studied easily. Defining where we let δ r,s (K, , standard arguments (mainly Le Cam's third lemma) then yield the following lemma (see the Appendix for the proof).

Lemma 4.3. Let Assumptions (A) and (C) hold.
Then, for any ϑ ∈ Θ,T ϑ,K,L is asymptotically normal with mean zero and mean H K,L;g (I p ⊗ Λ −1 )τ 2 under P n ϑ,g (g ∈ F) and under P n ϑ+n −1/2 τ,g (τ = (τ 1 , τ 2 ) ∈ R p × R p 2 , g ∈ F ULAN ), respectively, and covariance matrix H K,L under both. Now, let d(K, L) := 2p 1 p 2 − q 1 (K, L)q 2 (K, L), where q (K, L), = 1, 2 is the number of indices r ∈ S (see Lemma 3.1 for the definition of S ) such that We can then state the main theorem of this paper. K f ,L f is locally and asymptotically most stringent, at asymptotic level α, for ∪ ϑ∈M(Ω) ∪ g∈F {P n ϑ,g } against ∪ ϑ / ∈M(Ω) {P n ϑ,f }. Three comments are in order. First, Part (i) of the result confirms that the asymptotic null distribution of the proposed nonparametric test statistics only depends on the adopted score functions K and L; the resulting asymptotic distribution-freeness in particular is not affected by the (typically unknown) number of underlying Gaussian marginals. Second, since Lemma A.6 in the Appendix), the local asymptotic powers ofφ f under noise density f coincide with those of φ f in Theorem 3.1 (as expected, since both tests share the same local and asymptotic optimality properties at f ). Third, we stress once more that, unlike the Puri and Sen (1971) tests, our signed-rank tests-when based on affine-equivariant estimators in the sense of Assumption (B(iii))-are affine-invariant.
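The degrees of freedom $d(K, L) = 2p_1p_2 - q_1(K, L)\,q_2(K, L)$ can be tabulated with a one-line helper. This is only an illustrative sketch: the function name is hypothetical, and the bound $0 \le q_\ell \le p_\ell$ is an assumption (it presumes that $S_\ell$ indexes the $p_\ell$ components of the $\ell$th subvector).

```python
def degrees_of_freedom(p1, p2, q1, q2):
    """Return d(K, L) = 2 * p1 * p2 - q1 * q2.

    q1 and q2 count the indices in each block that meet the condition
    stated in the text; the range check below is an assumption.
    """
    if not (0 <= q1 <= p1 and 0 <= q2 <= p2):
        raise ValueError("expected 0 <= q1 <= p1 and 0 <= q2 <= p2")
    return 2 * p1 * p2 - q1 * q2


if __name__ == "__main__":
    # Puri-and-Sen-type scores: chi-square with p1 * p2 degrees of freedom.
    print(degrees_of_freedom(2, 2, 2, 2))
    # No index meets the condition: 2 * p1 * p2 degrees of freedom.
    print(degrees_of_freedom(2, 2, 0, 0))
```

The two extreme cases match the limiting distributions quoted earlier: $\chi^2_{p_1p_2}$ for the sign, Wilcoxon, and van der Waerden statistics, and $\chi^2_{2p_1p_2}$ for the $t$-score statistics.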

Wilks' and Pillai's tests in IC models and AREs
As mentioned in the Introduction, the most classical parametric tests for multivariate independence are the Wilks (1935) and Pillai (1955) tests in (1.2) and (1.3), respectively. In this section, we first investigate the asymptotic properties of these tests in IC models, and then evaluate the performances of our signed-rank tests by computing their asymptotic relative efficiencies (AREs) with respect to those classical benchmarks.

Wilks' and Pillai's tests in IC models
Writing S = I p + (S − I p ) and S = I p + (S − I p ), = 1, 2, in (1.2) and performing a Taylor expansion, it can be shown that (we use the same notation as in the Introduction) as n → ∞, under any null distribution with finite second-order moments, where Σ = diag(Σ 11 , Σ 22 ) stands for the common population covariance matrix of the X i 's. For any g in the collection F 2 of noise densities in F with finite second-order moments, define S g := diag(σ 2 g1 , . . . , σ 2 gp ). In the IC model under consideration, Σ, at P n ϑ,g , with ϑ ∈ M(Ω) and g ∈ F 2 , is given by (2) g Λ 22 ) (with obvious notation). Direct computations yield that, under P n ϑ,g , still with ϑ ∈ M(Ω) and g ∈ F 2 , hence also under sequences of contiguous alternatives, The fact that W is equal to (5.2) now makes clear why Wilks' test can be regarded as a robustified version of the parametric Gaussian test in (3.7). The following result summarizes the asymptotic properties of T ϑ,φ,g . (e r e s ⊗ e s e r ) for any g ∈ F 2 . Then, for any ϑ ∈ Θ, T ϑ,φ,g is asymptotically normal with mean zero and mean H φ;g (I p ⊗Λ −1 )τ 2 under P n ϑ,g (g ∈ F 2 ) and under P n ϑ+n −1/2 τ,g (τ = (τ 1 , τ 2 ) ∈ R p × R p 2 , g ∈ F ULAN ), respectively, and covariance matrix H φ under both.
The asymptotic properties of Wilks' test (hence, also of Pillai's; see (5.1)) in IC models easily follow from (5.3) and Lemma 5.1.
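To see why Wilks' and Pillai's statistics are asymptotically equivalent under the null, it may help to work out the bivariate case $p_1 = p_2 = 1$. The sketch below assumes the usual forms of the two statistics, namely $W = -n \log(|S|/(|S_{11}||S_{22}|))$ and $P = n\,\mathrm{tr}(S_{11}^{-1}S_{12}S_{22}^{-1}S_{21})$, and writes $r$ for the sample correlation between the two components.

```latex
% Bivariate case p_1 = p_2 = 1; r is the sample correlation coefficient.
\begin{align*}
  P &= n\, r^2, \\
  W &= -n \log\bigl(1 - r^2\bigr)
     = n r^2 + \tfrac{n}{2}\, r^4 + \tfrac{n}{3}\, r^6 + \cdots \\
    &= P + O_P(n^{-1}),
\end{align*}
% since \sqrt{n}\, r = O_P(1) under any null distribution
% with finite second-order moments.
```

The remainder is driven by $n r^4 = (\sqrt{n}\, r)^4 / n$, which is why only second-order moments are needed for the equivalence.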

Theorem 5.1. Let Assumption (A) hold, and denote by $\phi_{Wilks}$ the test that rejects $H_0^{IC}$ as soon as $W > \chi^2_{p_1p_2;1-\alpha}$. Then, $\phi_{Wilks}$ is locally and asymptotically most stringent, at asymptotic level $\alpha$,

This also shows that Wilks' test does not require finite fourth-order moments (as is often stated), but second-order ones only; this follows from (5.1) and the fact that, unlike $\sqrt{n}\,\mathrm{vec}(S - \Sigma)$, $\sqrt{n}\,(\mathrm{vec}\, S_{12})$ is asymptotically normal under the null as soon as the common distribution of the $X_i$'s has a finite covariance matrix.
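For $p_1 = p_2 = 1$, the rejection rule of Theorem 5.1 is easy to code. The sketch below is illustrative only: it assumes the usual bivariate forms $W = -n\log(1 - r^2)$ and $P = n r^2$, the function names are hypothetical, and the $\chi^2_1$ quantile is obtained from the standard normal quantile.

```python
import math
from statistics import NormalDist


def wilks_pillai_bivariate(x, y):
    """Return (W, P) for two univariate samples of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)          # squared sample correlation
    return -n * math.log(1.0 - r2), n * r2


def chi2_1_quantile(level):
    """Quantile of chi^2_1 at `level`, via the square of a N(0, 1) quantile."""
    return NormalDist().inv_cdf(0.5 + level / 2.0) ** 2


def wilks_rejects(x, y, alpha=0.05):
    """Reject independence when W exceeds the chi^2_{1; 1 - alpha} quantile."""
    w, _ = wilks_pillai_bivariate(x, y)
    return w > chi2_1_quantile(1.0 - alpha)
```

Since $-\log(1 - u) \ge u$ on $[0, 1)$, $W$ is never smaller than $P$ in finite samples, although both share the same $\chi^2_1$ limit under the null.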

Asymptotic relative efficiencies
We here compare the performances of the proposed signed-rank tests $\underset{\sim}{\phi}_{K,L}$ with Wilks' (equivalently, with Pillai's) through asymptotic relative efficiencies (AREs). If the score functions $K$, $L$ are such that $d(K, L) = p_1p_2$, these AREs are simply obtained from Theorems 4.1 and 5.1 by computing the ratios of the noncentrality parameters in the asymptotic non-null distributions of $\underset{\sim}{\phi}_{K,L}$ and $\phi_{Wilks}$. If $d(K, L) > p_1p_2$, however, the degrees of freedom in the limiting distributions of $\underset{\sim}{\phi}_{K,L}$ and $\phi_{Wilks}$ do not match and a direct use of the ratio of the noncentrality parameters is no longer valid. We then use the extension of the concept of Pitman ARE to cases where the limiting distributions of the competing tests are of different types; see Nyblom and Mäkeläinen (1983) and Möttönen, Hüsler and Oja (2003). In such a case, the resulting relative efficiency may depend on the common asymptotic level $\alpha$ and power $\beta$ of the tests. The general result is the following.
( 5.7) whereK, δ r,s , and S g were defined in Sections 4.4, 4.5, and 5.1, respectively. (ii) the Wilcoxon testφ W and the van der Waerden testφ vdW satisfy inf ϑ,τ,g where the infima are taken over all ϑ, τ ∈ R p+p 2 and g ∈ F ULAN for which the corresponding B is symmetric. Moreover, forφ vdW , the lower bound is reached if and only if all ICs are Gaussian.
Part (ii) of this result establishes the very good uniform efficiency properties of our Wilcoxon and, above all, of our van der Waerden signed-rank tests under type 1 alternatives (this restriction is associated with the symmetry of $B$ in Proposition 5.2(ii)). Such uniform efficiency results, for location problems, were first derived in Hodges and Lehmann (1956) and Chernoff and Savage (1958), for Wilcoxon scores and van der Waerden scores, respectively. As for Part (i), it provides, for arbitrary sequences of alternatives and any dimensions $p_1$ and $p_2$, a very simple expression for the AREs of our Puri and Sen type signed-rank tests (the ones based on (4.1)-(4.2)) with respect to Wilks'.
Unfortunately, such a simple expression does not exist for the other proposed signed-rank tests, namely those for which d(K, L) > p 1 p 2 . To give some insight on the AREs of the latter with respect to Wilks' test, we consider the AREs, under identically distributed ICs (with common pdf g 1 , say), of the signed-rank test (φ f1 , say) designed to achieve optimality when both ICs share some (non-Gaussian) pdf f 1 (we may safely exclude the case for which f 1 is Gaussian since the resulting test, namelyφ vdW , is then of the Puri and Sen type). Lengthy yet straightforward calculations yield that, under the bivariate type j (j=1,2) alternatives in (5.5)-(5.6), we have (5.8), we stick to the same numerator/denominator structure as in (5.4), and we allow for zero noncentrality parameters in the denominator, with obvious interpretation). In particular, at g 1 = f 1 , we simply have which, when f 1 is the pdf of the t ν distribution (with ν > 2, so that ULAN holds), gives (1 + s j ) 2 , j = 1, 2. (5.10) For type 1 alternatives, it is clear that, since c α,β 1 < c α,β 2 for all α, β, the AREs in (5.10) are strictly smaller than one for large ν (e.g., if α = .05 and β = .80, these AREs are .916 and .852 for ν = 5 and ν = 8, respectively; see Table 1), so that Wilks' test asymptotically dominates, when both ICs are t ν , the signedrank test that is optimal under such conditions. This of course is puzzling at first sight. However, our concept of optimality, namely most stringency (see Section 3.1), clearly does not imply that the optimal tests are most powerful under all alternatives, but only that their lack of power with respect to the best test for any fixed alternative is minimal. 
What occurs in the AREs (5.8)-(5.10) is totally in line with most stringency: our optimal tests $\underset{\sim}{\phi}_{f_1}$ ($f_1$ non-Gaussian) pay a price in terms of efficiency along type 1 alternatives (which allows for the superiority of Wilks' test there) in order to gain some power along type 2 alternatives, where the local asymptotic powers of Wilks' test are equal to the nominal level $\alpha$. The AREs of the optimal tests $\underset{\sim}{\phi}_{f_1}$ ($f_1$ non-Gaussian) with respect to Wilks' test under (non-Gaussian) type 2 alternatives may then be considered as being infinite; see (5.8)-(5.10) again.
Note that Cov[X 1 , X 2 ] = 0 under type 2 alternatives, which explains why such alternatives are more difficult to detect than those of type 1. At the multinormal, X 1 and X 2 are then independent, and type 2 "alternatives" actually belong to the null; hence, optimal tests at the multinormal model can concentrate on being most powerful along type 1 alternatives. That is exactly what Wilks' test and our van der Waerden testφ vdW do. Away from the multinormal, however, tests with more degrees of freedom are needed to discriminate between the null and type 2 alternatives. This also explains why, in Q φ∞ (see (3.9)), the term Q new , which is the limit (as ν → ∞) of a quadratic form allowing to detect type 2 alternatives, may be dropped without affecting optimality at the multinormal model.

Practical implementation
The practical implementation of our parametric tests (resp., of our nonparametric tests) crucially relies on the existence of an estimator $\hat\vartheta^{(n)} = (\hat\mu^{(n)\prime}, (\mathrm{vec}\,\hat\Lambda^{(n)})')'$ satisfying Assumption (B) (resp., the existence of a couple of estimators $\hat\vartheta_K^{(n)} = (\hat\mu_K^{(n)\prime}, (\mathrm{vec}\,\hat\Lambda^{(n)})')'$ and $\hat\vartheta_L^{(n)} = (\hat\mu_L^{(n)\prime}, (\mathrm{vec}\,\hat\Lambda^{(n)})')'$ satisfying Assumption (B′)). Also, our nonparametric tests turn out to be strongly affected, when $d(K, L) > p_1p_2$, by the slow convergence of our test statistics to their limiting distributions. This section discusses these issues and provides practical solutions.

Estimation of ϑ
As stated in Theorems 3.1 and 4.1, the asymptotic properties of our tests do not depend on the choice of $\hat\vartheta$, $\hat\vartheta_K$, and $\hat\vartheta_L$ (we drop superscripts $(n)$ in this section); still, their finite-sample properties may be affected by this choice. Here, we suggest using a particular class of practical estimates that are robust and easy to implement.
To describe these estimates conveniently, we define $\mathcal{G}_k^{\mathrm{unit}}$ as the collection of matrices $\Lambda \in \mathcal{G}_k$ for which each column has Euclidean norm one, and consider the $k$-dimensional IC model

$$X = \Lambda Z + \mu, \qquad (6.1)$$

where $\mu \in \mathbb{R}^k$, $\Lambda \in \mathcal{G}_k^{\mathrm{unit}}$, and $Z$ ($\overset{D}{=} -Z$) has independent marginals (ICs). Note that the ICs are not standardized in (6.1), but that the requirement $\Lambda \in \mathcal{G}_k^{\mathrm{unit}}$ (rather than just $\Lambda \in \mathcal{G}_k$) plays the same role in the mutual identification of $\Lambda$ and (the distribution of) $Z$ as the standardization of $Z$ in the IC models from Section 2.1.
Consider then two $k$-variate scatter matrix functionals $S_a$ and $S_b$ (recall that, if $F_X$ is the cdf of a $k$-variate random vector $X$, a scatter matrix functional $S$ is a $k \times k$ matrix-valued functional such that $S(F_X)$ is positive definite, symmetric, and affine-equivariant in the sense that $S(F_{AX+b}) = A\,S(F_X)\,A'$ for any $k \times k$ invertible matrix $A$ and any $k$-vector $b$). If $X$ comes from the IC model (6.1), the affine-equivariance of $S_a$ and $S_b$ entails that (see also (7) in Nordhausen, Oja and Paindaveine (2009))

$$S_b(F_X)\,(S_a(F_X))^{-1}\,\Lambda = \Lambda D, \qquad (6.2)$$

where $D := S_b(F_Z)\,(S_a(F_Z))^{-1}$. Since $D$ is diagonal (this follows from the independence and the symmetry of the marginals of $Z$; see Theorem 7 in Tyler et al. (2009)), the columns of $\Lambda$ are made of eigenvectors of $S_b(F_X)(S_a(F_X))^{-1}$; here the order, signs, and norms of these eigenvectors are fixed by the requirement $\Lambda \in \mathcal{G}_k^{\mathrm{unit}}$. Of course, the resulting estimator $\hat\Lambda$ is obtained by replacing $S_a$ and $S_b$ with root-$n$ consistent estimates $\hat S_a$ and $\hat S_b$ in (6.2).

Actually, one should not bother too much about signs and norms of the columns of $\hat\Lambda$ since our tests are invariant under reflections (about zero) and rescaling of the estimated ICs (which explains why we may safely consider here the IC model in (6.1) rather than the one in Section 2.1). However, if they involve score functions ($K_r$, $L_r$, say, to adopt the notation used in our nonparametric tests) that are not homogeneous across components (i.e., that depend on $r$), then our tests are not invariant under permutations of the estimated ICs, and it is crucial to order the score functions $K_r$, $L_r$ so that the ordering matches that of the underlying ICs (note that failing to achieve this matching would actually affect only local powers and optimality properties of our tests, but not their asymptotic validity).

Typically, if the score functions are the ones that achieve optimality at $f \in \mathcal{F}$ (i.e., are given by $K_f$, $L_f$), then one way to achieve this matching is to reorder the score functions (equivalently, the marginal densities $f_r$ of $f$) so that the diagonal entries of $S_b(F_f)(S_a(F_f))^{-1}$ (where $F_f$ denotes the cdf associated with $f$) have the same vector of ranks as the diagonal entries of the matrix $\hat D_{\mathrm{ordered}}$ defined through $\hat S_b \hat S_a^{-1} \hat\Lambda = \hat\Lambda \hat D_{\mathrm{ordered}}$ (with $\hat\Lambda \in \mathcal{G}_k^{\mathrm{unit}}$). Now, if $2 \le q < k$ ICs are identically distributed, the corresponding $q$ eigenvalues of $S_b(F_X)(S_a(F_X))^{-1}$ coincide, which implies that only the subspace spanned by those $q$ ICs can be recovered through (6.2), but not the individual ICs themselves; in such cases, our tests are not valid, unless those $q$ ICs are Gaussian; see Nordhausen, Oja and Paindaveine (2009) for a discussion. However, if one feels that excluding (non-Gaussian) identically distributed ICs is too much of an assumption, then it is always possible to resort to another estimator of $\Lambda$ from the literature; we feel, though, that the estimators we propose in this section are not only easy to compute in practice, but are also very well in line with the nonparametric sign-and-rank spirit of our tests (see the practical estimators we propose below).
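The two-scatter construction above can be sketched in a few lines of NumPy. This is a hedged illustration, not the estimator used in the paper: for brevity it swaps in the sample covariance matrix and a fourth-moment scatter matrix for $\hat S_a$ and $\hat S_b$ (which requires finite fourth-order moments, unlike the Tyler and Dümbgen estimates), and all function names are hypothetical.

```python
import numpy as np


def cov_scatter(x):
    """Sample covariance matrix (stand-in for S_a)."""
    return np.cov(x, rowvar=False)


def fourth_moment_scatter(x):
    """Fourth-moment scatter matrix (stand-in for S_b); affine-equivariant."""
    n, k = x.shape
    xc = x - x.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(x, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", xc, inv_cov, xc)  # squared Mahalanobis norms
    return (xc * d2[:, None]).T @ xc / (n * (k + 2))


def two_scatter_mixing(x, scatter_a, scatter_b):
    """Estimate Lambda from the eigenvectors of S_b S_a^{-1}.

    Columns are rescaled to unit Euclidean norm and sorted by decreasing
    eigenvalue; signs are left as returned by the eigen-solver.
    """
    sa, sb = scatter_a(x), scatter_b(x)
    eigval, eigvec = np.linalg.eig(sb @ np.linalg.inv(sa))
    eigval, eigvec = eigval.real, eigvec.real
    order = np.argsort(eigval)[::-1]
    lam = eigvec[:, order]
    lam = lam / np.linalg.norm(lam, axis=0)
    return lam, eigval[order]
```

The columns are recovered only up to sign and order, which is harmless when the score functions are the same across components but matters otherwise, as discussed above.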
Focusing on the problem of testing for independence, we consider the null model (Assumption (B(i)) indeed states thatθ (orθ K /θ L ) should be obtained by fitting the null model) . . . , n, (6.3) where the marginals of the i.i.d. noise inputs Z i = (Z (1) i , Z (2) i ) , i = 1, . . . , n are independent and symmetric about the origin (again, adopting the requirement Λ ∈ G unit p or rather standardizing noise inputs does not affect the behavior of our tests, so that one may here adopt the IC model he/she is most comfortable with). By using, for each = 1, 2, two different p -variate scatter matrix estimatesŜ ( ) a andŜ ( ) b , we can as above define (separately) estimatorsΛ based on X ( ) i , i = 1, . . . , n. The estimator we then propose for ϑ = (μ , (vec Λ) ) iŝ ϑ = (μ , (vecΛ) ) , witĥ  ( = 1, 2) are root-n consistent, this estimatorθ then clearly fulfills Assumptions (B(i)-(iii)); note that, in relation with (B(iii)), the transformed data set A 22 ) and any p-vector b. After appropriate discretization, the estimatorθ therefore satisfies Assumption (B), hence can be used in the parametric tests of Section 3. Our nonparametric tests of Section 4 however require a couple of estimatorsθ K = (μ K , (vecΛ) ) andθ L = (μ L , (vecΛ) ) satisfying Assumption (B ). While one can use the same estimatorΛ as above, we need to define appropriate location estimatesμ K andμ L . Forμ K (one can defineμ L accordingly), we propose adopting the location estimator obtained when using K-score location R-estimators in (6.4) above. More precisely, we suggest usinĝ where the rth component ofξ K is defined as an arbitrary "zero" of the step function where the signed ranksŜ ir (t)K ir (t) are those of y ir − t, i = 1, . . . , n, with y ir := [e r (Λ −1 11 X (1) i )] for r ≤ p 1 and y ir := [e r−p1 (Λ −1 22 X i )] for r > p 1 . 
By "zero", we here mean an arbitrary value $t_0$ for which $h_r^K(t_0^-)\, h_r^K(t_0^+) \le 0$; in order to define $\hat\mu_K$ unambiguously, one can always choose the zero that is closest to the sample median of the $y_{ir}$'s, $i = 1, \ldots, n$. Beyond being robust and root-$n$ consistent without any moment condition, the resulting estimators $\hat\mu_K$ and $\hat\mu_L$ also satisfy Assumption (B′(i)-(ii)) (see the end of Section A.2 for a proof), hence can be used in our nonparametric tests.
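A zero of a signed-rank step function can be located by bisection, since the function is non-increasing in $t$ for nonnegative, nondecreasing scores. The sketch below is an illustration under stated assumptions: the step function is taken to be $h(t) = \sum_i \mathrm{sign}(y_i - t)\, K(R_i^+(t)/(n+1))$, with $R_i^+(t)$ the rank of $|y_i - t|$ (the $n+1$ normalization is an assumption), the function names are hypothetical, and the bisection returns the smallest sign change rather than the one closest to the sample median.

```python
def signed_rank_step(y, t, score):
    """h(t) = sum_i sign(y_i - t) * score(R_i / (n + 1)), R_i = rank of |y_i - t|."""
    n = len(y)
    d = [v - t for v in y]
    order = sorted(range(n), key=lambda i: abs(d[i]))
    total = 0.0
    for rank, i in enumerate(order, start=1):
        sgn = (d[i] > 0) - (d[i] < 0)
        total += sgn * score(rank / (n + 1))
    return total


def r_estimate_location(y, score, tol=1e-10):
    """Bisection for a point where the step function changes sign."""
    lo, hi = min(y), max(y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if signed_rank_step(y, mid, score) > 0:
            lo = mid          # the sign change lies to the right of mid
        else:
            hi = mid          # the sign change lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(r_estimate_location(sample, lambda u: u))    # Wilcoxon-type scores
    print(r_estimate_location(sample, lambda u: 1.0))  # sign scores
```

With constant scores the estimate reduces to a sample median, and with linear scores it is a Hodges-Lehmann-type estimate.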
Finally, we point out that, to avoid any moment assumption in the construction above, our choices for $\hat S_a^{(\ell)}$ and $\hat S_b^{(\ell)}$ are the Tyler (1987) and Dümbgen (1998) scatter matrix estimates, respectively. These statistics are weakly affine-equivariant only, in the sense that, for invertible matrices $A$ and vectors $b$ with appropriate dimensions, $\hat S(AX_1 + b, \ldots, AX_n + b)$ is proportional (but in general not equal) to $A\,\hat S(X_1, \ldots, X_n)\,A'$; however, it is easy to check that this weak affine-equivariance is sufficient to guarantee the (standard) affine-equivariance of the resulting estimators of $\vartheta$. For Tyler's estimate, we need a simultaneous location estimate; a natural choice is the Hettmansperger and Randles (2002) estimate. These estimates are root-$n$ consistent under very weak assumptions, which do not involve any moment condition.
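Weak affine-equivariance is easy to observe numerically on Tyler's estimate. The sketch below uses the standard fixed-point iteration with two simplifying assumptions that are not from the text: the location is treated as known (the simultaneous Hettmansperger and Randles location estimate is not implemented), and the shape matrix is normalized to have trace $k$. The function name is hypothetical.

```python
import numpy as np


def tyler_shape(x, center=None, tol=1e-10, max_iter=1000):
    """Tyler's shape matrix by fixed-point iteration (known location).

    Iterates V <- (k / n) * sum_i x_i x_i' / (x_i' V^{-1} x_i), then rescales
    V to trace k (a normalization convention assumed here).
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    if center is None:
        center = np.zeros(k)  # assumption: data are centered at the origin
    xc = x - center
    v = np.eye(k)
    for _ in range(max_iter):
        d2 = np.einsum("ij,jk,ik->i", xc, np.linalg.inv(v), xc)
        v_new = k * (xc / d2[:, None]).T @ xc / n
        v_new *= k / np.trace(v_new)
        if np.linalg.norm(v_new - v) < tol:
            return v_new
        v = v_new
    return v
```

Applying the function to $AX_i$ returns a matrix proportional to $A \hat S A'$, with the proportionality constant fixed by the trace normalization; this is the "proportional but in general not equal" behavior described above.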

An alternative critical value
Our signed-rank tests in Section 4 are based on the fact that the statistics $\underset{\sim}{Q}_{K,L}$ are asymptotically $\chi^2_{d(K,L)}$ under the null. For some of these statistics, however (typically, this happens when $d(K, L) > p_1p_2$), this convergence is very slow and the test rejecting the null at asymptotic level $\alpha$ when $\underset{\sim}{Q}_{K,L} > \chi^2_{d(K,L);1-\alpha}$ is strongly conservative (hence, biased) for small to moderate sample sizes. Consequently, we recommend using the following alternative critical values, which allow for implementing bias-corrected versions of our nonparametric tests.
If ϑ(∈ M(Ω)) was known, one could use the test statisticQ ϑ,ϑ,K,L -hence, also their estimated versions above-may be regarded as approximate ones forQ K,L . As we show through a Monte-Carlo experiment in the next section, this approach works very well in practice.
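Simulated critical values of this kind can be obtained by evaluating a statistic on many independent standard normal samples and taking an empirical quantile. The sketch below is an illustration only: the statistic is a simplified bivariate Wilcoxon-type signed-rank statistic with known $\vartheta$ (location zero, identity mixing matrix), which is an assumed stand-in for $\underset{\sim}{Q}_{K,L}$, and all function names are hypothetical.

```python
import math
import random


def signed_rank_scores(v, score):
    """Return sign(v_i) * score(R_i / (n + 1)), R_i being the rank of |v_i|."""
    n = len(v)
    order = sorted(range(n), key=lambda i: abs(v[i]))
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = math.copysign(score(rank / (n + 1)), v[i])
    return out


def signed_rank_stat(x, y, score=lambda u: math.sqrt(3.0) * u):
    """n * C^2, with C the average product of componentwise signed-rank scores."""
    n = len(x)
    a, b = signed_rank_scores(x, score), signed_rank_scores(y, score)
    c = sum(ai * bi for ai, bi in zip(a, b)) / n
    return n * c * c


def mc_critical_value(stat, n, alpha=0.05, M=10_000, seed=0):
    """Empirical (1 - alpha)-quantile of `stat` over M standard normal samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(M):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        values.append(stat(x, y))
    values.sort()
    return values[math.ceil((1.0 - alpha) * M) - 1]
```

For this particular stand-in statistic the simulated value should be close to the $\chi^2_1$ quantile 3.84; the correction matters for statistics whose convergence to the limiting distribution is slow.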

Simulations
In this section, we report the results of Monte-Carlo experiments that were conducted to study the small-sample performances of the proposed signed-rank tests and to see how they compete with Wilks'. It was also of interest to investigate how well the finite-sample behaviors of the various procedures were in accordance with the asymptotic results of the previous sections, and we therefore compared non-null rejection frequencies with the corresponding asymptotic powers.
We first considered the bivariate case $p_1 = p_2 = 1$ and fixed the location $\mu$ and the null value of the mixing matrix $\Lambda$ to $0$ and $I_2$, respectively (this is without loss of generality since all tests here are affine-invariant). For various distributions of the ICs (see below) and some selected values of $\delta$ (including the null value $\delta = 0$), we generated $N = 5{,}000$ independent samples of sizes $n = 100$, $200$, and $500$ from the bivariate IC models in (7.1), where $D$ is as in (5.5) or (5.6) (type 1 and type 2 alternatives, respectively). For each sample, we performed Wilks' test and the following six signed-rank tests, all at asymptotic nominal level 5%: the sign test $\underset{\sim}{\phi}_S$, the Wilcoxon test $\underset{\sim}{\phi}_W$, the van der Waerden test $\underset{\sim}{\phi}_{vdW}$ (achieving Le Cam optimality in the multinormal case), and the tests $\underset{\sim}{\phi}_{t_{\nu_1}t_{\nu_2}}$, with $(\nu_1, \nu_2) = (3, 3)$, $(3, 5)$, and $(5, 5)$, based on the (bivariate version of the) statistics in (4.6) (achieving Le Cam optimality when the ICs are $t_{\nu_1}$ and $t_{\nu_2}$). For the last three tests (for which $d(K, L) = 2$), the convergence to the null asymptotic $\chi^2_2$ distribution appeared to be quite slow, and we therefore used the alternative critical values of Section 6.2 (evaluated, for each such test and each sample size $n$, on the basis of $M = 10{,}000$ independent samples of $n$ i.i.d. bivariate standard normal observations). For all other tests, the critical values were simply based on the asymptotic $\chi^2_1$ approximation of the null distributions.
For type 1 alternatives, rejection frequencies are reported, as functions of $\delta$, in Figures 1-2 and in Figure 4(a) (where $X_i^{(1)}$ and $X_i^{(2)}$ are $t_1$). When second-order moments are finite (that is, in the designs considered in Figures 1-2), we also present the corresponding asymptotic powers, computed from Theorems 4.1(ii) and 5.1(ii). The results show that, when both ICs are Gaussian, Wilks' test slightly dominates the asymptotically optimal van der Waerden test (and of course the other signed-rank tests) at sample size 100, but that this dominance fades out as the sample size increases. When the ICs are $t_3$ and $t_5$, the signed-rank tests (excluding the sign test, as expected) are a bit more powerful than Wilks' test, which is in accordance with the AREs of Section 5.2. In each case, finite-sample powers seem to converge quickly to the asymptotic ones. When both ICs are $t_1$, it is seen that Wilks' test (which requires finite second-order moments) exhibits poor performances, while the signed-rank tests behave as expected: in particular, the "closer" $(t_{\nu_1}, t_{\nu_2})$ is to the underlying couple of IC distributions $(t_1, t_1)$, the better the performances of the asymptotically $\chi^2_2$ tests $\underset{\sim}{\phi}_{t_{\nu_1}t_{\nu_2}}$.
We then consider type 2 alternatives, for which rejection frequencies are reported in Figure 3 and in Figure 4(b) (where $X_i^{(1)}$ and $X_i^{(2)}$ are $t_1$). In the setup of Figure 3, all tests based on a statistic that is asymptotically $\chi^2_1$ under the null should, according to our asymptotic results, exhibit poor asymptotic powers (in particular, asymptotic powers of Wilks' test should coincide with the nominal level 5%). Quite surprisingly, Wilks' test seems to gain some power at the sample sizes considered (we have checked, however, that this unexpected behavior of Wilks' test disappears for larger sample sizes). The finite-sample powers of the signed-rank tests $\underset{\sim}{\phi}_{t_{\nu_1}t_{\nu_2}}$, which here are by far the most powerful ones, converge to the limiting ones, although relatively slowly. For type 2 alternatives with $t_1$ ICs (Figure 4(b)), Wilks' test, which is extremely conservative, is again strongly dominated by the signed-rank tests $\underset{\sim}{\phi}_{t_{\nu_1}t_{\nu_2}}$.

[Figure caption: rejection frequencies of Wilks' test ($\phi_{Wilks}$), the sign test ($\underset{\sim}{\phi}_S$), the Wilcoxon signed-rank test ($\underset{\sim}{\phi}_W$), the van der Waerden signed-rank test ($\underset{\sim}{\phi}_{vdW}$), and various signed-rank tests based on $(t_{\nu_1}, t_{\nu_2})$-score functions ($\underset{\sim}{\phi}_{t_{\nu_1}t_{\nu_2}}$); see Section 7 for details.]
For $p_1 = p_2 = 2$, the resulting rejection frequencies (and asymptotic powers) are reported in Figure 5 and Figure 6, for type 1 and type 2 alternatives, respectively. These figures show that our tests behave similarly to the case $p_1 = p_2 = 1$ and that they are not affected by the estimation of nuisance parameters through the ICA procedures proposed in Section 6 (at least when based, as was the case here, on the Tyler (1987) and Dümbgen (1998) scatter matrix estimates and on the location estimates $\hat\mu_K$ and $\hat\mu_L$ from (6.5)). Figure 7 provides the corresponding results for type 1 alternatives in the higher-dimensional setup $p_1 = p_2 = 10$. Clearly, at the same sample sizes as in the low-dimensional cases above, the tests still behave in excellent agreement with the fixed-$p$ asymptotic theory. In contrast, simulation results not reported here indicate that, for high-dimensional type 2 alternatives, such an agreement shows only for larger sample sizes. Note that our tests still remain of high practical value even for such high-dimensional type 2 alternatives, since many ICA applications, e.g., in signal processing, typically offer large sample sizes.

[Figure caption: rejection frequencies, in a setup with $X_i^{(2)}$ having a $t_3$ marginal and a $t_5$ marginal, of Wilks' test ($\phi_{Wilks}$), the sign test ($\underset{\sim}{\phi}_S$), the Wilcoxon signed-rank test ($\underset{\sim}{\phi}_W$), the van der Waerden signed-rank test ($\underset{\sim}{\phi}_{vdW}$), and various signed-rank tests based on $(t_{\nu_1}, t_{\nu_2}, t_{\nu_3}, t_{\nu_4})$-score functions ($\underset{\sim}{\phi}_{t_{\nu_1}t_{\nu_2}t_{\nu_3}t_{\nu_4}}$); see Section 7 for details.]
Figure 8 reports the computing times (in seconds, averaged over 100 replications) of the sign test statistic, the Wilcoxon test statistic, the van der Waerden test statistic, and the signed-rank test statistics based on $t_3$-scores only, $t_5$-scores only, and on mixed $t_3$-$t_5$ scores (the three corresponding $t$-score tests are those used in Figures 1, 5 and 7). Whenever $p_1 = p_2 > 1$, the implementation of all these test statistics requires evaluating the Tyler and Dümbgen scatter matrices, and the corresponding (still, averaged) computing times are then also shown in Figure 8. The results reveal that, as expected, the computational effort increases both with the dimension and with the sample size. The more complex test statistics, relying on $t$-scores, are more costly than the sign, Wilcoxon and van der Waerden ones. Clearly, as the sample size increases, the time to compute the Dümbgen scatter matrix is increasingly large compared to the time required to compute the full test statistic (this can be corrected for by using a less robust scatter matrix). Most importantly, for all dimensions and sample sizes considered, computing times remain quite small for every nonparametric test statistic, with a maximum value of less than 2.5 seconds for sample size $n = 500$ in dimension $p_1 = p_2 = 10$.

[Figure caption: rejection frequencies, with $X_i^{(1)}$ and $X_i^{(2)}$ having their first five (resp., last five) marginals $t_3$ distributed (resp., $t_5$ distributed), of Wilks' test ($\phi_{Wilks}$), the sign test ($\underset{\sim}{\phi}_S$), the Wilcoxon signed-rank test ($\underset{\sim}{\phi}_W$), the van der Waerden signed-rank test ($\underset{\sim}{\phi}_{vdW}$), and various signed-rank tests based on $(t_{\nu_1}, \ldots, t_{\nu_{10}})$-score functions ($\underset{\sim}{\phi}_{t_{\nu_1}\ldots t_{\nu_{10}}}$); see Section 7 for details (the test denoted as $\underset{\sim}{\phi}_{t_3\ldots t_3t_5\ldots t_5t_3\ldots t_3t_5\ldots t_5}$ is the one that is optimal in the distributional setup considered).]

[Figure 8 caption: computing times (in seconds), averaged over 100 replications, of the sign test statistic (S), the Wilcoxon test statistic (W), the van der Waerden test statistic (vdW), and various signed-rank test statistics based on $t_3$-scores only ($t_3$), $t_5$-scores only ($t_5$), and on mixed $t_3$-$t_5$ scores ($t_3$-$t_5$). The distributional setups are those considered under the null in Figure 1, Figure 5 and Figure 7, for $p_1 = p_2 = 1$, $p_1 = p_2 = 2$ and $p_1 = p_2 = 10$, respectively. Whenever $p_1 = p_2 > 1$, the total computing time is partitioned into the (still, averaged) times used to compute the Tyler scatter matrix (green), the Dümbgen scatter matrix (red), and the remaining time used to compute the test statistic.]
As a conclusion, this Monte-Carlo study shows that (i) all tests succeed in meeting the asymptotic 5% level constraint at all sample sizes; (ii) the non-null rejection frequencies of the proposed signed-rank tests are compatible with the corresponding asymptotic powers and AREs derived in the previous sections; most importantly, while they compete reasonably well with the other tests under type 1 alternatives, the optimal signed-rank tests (that are based on a higher number of degrees of freedom) outperform the other tests under (non-Gaussian) type 2 alternatives. Again, this is totally in line with the optimality concept (namely, most stringency) considered. Finally, the computational effort required remains quite low, which makes the proposed tests applicable in practice.

A.3. Proofs of Lemmas 4.3 and 5.1
Proof of Lemma 4.3. Under $P^n_{\vartheta,g}$, the multivariate CLT yields that $T_{\vartheta,K,L;g}$ is asymptotically normal with mean zero and covariance matrix $H_{K,L}$. Under $P^n_{\vartheta+n^{-1/2}\tau,g}$, the asymptotic normality of $T_{\vartheta,K,L;g}$ with mean $H_{K,L;g}(I_p \otimes \Lambda^{-1})\tau_2$ and covariance matrix $H_{K,L}$ follows as usual, by (i) establishing the joint normality (under $P^n_{\vartheta,g}$) of $T_{\vartheta,K,L;g}$ and $\log(dP^n_{\vartheta+n^{-1/2}\tau,g}/dP^n_{\vartheta,g})$, then (ii) applying Le Cam's third lemma (the required joint normality follows from a routine application of the classical Cramér-Wold device).
Proof of Lemma 5.1. Along the same lines as in the proof of Lemma 4.3, the result under $P^n_{\vartheta,g}$ easily follows from the multivariate CLT, and the one under $P^n_{\vartheta+n^{-1/2}\tau,g}$ can be obtained by establishing the joint normality (under $P^n_{\vartheta,g}$) of $T_{\vartheta,\phi,g}$ and $\log(dP^n_{\vartheta+n^{-1/2}\tau,g}/dP^n_{\vartheta,g})$, then applying Le Cam's third lemma.
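For reference, the form of Le Cam's third lemma used in both proofs can be stated as follows. This is the standard textbook statement, written with generic symbols ($T_n$, $L_n$, $\Gamma$, $c$, $\sigma^2$) that are not part of the paper's notation; in the proofs above, $c$ plays the role of the asymptotic shift and $\Gamma$ that of the asymptotic covariance matrix.

```latex
% Le Cam's third lemma (generic notation, not the paper's).
% Let L_n := \log(dQ_n / dP_n) and assume that, under P_n,
\[
  \begin{pmatrix} T_n \\ L_n \end{pmatrix}
  \xrightarrow{\;\mathcal{D}\;}
  \mathcal{N}\!\left(
    \begin{pmatrix} 0 \\ -\sigma^2/2 \end{pmatrix},
    \begin{pmatrix} \Gamma & c \\ c' & \sigma^2 \end{pmatrix}
  \right).
\]
% Then, under Q_n,
\[
  T_n \xrightarrow{\;\mathcal{D}\;} \mathcal{N}(c, \Gamma).
\]
```

The shift $c$ is thus the asymptotic covariance between the statistic and the log-likelihood ratio, while the covariance matrix is the same under both sequences of distributions.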