Smooth test for equality of copulas

A smooth test to simultaneously compare $K$ copulas, where $K \geq 2$, is proposed. The $K$ observed populations can be paired, and the test statistic is constructed from the differences between moment sequences, called copula coefficients. These coefficients characterize the copulas even when the copula densities do not exist. The test relies on a two-step data-driven procedure. In the first step, the most significantly different coefficients are selected for all pairs of populations. In the second step, these coefficients are used to identify the populations that exhibit significant differences. The effectiveness of the method is illustrated through numerical studies and applications to two real datasets.


Introduction and motivations
Copulas have been extensively studied in the statistical literature and their field of application covers a very wide variety of areas (see for instance the book of [14] and references therein). The problem of goodness-of-fit for copulas is therefore an important topic and arises in many situations: in insurance to compare the dependence between portfolios (see for instance [31]), in finance to compare the dependence between indices (see for instance the book of [7]), in biology to compare the dependence between genes (see [16]), in medicine to compare diagnoses (see for instance [12]), or more recently in ecology to compare the dependence between species (see [10]).
In the one-sample case, many testing methods have been proposed in the context of parametric families of copulas (see for instance the review paper of [9] or, more recently, [23], [6] and [5]).
In the two-sample case, an important reference is the work of [27]. They proposed a nonparametric test based on the integrated square difference between the empirical copulas. Their approach requires the continuity of the partial derivatives of the copulas, which makes it possible to obtain an approximation of the distribution under the null. Their test is consistent and is adapted to independent as well as paired populations, and an R package 'TwoCop' is available (see [26]).
When K > 2, [24] proposed an innovative procedure to compare K copulas. More recently, [25] developed a second test statistic with a very original idea based on a generalized Szekely-Rizzo inequality. These tests are consistent and can also be used to test radial symmetry and exchangeability of copulas. However, [24,25] restricted the study to the case of samples of the same size. More precisely, both procedures consist of dividing the sample into sub-samples and testing the equality of the associated sub-copulas. Therefore, testing the equality of copulas from independent samples cannot be achieved by these works. Furthermore, in both cases the null distribution is intractable and the author needs a multiplier bootstrap method to implement these tests. Such a bootstrap approach for copulas was initiated in [28]. Another extension of [27] is proposed in [4] when the K populations are observed independently, but the proposed test statistic seems to work only for testing the simultaneous independence of the K populations.
Recently, [22] studied a nonparametric copula estimator which showed very good numerical results. In this paper, we propose to address the problem of K-copula comparison with a new approach based on such estimators. We do not directly compare the empirical copulas; instead, we compare their projections on the basis of Legendre polynomials. We restrict our study to continuous variables whose populations can be paired. This makes it possible to simultaneously compare the dependence structures of various populations, such as various portfolios in insurance, as well as to compare the same population followed over several periods, such as medical cohorts. Moreover, the procedure is valid for the case of several independent samples with different sample sizes, which is important for applications and a novelty compared to the works cited above, even if the works of [24,25] could certainly be generalized in this direction.
Our method is a data-driven procedure derived from Neyman's smooth test theory (see [21]). These smooth tests are omnibus tests and detect any departure from the null. In our case, we consider the orthogonal projections of the copula densities on the basis of Legendre polynomials and we compare their coefficients. For each pair of populations, a penalized rule is introduced to automatically select the coefficients that are the most significantly different. A second penalized rule selects the number of populations to be compared. Thus the procedure is a data-driven method with two selection steps. Under the null, due to the penalties, the rules select only one pair of populations and only one coefficient, leading to a chi-square asymptotic null distribution. The test is then very simple and easy to implement. This is another major difference from the work of [24,25], where the null distribution does not have an explicit form in general and a multiplier bootstrap is used to calculate the p-values. We also prove that the test procedure detects any fixed alternative and gives us information on the rejection decision. More precisely, the second penalized rule is calibrated to detect the populations that differ most significantly. In case of rejection, we can therefore find the pairs of populations that contributed the most to the value of the test statistic. We can also proceed to a two-by-two test to search for similar populations. In practice, we have developed an R package 'Kcop', available on the Comprehensive R Archive Network (CRAN), to implement the K-sample procedure.
A numerical study shows the good behaviour of the test. We apply this approach to two datasets related to biology and insurance. The first one is the well-known Iris dataset. While this dataset is very famous, no simultaneous comparison between the 4-dimensional dependence structures of the three species involved had been carried out. We therefore propose to apply the smooth test to compare the dependence between sepals and petals, thus providing a new analysis. The second dataset is a large medical insurance database with possibly paired data and concerns claims from three years: 1997, 1998 and 1999. We apply the smooth test to several variables from this dataset, illustrating the ideas of risk pooling and price segmentation. All these results can be reproduced using the 'Kcop' package.
The paper is organized as follows: in Section 2 we specify the null hypothesis considered in this paper and we set up the notation. Section 3 presents the method in the two-sample case. In Section 4 we extend the result to the K-sample case (K > 2) and in Section 5 we study the convergence of the test under alternatives. Section 6 is devoted to the numerical study and Section 7 contains real-life illustrations. Section 8 discusses extensions and connections.
All proofs are located in Appendix A. The adaptation to the dependent case is straightforward and is summarized in Appendix B, where all results are rewritten in this context.A method for automating test parameters is available in Appendix C. Additionally, Appendices D to I contain supplementary materials, including various complements, additional simulations, and comparisons.

Notation and null hypotheses
Let $X = (X_1, \ldots, X_p)$ be a $p$-dimensional continuous random vector with joint cumulative distribution function (cdf) $F_X$, and with unique copula defined by
$$C(u_1, \ldots, u_p) = F_X\big(F_1^{-1}(u_1), \ldots, F_p^{-1}(u_p)\big),$$
where $F_j$ denotes the marginal cdf of $X_j$. Writing $U_j := F_j(X_j)$, for $j = 1, \ldots, p$, we consider the shifted Legendre polynomials $(L_n)_{n \geq 0}$ on $[0,1]$, such that $L_n$ is of degree $n$ and satisfies (see Appendix D for more details)
$$\int_0^1 L_j(x) L_k(x)\, dx = \delta_{jk},$$
where $\delta_{jk} = 1$ if $j = k$ and $0$ otherwise. The random variables $U_i$ are uniformly distributed and, writing $U = (U_1, \ldots, U_p)$, we have the decomposition
$$f_U(u_1, \ldots, u_p) = \sum_{\mathbf{j} \in \mathbb{N}^p} \rho_{\mathbf{j}} \prod_{i=1}^p L_{j_i}(u_i), \quad \text{with} \quad \rho_{\mathbf{j}} = E\Big(\prod_{i=1}^p L_{j_i}(U_i)\Big), \qquad (1)$$
as soon as the density $f_U$ exists and belongs to the space of all square-integrable functions with respect to the Lebesgue measure on $[0,1]^p$, that is, if
$$\int_{[0,1]^p} f_U^2(u)\, du < \infty. \qquad (2)$$
Write $\mathbf{j} = (j_1, \ldots, j_p)$ and $\mathbf{0} = (0, \ldots, 0)$. We can observe that $\rho_{\mathbf{0}} = 1$. Moreover, since by orthogonality we have $E(L_{j_i}(U_i)) = 0$ for all $i = 1, \ldots, p$ and $j_i \geq 1$, we see that $\rho_{\mathbf{j}} = 0$ if only one element of $\mathbf{j}$ is non null. When the copula density exists and is square integrable, we deduce from (1) that, for all $u_1, \ldots, u_p \in [0, 1]$,
$$C(u_1, \ldots, u_p) = \sum_{\mathbf{j} \in \mathbb{N}^p} \rho_{\mathbf{j}} \prod_{i=1}^p I_{j_i}(u_i), \qquad (3)$$
where $I_j(u) = \int_0^u L_j(x)\, dx$, and $\mathbb{N}^p_*$ stands for the set $\{\mathbf{j} = (j_1, \ldots, j_p) \in \mathbb{N}^p;\ \mathbf{j} \neq \mathbf{0}\}$. The sequence $(\rho_{\mathbf{j}})_{\mathbf{j} \in \mathbb{N}^p_*}$ will be referred to as the copula coefficients (as in [22]). Since $U$ is bounded, all copula coefficients exist. The following result, due to [29] or [17], shows that such a sequence characterizes the copula; moreover, it shows that assumption (2) is unnecessary.

Proposition 1. Let $(\rho_{\mathbf{j}})_{\mathbf{j} \in \mathbb{N}^p}$ and $(\rho'_{\mathbf{j}})_{\mathbf{j} \in \mathbb{N}^p}$ be two sequences of copula coefficients associated to copulas $C$ and $C'$, respectively. Then $C = C'$ if and only if $\rho_{\mathbf{j}} = \rho'_{\mathbf{j}}$ for all $\mathbf{j} \in \mathbb{N}^p$.

Thereby, the copula is determined by its sequence of copula coefficients, a property that holds even when condition (2) is not satisfied and the copula density may not exist. Consequently, for any continuous random vectors, comparing their copulas amounts to comparing their copula coefficients. This equivalence holds true even when the random vectors lack a density or possess densities that are not square-integrable. We will use this characterization to construct the test statistic.
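To make the coefficients concrete, here is a minimal Python sketch (our own illustration, not the 'Kcop' implementation; the function names are ours) that estimates $\rho_{\mathbf{j}} = E\big(\prod_i L_{j_i}(U_i)\big)$ from pseudo-observations, using the orthonormal shifted Legendre polynomials $L_n(u) = \sqrt{2n+1}\, P_n(2u-1)$:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def shifted_legendre(n):
    # Orthonormal shifted Legendre polynomial on [0, 1]:
    # L_n(u) = sqrt(2n + 1) * P_n(2u - 1), with P_n the standard Legendre polynomial.
    base = Legendre.basis(n)
    return lambda u: np.sqrt(2 * n + 1) * base(2 * np.asarray(u) - 1)

def copula_coefficient(x, j):
    """Estimate rho_j = E[prod_i L_{j_i}(U_i)] from an (n, p) sample,
    using pseudo-observations U_hat = rank / (n + 1)."""
    x = np.asarray(x)
    n, p = x.shape
    # double argsort turns each column into 0-based ranks; +1 gives 1..n
    u = (np.argsort(np.argsort(x, axis=0), axis=0) + 1) / (n + 1)
    vals = np.ones(n)
    for i, ji in enumerate(j):
        vals *= shifted_legendre(ji)(u[:, i])
    return vals.mean()
```

For a comonotone bivariate sample the estimate of $\rho_{(1,1)}$ is close to 1, while for independent components it is close to 0.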
We consider $K$ continuous random vectors, namely $X^{(1)} = (X^{(1)}_1, \ldots, X^{(1)}_p), \ldots, X^{(K)} = (X^{(K)}_1, \ldots, X^{(K)}_p)$, with joint cdfs $F^{(1)}, \ldots, F^{(K)}$, and with associated copulas $C_1, \ldots, C_K$, respectively. Assume that we observe $K$ iid samples from $X^{(1)}, \ldots, X^{(K)}$, possibly paired, denoted by $(X^{(1)}_{i,1}, \ldots, X^{(1)}_{i,p})_{i=1,\ldots,n_1}, \ldots, (X^{(K)}_{i,1}, \ldots, X^{(K)}_{i,p})_{i=1,\ldots,n_K}$. The following assumption will be needed throughout the paper: for all $1 \leq \ell < m \leq K$, $\min(n_\ell, n_m) \to \infty$, and
$$\frac{n_\ell}{n_\ell + n_m} \to a_{\ell,m} \in (0, 1). \qquad (4)$$
Write $n = (n_1, \ldots, n_K)$. Hence it will cause no confusion if we write $n \to +\infty$ when all $n_i \to +\infty$, with the corresponding notation for the convergence of a sequence of univariate random variables $(Q_n)_{n \in \mathbb{N}}$. We consider the problem of testing the equality
$$H_0: C_1 = \cdots = C_K \quad \text{against} \quad H_1: C_\ell \neq C_m \text{ for some } \ell \neq m. \qquad (5)$$
From Proposition 1, testing the equality (5) amounts to testing the equality of all copula coefficients, that is,
$$H_0: \rho^{(1)}_{\mathbf{j}} = \cdots = \rho^{(K)}_{\mathbf{j}} \text{ for all } \mathbf{j} \in \mathbb{N}^p_* \quad \text{against} \quad H_1: \rho^{(\ell)}_{\mathbf{j}} \neq \rho^{(m)}_{\mathbf{j}} \text{ for some } \mathbf{j}, \ell, m, \qquad (6)$$
where $\rho^{(k)}$ stands for the copula coefficients associated to $C_k$.
We will denote by $F^{(\ell)}_j$ the marginal cdf of the $j$th component of $X^{(\ell)}$. For testing (6), we estimate the copula coefficients by
$$\hat\rho^{(\ell)}_{\mathbf{j}} = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} \prod_{k=1}^p L_{j_k}\big(\hat U^{(\ell)}_{i,k}\big),$$
where $\hat U^{(\ell)}_{i,k} = \hat F^{(\ell)}_k(X^{(\ell)}_{i,k})$, and $\hat F$ denotes the empirical distribution function associated to $F$. Such estimators have been extensively studied in [22], where their excellent behavior is shown. Considering the null hypothesis $H_0$ as expressed in (6), our test procedure is based on the sequences of differences $r^{(\ell,m)}_{\mathbf{j}} := \hat\rho^{(\ell)}_{\mathbf{j}} - \hat\rho^{(m)}_{\mathbf{j}}$, with the convention that $r^{(\ell,m)}_{\mathbf{j}} = 0$ when only one component of $\mathbf{j}$ is different from zero; this is due to the orthogonality of the Legendre polynomials, which implies $\rho^{(\ell)}_{\mathbf{j}} = 0$ in that case. In order to select automatically the number of copula coefficients, for any vector $\mathbf{j} = (j_1, \ldots, j_p)$ we will denote by $\|\mathbf{j}\|_1 = j_1 + \cdots + j_p$ the $L_1$ norm and, for any integer $d > 1$, we write
$$S(d) = \{\mathbf{j} \in \mathbb{N}^p_*;\ \|\mathbf{j}\\|_1 = d \text{ and } j_k < d \text{ for all } k = 1, \ldots, p\}.$$
The set $S(d)$ contains all non null vectors of nonnegative integers $\mathbf{j} = (j_1, \ldots, j_p)$ with $L_1$ norm equal to $d$ and such that $j_k < d$, for all $k = 1, \ldots, p$. We will denote by $c(d) := \binom{d+p-1}{d} - p$ the cardinality of $S(d)$, and we introduce a lexicographic order $\mathrm{ord}(\mathbf{j}, d)$ on $\mathbf{j} \in S(d)$. This order will be used to compare the copula coefficients successively.
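As a quick sanity check on the combinatorics, the set $S(d)$ and its cardinality $c(d) = \binom{d+p-1}{d} - p$ can be enumerated directly (illustrative Python, not part of the paper's code):

```python
from itertools import product
from math import comb

def S(d, p):
    """All j in N^p with ||j||_1 = d and j_k < d for every k,
    returned in lexicographic order."""
    return sorted(j for j in product(range(d), repeat=p) if sum(j) == d)

def c(d, p):
    # cardinality of S(d): all weak compositions of d into p parts,
    # minus the p vectors that place the whole mass d on one coordinate
    return comb(d + p - 1, d) - p
```

For instance, with $p = 2$ and $d = 3$, the set is $\{(1,2), (2,1)\}$ and $c(3) = \binom{4}{3} - 2 = 2$.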

Two-sample case
We first consider the two-sample case $K = 2$ to detail the construction of the test statistics. We want to test $H_0: C_1 = C_2$ against $H_1: C_1 \neq C_2$. We restrict our attention to the iid case, the paired case with $n_1 = n_2$ being briefly described in Appendix B. To compare the copulas associated with $X^{(1)}$ and $X^{(2)}$, we introduce a series of statistics derived from the differences between their copula coefficients. Specifically, for $1 \leq k \leq c(2)$, we define the statistics $T_{2,k}$ built from the standardized differences $r^{(1,2)}_{\mathbf{j}}$ for the first $k$ indices $\mathbf{j} \in S(2)$ and, for $d > 2$ and $1 \leq k \leq c(d)$, the statistics $T_{d,k}$ including all coefficients of norm smaller than $d$ together with the first $k$ coefficients of norm $d$. These statistics are embedded: each one contains all the squared differences used by the previous one.

It follows that the statistic $T_{d,k}$ contains information enabling the comparison of the copula coefficients $\rho^{(1)}_{\mathbf{j}}$ and $\rho^{(2)}_{\mathbf{j}}$ up to the norm $\|\mathbf{j}\|_1 = d$ and $\mathrm{ord}(\mathbf{j}, d) = k$. Consequently, for a large value of $d$, it is possible to compare coefficients of high order through $r^{(1,2)}_{\mathbf{j}}$, while the parameter $k$ allows the exploration of all values of $\mathbf{j}$ for the given order. To simplify notation, we re-index such a sequence of embedded statistics as $(T_k)_{k \geq 1}$ and denote by $H(k)$ the set of indices $\mathbf{j}$ used in $T_k$. It can be observed that if $\mathbf{j}$ belongs to $H(k)$ then $\|\mathbf{j}\|_1 \leq k$. Moreover, the single-index and double-index statistics are related: for all $k \geq 1$ and $j = 1, \ldots, c(k+1)$, the statistic with single index $c(2) + \cdots + c(k) + j$ coincides with $T_{k+1,j}$, with the convention $c(1) = 0$.
Notice that, to detect all possible alternatives, we need to compare all copula coefficients and hence let $k$ tend to infinity. However, choosing too large a value for $k$ can dilute the test's power. Following [15], we suggest a data-driven procedure to automatically select the number of coefficients used to test the hypothesis $H_0$. For this purpose, we set
$$D(n) = \min\Big\{ \arg\max_{1 \leq k \leq d(n)} \big(T_k - k\, p_n\big) \Big\}, \qquad (10)$$
where $p_n$ and $d(n)$ tend to $+\infty$ as $n_1, n_2 \to +\infty$, the term $k\, p_n$ being a penalty which penalizes the embedded statistics proportionally to the number of copula coefficients used. Roughly speaking, $D(n)$ automatically selects the coefficients that exhibit the most significant differences. Therefore, the data-driven test statistic that we use to compare $C_1$ and $C_2$ is $T_{D(n)}$. We consider a rate condition on the penalty term, denoted (A). Our first result shows that under the null the least penalized statistic will be selected, that is, the first one.
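The penalized selection rule is a one-liner in practice. The sketch below (hypothetical helper name; `T` holds the embedded statistics $T_1, T_2, \ldots$ and `pen` plays the role of the penalty $p_n$) picks the smallest maximizer of the penalized criterion:

```python
import numpy as np

def select_k(T, pen):
    """Data-driven rule: among embedded statistics T_1, T_2, ...,
    return the smallest k maximizing T_k - k * pen."""
    T = np.asarray(T, dtype=float)
    crit = T - pen * np.arange(1, len(T) + 1)
    # np.argmax returns the first (i.e. smallest-index) maximizer
    return int(np.argmax(crit)) + 1
```

Under a null-like sequence the criterion is maximized at $k = 1$; a large jump in $T_k$ at some $k$ overrides the penalty and that $k$ is selected.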

Theorem 1. If (A) holds, then, under $H_0$, $D(n)$ converges in probability to $1$.

It is worth noting that, under the null, the asymptotic behaviour of the statistic is governed by the single squared difference $\big(r^{(1,2)}_{\mathbf{j}}\big)^2$, with $\mathbf{j} = (1, 1, 0, \ldots, 0)$. In that case, the statistic simply measures the discrepancy between $E\big(L_1(U^{(1)}_1) L_1(U^{(1)}_2)\big)$ and $E\big(L_1(U^{(2)}_1) L_1(U^{(2)}_2)\big)$. This simply means that all other copula coefficients are not significant under the null and are therefore not selected. Asymptotically, the null distribution thus reduces to that of this single standardized statistic and is given below.

Its limiting variance involves the constant $a_{1,2}$ defined in (4). To normalize the test, we consider a consistent estimator of this variance, from which we deduce the limit distribution under the null.

K-sample case
We restrict our attention to the iid case here. The paired case is treated in Appendix B.
Our aim is to generalize the two-sample case by considering a series of embedded statistics, each new statistic including a new pair of populations to be compared. We will use the first rule (10) to select a potentially different copula coefficient between each pair. A second rule will then be considered to select a possibly different pair among all populations. To select the pairs of populations we introduce the following set of indices:
$$\mathcal{V}(K) = \{(\ell, m);\ 1 \leq \ell < m \leq K\},$$
which represents all the pairs of populations that we want to compare and that can be ordered as follows: we write $(\ell, m) <_V (\ell', m')$ if $\ell < \ell'$, or $\ell = \ell'$ and $m < m'$, and we denote by $r_V(\ell, m)$ the associated rank of $(\ell, m)$ in $\mathcal{V}(K)$. This can be seen as the natural order (left to right and top to bottom) of the elements of the strict upper triangle of a $K \times K$ matrix. We see at once that $r_V(1,2) = 1$, $r_V(1,3) = 2$ and, more generally, for $(\ell, m) \in \mathcal{V}(K)$, we have
$$r_V(\ell, m) = (\ell - 1)K - \frac{\ell(\ell - 1)}{2} + (m - \ell).$$
We construct an embedded series of statistics $(V_k)_{1 \leq k \leq v(K)}$, where $v(K) := K(K-1)/2$ denotes the cardinality of $\mathcal{V}(K)$. The first statistic $V_1$ is the two-sample statistic $T_{D(n)}$, where $D(n)$ is given by (10), and compares the first two populations 1 and 2. The second statistic $V_2$ compares the populations 1 and 2 and, in addition, the populations 1 and 3. More generally, the statistic $V_k$ compares $k$ pairs of populations: for each $1 \leq k \leq v(K)$, there exists a unique pair $(\ell, m)$ such that $r_V(\ell, m) = k$. To choose automatically the appropriate number of pairs $k$, we introduce the following penalization procedure, mimicking the Schwarz criterion [30]:
$$s(n) = \min\Big\{ \arg\max_{1 \leq k \leq v(K)} \big(V_k - k\, q_n\big) \Big\},$$
where $q_n$ is a penalty term. The choice of $q_n$ is discussed in Remark 1, and we will need a rate assumption on $q_n$, denoted (A'). The following result shows that, under the null, the penalty will asymptotically choose the first element of $\mathcal{V}(K)$. This means that all other pairs are not significantly different under the null and do not contribute to the statistic.
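The rank $r_V(\ell, m)$ admits a closed form, easily derived from the row-wise enumeration of the upper triangle. The following illustrative snippet (our own, with hypothetical names) implements it and checks it against the enumeration:

```python
def rank_pair(l, m, K):
    """Rank of the pair (l, m), 1 <= l < m <= K, in the row-wise
    (left-to-right, top-to-bottom) ordering of the strict upper
    triangle of a K x K matrix."""
    assert 1 <= l < m <= K
    # rows 1..l-1 contribute (K-1) + (K-2) + ... pairs; then offset m - l
    return (l - 1) * K - l * (l - 1) // 2 + (m - l)
```

For example, with $K = 5$: $r_V(1,2) = 1$, $r_V(2,3) = 5$ and $r_V(4,5) = 10 = v(5)$.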
Remark 1. In the classical smooth test approach (see [18]), the standard penalty in the univariate case is q_n = p_n = log(n), a choice closely linked to the Schwarz criterion [30], as detailed in [15]. Here, we extend this choice to the multivariate K-sample case, introducing a tuning factor α in the penalty. Proposition 5 demonstrates that this choice is sufficient for detecting alternatives. In practical applications, the factor α serves to stabilize the empirical level, bringing it closer to the asymptotic one. Details on the automatic selection of α can be found in Appendix C, offering a straightforward calibration of the test. It is worth noting that in [13] a comparison between this Schwarz penalty and the Akaike penalty was conducted; the latter proposes a constant value for p_n or q_n, providing an alternative approach to calibrating the test.
Finally, in the paired case where $n := n_1 = \cdots = n_K$, we opt for $q_n = p_n = \alpha \log(n)$.

Alternative hypotheses
We consider a series of alternative hypotheses $H_1(k)$, $1 \leq k \leq v(K)$. The hypothesis $H_1(k)$ asserts that, for a given $k$, the populations indexed by $\ell$ and $m$ with $r_V(\ell, m) = k$ are the first to exhibit a difference, as per the order defined on $\mathcal{V}(K)$. If $k = 1$, it means that the first two copulas $C_1$ and $C_2$ have at least one different copula coefficient. We will need a further assumption, denoted (B).

Proposition 3. Assume that (A), (A') and (B) hold. Then, under $H_1(k)$, $s(n)$ converges in probability towards $k$ as $n_1, \ldots, n_K \to +\infty$, and the test statistic $V$ converges to $+\infty$, that is, $P(V < M) \to 0$ for all $M > 0$.
Thus a value of $s(n)$ equal to $k$ indicates that the first $k - 1$ pairs of populations are equal and that a difference appears from the $k$th pair (following the order on $\mathcal{V}(K)$).

Numerical study of the test
We choose the penalty as indicated in Remark 1. In our proofs, we set α = 1 for simplicity. However, in practice, we tune this factor empirically using the data-driven procedure outlined in Appendix C. Concerning the value of d(n), conditions (A) and (A') are asymptotic conditions, and from our experience setting d(n) = 3 or 4 is enough to obtain a very fast procedure which detects alternatives where copulas differ by a coefficient with a norm less than or equal to d(n). This parameter can be modified in the package 'Kcop'. In our simulations, we fixed d(n) = 3. The nominal level is set to 5%.

Simulation design
We consider the following copula families: Gaussian, Student, Gumbel, Frank, Clayton, and Joe copulas (briefly denoted by Gaus, Stud, Gumb, Fran, Clay and Joe). For the explicit forms and properties of these copulas, we refer the reader to [20]. For each copula C, the sample is generated with a given Kendall's τ parameter, and we denote it briefly by C(τ). When τ is close to zero the variables are close to independence. Conversely, when τ is close to 1 the variables approach comonotonicity (perfect positive dependence).
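As an illustration of this kind of setup, a bivariate Gaussian copula sample with a prescribed Kendall's $\tau$ can be generated using the classical relation $\rho = \sin(\pi\tau/2)$ between Kendall's tau and the correlation parameter. The sketch below is our own illustration in NumPy/SciPy, not the paper's simulation code:

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_sample(tau, n, rng):
    """Draw n points (U1, U2) from a bivariate Gaussian copula whose
    Kendall's tau equals `tau`, via rho = sin(pi * tau / 2)."""
    rho = np.sin(np.pi * tau / 2)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return norm.cdf(z)  # probability-integral transform of each margin

rng = np.random.default_rng(0)
u = gaussian_copula_sample(0.5, 5000, rng)
```

The empirical Kendall's tau of the generated sample is close to the target value 0.5.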
In our simulations, we compute empirical levels and empirical powers as the percentage of rejections under the null and alternative hypotheses, respectively, based on 1000 replicates. We consider the following scenarios:
• We first consider the two-sample case, where we compare our procedure to the test proposed in [27], the only competitor we found that handles dependent as well as independent bivariate observations. Both methods give very similar results.
• We then consider two cases, a 5-sample case and a 10-sample case. In both situations, alternatives are constructed by modifying τ.
• We also compare the performance of the smooth test to the approach developed in [24] in the K-sample case, with K = 2, 3, 4, restricting our study to sub-samples of the observations as done in [24,25].
• A 6-population case is studied where we change the copula families while keeping the same τ.
• Finally, an additional simulation study, comparing three Student copulas with df = 5 and with τ = 0.4 or 0.6, is proposed in Appendix H.

Simulation results in the two-sample case
In this case (K = 2) we consider the procedure of [27] as a competitor. Recall that this approach is based on the Cramér-von Mises statistic between the two empirical copulas, an approximate p-value being obtained through the multiplier technique with 1000 replications. The authors also provide an R package, 'TwoCop'. By analogy, we call our R package 'Kcop'.
The results are very similar for all scenarios, and we present the A2norm alternatives in this section, reserving the remaining results for Appendix F. Figures 1-2 illustrate that both methods (TwoCop and Kcop) exhibit highly comparable performance. As expected, the more the Kendall taus differ, the greater the power. In our simulation, the tau associated with C_1 is fixed and equal to 0.2. The tau associated with C_2 varies, and the power is maximal (100%) when it is greater than or equal to 0.7. Conversely, the power is minimal (approaching 5%) when this tau is also set to 0.2, corresponding to the null hypothesis.
Five-sample case

Alternatives with different tau: we consider alternative hypotheses with C_1, ..., C_5 in the same copula family but with different values of τ. Table 1 presents empirical levels (in %) with respect to sample sizes when τ = 0.1, 0.5 and 0.8, respectively. In each case, one can observe that the empirical level is close to the theoretical 5% as soon as n is greater than 200. For n = 50 or 100, two phenomena emerge: the empirical level is larger than the theoretical level when τ is small, and smaller than the theoretical level when τ is large. Hence, with fewer observations, the procedure more readily identifies identical copulas when their dependence structure is stronger. This leads to the following recommendation for small samples (n < 200): if the estimate of τ is close to 0.1, the effective level is inflated (around 0.09 rather than 0.05), whereas if the estimate of τ is close to 0.9, the effective level is deflated (around 0.02); the nominal level can be adjusted accordingly. This implies a slight reduction in power in the first case, while power increases in the second case. A tuning procedure could be considered, incorporating a data-driven criterion based on the estimation of τ.
Concerning the empirical power, Tables 2-4 contain all results under the alternatives. We omit some large sample size results where empirical powers are equal to 100%. It is important to note that, even for a sample size equal to 1000, the program runs very fast. It can be seen that for alternatives Alt2 and Alt3 the empirical powers are extremely high even for small sample sizes. The first series of alternatives yields lower empirical powers, since only one copula differs and only through a slight change in τ.

Ten-sample case
Analogously to the previous 5-sample case, we consider null hypotheses with Gaussian, Student, Gumbel, Frank, Clayton, and Joe copulas, with p = 2 fixed. We consider alternatives where only one copula differs from the others.
Empirical levels tend quickly to the nominal 5% and are relegated to Appendix I. Table 5 shows empirical powers under alternatives Alt4. We only treat the cases n = 50 and 100 since, beyond these values, all empirical powers are equal to 100%. Remarkably, even for such small sample sizes, the test behaves very well.

Alternatives with the same Kendall's tau
We consider a last alternative hypothesis with C_1, ..., C_6, the six copulas defined in the null hypothesis models above, all with the same τ = 0.55 and with a dimension p up to 5. Empirical powers are presented in Table 6. It can be seen that the power increases with the dimension p when the sample size is less than n = 300: it is then easier to detect differences between the dependence structures of the vectors. When n ≥ 300, the empirical power is stable and equal to 100% in all scenarios.

Testing the equality of all the bivariate sub-copulas of copulas
The purpose of this section is to compare the performance of our test with that obtained by [24]. We follow the same design (see Table 3 in [24]) and adopt the same notation. More precisely, we simulated data U = (U_1, ..., U_{2K}) ∼ C, where C is a 2K-dimensional copula, and we examine the equality of all the bivariate sub-copulas of U. We denote by N(θ) the model where U is generated by the 2K-variate normal copula, and by T(θ) the model where U is generated by the Student copula with ν = 3 degrees of freedom, the correlation matrix Σ being such that θ = Σ_{1,2} = Σ_{2,1} and Σ_{i,j} = 0.2 for all (i,j) ∉ {(1,2), (2,1)}. We compare our procedure (Kcop) to the following quadratic functional procedures proposed in [24]:
• the Cramér-von Mises (CvM) statistic,
• two characteristic function statistics, denoted Cf1 and Cf2, corresponding to the weight functions of the normal and double-exponential distributions, respectively,
• the diagonal statistic (Dia).
We refer the reader to [24] for more details on these procedures and to the corresponding code.
The results are provided in Tables 10 and 11. There is no overarching conclusion that allows determining a superior method. The various statistics yield fairly similar results, except in the case of K = 4, where the empirical powers associated with our test statistic appear to be generally superior.

Table 7
Empirical levels for different models studied in [24]

Biology data
We analyze Fisher's well-known Iris dataset. The data consists of fifty observations of four measures: Sepal Length (SL), Sepal Width (SW), Petal Length (PL), and Petal Width (PW), for each of three species: Setosa, Virginica, and Versicolor. We then have K = 3 populations, and the dimension is p = 4. The lengths and widths for the three species are represented in Appendix E. In [8] the authors show that multivariate normal distributions seem to fit the data well for all three Iris species. Looking at their mean parameters, the 4-dimensional joint distributions seem different, but that does not tell us anything about their dependence structures.
We propose to test the equality of the dependence structure between the four variables (SL, SW, PL, PW) in the three-sample case. We consider the data as possibly dependent, with the same sample size n = 50, and we therefore apply the test for paired data. We obtain a p-value close to zero (about $10^{-11}$) and a very large test statistic V = 45.9. We thus reject the equality of the dependence structures. The selected rank s(n) is equal to 2. This means that the most significant difference is obtained when considering the statistics associated with population 1 versus 2 (Setosa and Virginica) and population 1 versus 3 (Setosa and Versicolor).
In case of rejection, we can proceed to an "ANOVA" type procedure, applying a series of two-sample tests. Table 9 contains the associated p-values, and we conclude with the equality of the dependence structure between Versicolor and Virginica.
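This follow-up step is generic: given any two-sample test returning a p-value, run all pairwise comparisons and report the rejected pairs. A minimal sketch (the `test` callable is a stand-in for the two-sample smooth test, which we do not reimplement here; no multiplicity correction is applied):

```python
from itertools import combinations

def pairwise_followup(samples, test, alpha=0.05):
    """After a global rejection, run every two-sample comparison and
    return the pairs (1-based indices) whose p-value falls below alpha.
    `test` is any callable mapping two samples to a p-value."""
    results = {}
    for (i, xi), (j, xj) in combinations(enumerate(samples), 2):
        results[(i + 1, j + 1)] = test(xi, xj)
    return {pair: p for pair, p in results.items() if p < alpha}
```

In the Iris example this would single out the pairs involving Setosa, while the Versicolor/Virginica comparison would not be rejected.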

Insurance data
Insurance is an area in which understanding the dependence structure among multiple portfolios is crucial for pricing, especially for risk pooling or price segmentation. To illustrate, we examine the Society of Actuaries Group Medical Insurance Large Claims Database, which contains claims information for each claimant from seven insurers over the period 1997 to 1999. Each row in the database presents a summary of claims for an individual claimant in 27 fields (columns). The first five columns provide general information about the claimant, the next twelve quantify various types of medical charges and expenses, and the last ten columns summarize details related to the diagnosis. For a detailed description of the data, available online, refer to [11], accessible on the web page of the Society of Actuaries. In this context, we focus on the p = 3 dimensional variable X = (X1, X2, X3), where X1 = paid hospital charges, X2 = paid physician charges and X3 = paid other charges, for all claimants insured by a Preferred Provider Organization plan. This is pertinent for risk pooling if the objective is to group together similar charge scenarios, or for price segmentation to provide similar guarantees for the charges. We study the dependence structure of X in three scenarios, as follows.
Three-sample test, paired case. In this case, we consider the same claimants (paired situation) present over the three periods 1997-1999. At the end of the data processing, we obtained three samples of size n = 6874 observations. We analyse the dependence structure of the charges X between the three years. The test concluded with the non-rejection of the equality of the three dependence structures, as evidenced by a p-value of 0.788, a test statistic of V = 0.072 and a selected rank s(n) = 1. Hence, the dependence structure of the paid charges for these insureds over the three years seems to be similar. This can be an argument for keeping the same distribution of risks on the different charges X1, X2 and X3.
Three-sample test, independent case. Here, we narrow our focus to female claimants. The three populations consist of individuals classified by their relationship to the subscriber, which can be "Employee" (n_E = 18144 observations), "Spouse" (n_S = 10969 observations), or "Dependent" (n_D = 10969 observations), all for the year 1999.
Our objective is to test the equality of the dependence structure among the charges X. In this context, we assume independence among the K = 3 populations. Through our testing procedure, we obtain a p-value close to zero. Consequently, we reject the null hypothesis of an equal dependence structure for these charges.
Subsequently, applying an ANOVA procedure reveals that the two-by-two equalities are rejected for "Dependent" vs "Employee" and "Employee" vs "Spouse", with a p-value close to zero in each case.The p-value for "Dependent" vs "Spouse" is close to one.
Therefore, the status of being a "Dependent" or "Spouse" implies a similar dependence structure for the charges, distinct from the status of being an "Employee".In the context of risk pooling, differentiating charges between these two groups becomes relevant.
Ten-sample test, independent case. Here, we analyze data from the year 1999 where the relationship to the subscriber is "Employee". We categorize the claimants into 10 groups according to age ranges of three birth years each (for instance, the birth years [1963, 1965] form one group).
The null hypothesis is H_0: the dependence structures of these 10 groups are identical. Applying our test procedure, we obtain a p-value close to 0 and a test statistic of V = 16.20. Thus, we reject the null hypothesis of an equal dependence structure by age at the 5% significance level.
There is evidence to suggest that the dependence structure of X changes with age. We further apply an ANOVA procedure, and the results are presented in Appendix G, Table 12, where a two-by-two comparison is proposed. Notably, there are no significant differences between two successive groups. Additionally, Group 6 exhibits a similar dependence structure to the other groups, except for Group 3. The disparity increases with the gap between the age categories, especially between the first ones and the last ones.
Observing the age ranges, we identify two clusters: {Group 1, ..., Group 5} and {Group 6, ..., Group 10}. In terms of price segmentation, this allows the formation of two groups with similar dependencies.

Other similar tests
Some extensions of the K-sample test to various null hypotheses have been studied in [24,2,25]. Following this approach, we indicate how to adapt the previous test procedure to the following hypotheses: radial symmetry ($H^{RS}_0$), pairwise exchangeability ($H^{Exc}_0$) and exchangeable symmetry ($H^{ES}_0$), the latter involving all permutations of $\{1, \ldots, K\}$. Clearly, $H^{RS}_0$ coincides with radial symmetry, that is, $(U^{(1)}, \ldots, U^{(K)})$ and $(1 - U^{(1)}, \ldots, 1 - U^{(K)})$ have the same joint distribution, while $H^{Exc}_0$ means that copulas are pairwise exchangeable. These three hypotheses have been elegantly grouped together and tested in [24,25]. We can also adapt our procedure to such hypotheses very naturally by considering the density representation given by (3). For instance, in the two-sample case, testing $H^{RS}_0$ amounts to comparing the coefficients $E\big(L_{j_1}(U^{(1)}) L_{j_2}(U^{(2)})\big)$ to the coefficients $E\big(L_{j_1}(1 - U^{(1)}) L_{j_2}(1 - U^{(2)})\big)$ for all $j_1, j_2$. Asymptotically, under $H^{RS}_0$, the selected test statistic reduces to the comparison of the first coefficients, and it has an asymptotic centred normal distribution under $H^{RS}_0$ with variance similar to that studied in Proposition 1 of the paper.
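The adaptation to radial symmetry rests on the reflection property of shifted Legendre polynomials, $L_n(1-u) = (-1)^n L_n(u)$, so that the contrast between $E[L_{j_1}(U_1) L_{j_2}(U_2)]$ and $E[L_{j_1}(1-U_1) L_{j_2}(1-U_2)]$ can be nonzero only when $j_1 + j_2$ is odd. A small illustrative check (our own sketch, not the paper's code):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def L(n, u):
    # orthonormal shifted Legendre polynomial on [0, 1]
    return np.sqrt(2 * n + 1) * Legendre.basis(n)(2 * np.asarray(u) - 1)

def radial_contrast(u1, u2, j1, j2):
    """Empirical difference between E[L_{j1}(U1) L_{j2}(U2)] and
    E[L_{j1}(1 - U1) L_{j2}(1 - U2)]; close to zero under radial symmetry."""
    a = np.mean(L(j1, u1) * L(j2, u2))
    b = np.mean(L(j1, 1 - u1) * L(j2, 1 - u2))
    return a - b
```

Note that for $j_1 + j_2$ even the two expectations coincide identically in the sample, so only indices with odd $j_1 + j_2$ carry information about radial asymmetry.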
In the same way, $H_0^{Exc}$ consists in comparing $E[L_{j_1}(U^{(\ell)}) L_{j_2}(U^{(m)})]$ to $E[L_{j_1}(U^{(m)}) L_{j_2}(U^{(\ell)})]$ for all $\ell \neq m$. Under the null hypothesis, the test statistic asymptotically reduces to the comparison of the first (least penalized) coefficients. Then the selected statistic has asymptotically a centred normal null distribution.
Finally, the same reasoning applies to $H_0^{ES}$, where the test statistic is asymptotically the same as the previous one.
We now propose to compare the performance of our test to the one developed in [24] for testing the equality of all the bivariate sub-copulas of a copula. We follow the design given in [24] (see Table 3 there), adopting the same notation. More precisely, we simulate data $U = (U_1, \ldots, U_{2K}) \sim C$, where $C$ is a 2K-dimensional copula, and we examine the equality of all the bivariate sub-copulas of $U$. We denote by N(θ) the model where $U$ is generated by the 2K-variate normal copula and by T(θ) the model where $U$ is generated by the Student copula with ν = 3 degrees of freedom; in both cases the correlation matrix Σ is such that $θ = Σ_{1,2} = Σ_{2,1}$ and $Σ_{i,j} = 0.2$ for all other off-diagonal entries. We compare our procedure (Kcop) to the following quadratic functional procedures proposed in [24]:
• the Cramér-von Mises statistic (CvM),
• two characteristic function statistics, denoted Cf1 and Cf2, corresponding to the weight functions of the normal and double-exponential distributions, respectively,
• the diagonal statistic (Dia).
We refer the reader to [24] for more details on these procedures and to their code for the implementation.
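For reproducibility, a draw from the model N(θ) of this design can be sketched as follows. This is a minimal Python illustration; the function name and defaults are ours, not those of [24] or of the Kcop package.

```python
import numpy as np
from math import erf, sqrt

def gaussian_copula_sample(n, K, theta, rho=0.2, seed=0):
    """Draw n observations from the 2K-variate normal copula N(theta):
    corr(1,2) = theta, all other off-diagonal correlations equal to rho
    (0.2 in the design described above)."""
    d = 2 * K
    Sigma = np.full((d, d), rho)
    np.fill_diagonal(Sigma, 1.0)
    Sigma[0, 1] = Sigma[1, 0] = theta
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    std_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))
    return std_norm_cdf(Z)  # uniform margins, Gaussian dependence

# Under the null, theta = rho = 0.2 and all bivariate sub-copulas coincide:
U = gaussian_copula_sample(500, K=2, theta=0.2)
```

Taking θ ≠ 0.2 makes the sub-copula of the first pair differ from the others, which is the alternative examined in the power study.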

Table 10
Empirical levels for different models studied in [24]

Conclusion
In this paper, we introduced characteristic sequences, referred to as copula coefficients, for testing the equality of copulas. We developed a data-driven procedure in the two-sample case, accommodating both independent and paired populations. The extension to the K-sample case involves a second data-driven method, resulting in a two-step automatic comparison method. Our approach is applicable to all continuous random vectors, even in cases where the copula density does not exist. Our method differs from the two-sample test proposed by [27] and complements the K-sample tests developed by [24, 25], enabling the comparison of separate samples. The simulation study demonstrates the effectiveness of our approach, even for more than two populations. The test is user-friendly and performs efficiently. We have limited our simulations to the case of ten samples, but larger numbers of samples are conceivable with this method. For future exploration, studying high dimensions within limited computation time may require dimension reduction by selecting a limited number of copula coefficients and vector components, which extends beyond the scope of this paper.
Comparing our method to existing approaches in the two-sample case, it appears as efficient as the competitor proposed by [27]. In the K-sample case with K > 2, numerical results suggest performance at least as good as that obtained by [24, 25]. In both comparisons, we used the models proposed by those authors. An R package implementing our procedure, named "Kcop", is available on CRAN.
Following the seminal work of [24], we can adapt our procedure to test radial symmetry or exchangeability with a very similar statistic. This idea is already nicely developed, with a general approach, in [24, 2, 25].
Eventually, our approach can be extended in various directions. Two potential directions include:
• Copula coefficients can be used to obtain a simplified and unified expression for some measures of association. Recall that for any continuous d-dimensional random vector $X = (X_1, \ldots, X_d)$ with copula $C$, one of the well-known multivariate versions of Spearman's rho can be expressed as (see [20]) $\rho_X(C) = \frac{d+1}{2^d - d - 1}\big(2^d \int_{[0,1]^d} C(u)\,du - 1\big)$. Spearman's rho then coincides with a combination of the first copula coefficients (with an explicit expression, for instance, when d = 3), and we deduce a novel estimator of the multivariate Spearman's rho. This estimator opens up possibilities for constructing tests comparing Spearman's rho. However, this requires the calculation of the asymptotic distributions of copula coefficients, as proposed in [32].
• Secondly, since the copula coefficients characterize the dependence structure, we could use such coefficients for testing independence between random vectors, in the same spirit as the penalized smooth tests proposed here.
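As a concrete sketch, a rank-based estimator of this multivariate Spearman's rho can be written as follows, using the identity $\int_{[0,1]^d} C(u)\,du = E\prod_i(1-U_i)$ and pseudo-observations $\hat U_{ij} = \mathrm{rank}/(n+1)$. This is an illustration of the version of [20] recalled above, not necessarily the exact estimator alluded to in the paper.

```python
import numpy as np

def multivariate_spearman(X):
    """Rank-based estimate of a multivariate Spearman's rho,
    rho = h(d) * (2^d * E[prod_i (1 - U_i)] - 1),
    with h(d) = (d + 1) / (2^d - d - 1), from pseudo-observations."""
    X = np.asarray(X)
    n, d = X.shape
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1.0
    U = ranks / (n + 1)                      # pseudo-observations
    h = (d + 1) / (2 ** d - d - 1)
    return h * (2 ** d * np.mean(np.prod(1 - U, axis=1)) - 1.0)
```

For d = 2 this reduces to the usual bivariate Spearman's rho; it equals 1 (up to the rank normalization) for comonotone data and is close to 0 under independence.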
where $c > 0$ and $c' > 0$ are constants.

Proof of Proposition 1
From Corollary 6.7 of [29], if µ is a Radon measure on $\mathbb{R}^p$ for which all moments are finite and if there exists ε > 0 such that condition (15) holds, then µ is said to be determinate, that is: if ν is a Radon measure with the same moments, then ν = µ. Since U is bounded on $[0, 1]^p$, all its moments are finite and (15) is satisfied for all ε > 0. It follows that its distribution is determinate. ■

Proof of Theorem 1
We want to show that $P_0(D(n) > 1) \to 0$ as $n$ tends to infinity. The bound involves $H(k)$ satisfying (9) and $H^*(k) = H(k) \setminus H(1)$. The last inequality comes from the fact that if a sum of $(k-1)$ positive terms, say $\sum_{j=2}^{k} r_j$, is greater than a constant $c$, then necessarily there exists a term $r_j$ such that $r_j > c/(k-1)$. The important point here is that $\mathrm{card}(H^*(k)) = k-1$, which corresponds to the number of terms in the sum. For simplification of notation, we write $H^*$ instead of $H^*(d(n))$. Under the null, $\rho^{(1)}_j = \rho^{(2)}_j$, and we combine the resulting decomposition of $r_j$ with the standard inequality for positive random variables. We now study the first quantity $A$, the quantity $B$ being similar. Consider first the quantity involving $E_j$ in (22). Applying the mean value theorem to $E_j$ and using (13) and (14), there exists a constant $c > 0$ controlling the corresponding terms. Combining (25) and (26) with (22), we conclude that $A \to 0$ as $n \to \infty$.
This expression is very similar to the expansion used in [32] (see the proof of Theorem 1 there) and in [3] (see equation (3.4) there). We imitate their approach here.

Proof of Theorem 3
Let us prove that $P(s(n) \geq 2)$ vanishes as $n \to +\infty$. By definition of $s(n)$ we have
$$\sum_{2 \leq \mathrm{ord}\, V^{(\ell,m)} \leq v(K)} V^{(\ell,m)}_{D(n)} \geq (k-1)\, p_n.$$
Since this sum contains $(k-1)$ positive elements, there is at least one element greater than $p_n$. It follows that
$$P\big(s(n) \geq 2\big) \leq P\Big(\exists\, (\ell,m) \text{ with } 2 \leq \mathrm{ord}\, V^{(\ell,m)} \leq v(K) : V^{(\ell,m)}_{D(n)} \geq p_n\Big) \leq \sum_{2 \leq \mathrm{ord}\, V^{(\ell,m)} \leq v(K)} P\big(V^{(\ell,m)}_{D(n)} \geq p_n\big).$$
First, we can remark that $v(K)$ is finite, so there is a finite number of terms in this sum, and it suffices to show that $P(V^{(\ell,m)}_{D(n)} \geq p_n)$ vanishes as $n \to +\infty$ for any value of $(\ell,m)$. Since $D(n) \leq d(n)$ we have
$$P\big(V^{(\ell,m)}_{D(n)} \geq p_n\big) \leq P\big(V^{(\ell,m)}_{d(n)} \geq p_n\big) = P_0\Big(\frac{n_\ell n_m}{n_\ell + n_m} \sum_{j \in H(d(n))} \big(r^{(\ell,m)}_j\big)^2 \geq p_n\Big). \qquad (30)$$
Comparing (30) and (16), we can see that the study is now similar in spirit to the two-sample case, and we can simply mimic the proof of Theorem 1 to conclude. ■

Proof of Proposition 3
We give the proof for the case k > 1, the particular case k = 1 being similar. For simplification of notation, we now write $H$ instead of $H(d(n))$. We first show that the leading terms have a limiting normal distribution and that the remaining terms are all $o_P(1)$. Using the expression of the empirical cdf, we can rewrite the $A_i$ as sums of i.i.d. variables $Z^{(1)}_i$ and $Z^{(2)}_i$ with finite variances. Applying the Central Limit Theorem to the independent i.i.d. sequences $Z^{(1)}_i$ and $Z^{(2)}_i$ completes the proof. ■

Table 1
Empirical levels (in %) for the five-sample test.

Table 5
Percentage of rejection under alternative Alt4 (ten-sample case).

Table 9
P-values for the two-sample test (Iris dataset).