Multivariate and functional covariates and conditional copulas

Abstract: In this paper the interest is in estimating the dependence between two variables, conditionally upon a covariate, through copula modelling. In the recent literature, nonparametric estimators for conditional copula functions in the case of a univariate covariate have been proposed. The aim of this paper is to nonparametrically estimate a conditional copula when the covariate takes values in more complex spaces. We consider multivariate covariates and functional covariates. We establish weak convergence, as well as bias and variance properties, of the proposed nonparametric estimators. We also briefly discuss nonparametric estimation of conditional association measures, such as a conditional Kendall's tau. The case of functional covariates is of particular interest and challenge, from a theoretical as well as a practical point of view. For this setting we provide an illustration with a real data example in which the covariates are spectral curves. A simulation study investigating the finite-sample performance of the discussed estimators is provided.


Introduction
Assume that (X_1, Y_{11}, Y_{21}), ..., (X_n, Y_{1n}, Y_{2n}) is a sample of n independent and identically distributed triples of random variables. The random variables Y_{1i} and Y_{2i} are real-valued and the X_i's are random elements with values in a space E that will be specified later.
Researchers are often interested in the dependence structure of the bivariate outcome (Y_1, Y_2)^T when the value of the covariate is fixed at a given level, say X = χ. Consider the following example from the food industry. Spectral analysis is used to obtain information on chemical components in a sample of food, for example in meat. Apart from the spectral curves, one also measures the percentages of, for example, fat, protein and water. One of the points of interest is to find out how the percentages of fat and protein relate to each other and how strongly they are related. The interest in this paper is to discover how this dependence changes with the form of the spectral curve. This is an example of a situation in which X is of a functional type. See Section 6 for a more detailed description and for the analysis of a real data example using the methodology developed in this paper.
Suppose that the conditional distribution of (Y_1, Y_2)^T given X = χ exists, and denote the corresponding conditional joint distribution function by H_χ(y_1, y_2) = P(Y_1 ≤ y_1, Y_2 ≤ y_2 | X = χ). If the marginals of H_χ, denoted F_{1χ} and F_{2χ}, are continuous, then according to Sklar's theorem (see e.g. Nelsen, 2006 [15]) there exists a unique copula C_χ which equals

C_χ(u_1, u_2) = H_χ(F_{1χ}^{-1}(u_1), F_{2χ}^{-1}(u_2)),

where F_{1χ}^{-1}(u) = inf{y : F_{1χ}(y) ≥ u} is the conditional quantile function of Y_1 given X = χ and F_{2χ}^{-1} is the conditional quantile function of Y_2 given X = χ. The conditional copula C_χ fully describes the conditional dependence structure of (Y_1, Y_2)^T given X = χ.
The estimation of C_χ is the subject of the current research. Gijbels et al. (2011) [12] and Veraverbeke et al. (2011) [19] investigated nonparametric estimation of C_χ when the covariate is real, that is E = R. Semiparametric estimation of conditional copulas in this case has been studied in Hafner and Reznikova (2010) [13], Acar et al. (2011) [2] and Abegaz et al. (2012) [1]. In this paper we introduce a nonparametric estimator of C_χ when the covariate space is more complex.
The estimators are of a similar type as those in Gijbels et al. (2011) [12], but their theoretical study and practical use can be quite different depending on the complexity of the covariate space. See Sections 3-6. An estimator of the joint conditional distribution function H_χ is

H_χh(y_1, y_2) = Σ_{i=1}^n w_ni(χ, h_n) 1{Y_{1i} ≤ y_1, Y_{2i} ≤ y_2},

where {w_ni(χ, h_n)} is a sequence of weights that smooth over the covariate space E and h = {h_n > 0} is a bandwidth sequence tending to zero as the sample size increases. Then, analogously as in Gijbels et al. (2011) [12], one can suggest the following empirical estimator of the copula:

C_χh(u_1, u_2) = H_χh(F_{1χh}^{-1}(u_1), F_{2χh}^{-1}(u_2)),    (1)

where F_{1χh} and F_{2χh} are the corresponding marginal distribution functions of H_χh, i.e., F_{1χh}(y_1) = H_χh(y_1, +∞) and F_{2χh}(y_2) = H_χh(+∞, y_2). As demonstrated by Gijbels et al. (2011) [12] and Veraverbeke et al. (2011) [19], it is often advisable to remove the influence of the covariate on the marginal distributions before the estimation of C_χ. This can be done in various ways. One can for instance assume a regression model linking the covariate X with the response Y_1 (Y_2), and then replace the original observations (Y_{1i}, Y_{2i}) with the estimated residuals. Gijbels et al. (2011) [12] suggested a very general way of removing the influence of the covariate on the marginal distributions, which can be described as follows. First, estimate the unobserved marginally uniform observations (U_{1i}, U_{2i}) = (F_{1X_i}(Y_{1i}), F_{2X_i}(Y_{2i})) by

Ũ_{1i} = F_{1X_i g_1}(Y_{1i}) and Ũ_{2i} = F_{2X_i g_2}(Y_{2i}),    (2)

where g_1 = {g_{1n}} ↓ 0 and g_2 = {g_{2n}} ↓ 0. Second, use the transformed observations (Ũ_{1i}, Ũ_{2i})^T in a similar way as the original observations, and construct

C̃_χh(u_1, u_2) = G_χh(G_{1χh}^{-1}(u_1), G_{2χh}^{-1}(u_2)),    (3)

where G_χh(u_1, u_2) = Σ_{i=1}^n w_ni(χ, h_n) 1{Ũ_{1i} ≤ u_1, Ũ_{2i} ≤ u_2}, and G_{1χh} and G_{2χh} are the corresponding marginals: G_{1χh}(u_1) = G_χh(u_1, 1) and G_{2χh}(u_2) = G_χh(1, u_2).
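To fix ideas, the plug-in construction above can be sketched in code for a real-valued covariate: compute kernel weights, form the weighted empirical joint distribution function, and evaluate it at the weighted marginal quantiles. This is an illustrative sketch only; all function names are ours (not from the paper), and the Epanechnikov kernel is just one admissible choice of K.

```python
import numpy as np

def epanechnikov_weights(X, x, h):
    """Nadaraya-Watson weights w_ni(x, h) for a real covariate."""
    u = (np.asarray(X) - x) / h
    k = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)
    return k / k.sum()

def weighted_quantile(Y, w, u):
    """Generalized inverse of the weighted ecdf at level u."""
    order = np.argsort(Y)
    cw = np.cumsum(w[order])
    idx = np.minimum(np.searchsorted(cw, u), len(order) - 1)
    return np.asarray(Y)[order][idx]

def cond_copula(Y1, Y2, w, u1, u2):
    """Plug-in estimator C_hat(u1, u2) = H_hat(F1_hat^{-1}(u1), F2_hat^{-1}(u2))."""
    y1 = weighted_quantile(Y1, w, u1)
    y2 = weighted_quantile(Y2, w, u2)
    return float(np.sum(w * (Y1 <= y1) * (Y2 <= y2)))
```

By construction the estimate lies in [0, 1] and equals (up to rounding) 1 at the corner (1, 1), mirroring the copula boundary conditions.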
In the case of a univariate real-valued covariate, a thorough study comparing the performances of the two types of estimators C_χh and C̃_χh, defined in (1) and (3) respectively, can be found in Gijbels et al. (2011) [12], revealing the following practical recommendation: the estimator C_χh is generally preferable when the covariate does not influence the marginal distributions; in the opposite situation, it is safer to use the estimator C̃_χh. Neither of the two estimators, however, uniformly (in all situations) outperforms the other. Moreover, the effect of the covariate on the marginal distributions and/or on the conditional copula itself is often unknown. It is therefore worthwhile to study both estimators.
In this paper we establish weak convergence results for the processes associated with the estimators C_χh and C̃_χh, for a multivariate covariate (in Section 2) and for a functional covariate (in Section 3). Nonparametric estimators of conditional association measures are very briefly discussed in Section 4. In a simulation study in Section 5, the finite-sample performances of the estimators, in both the multivariate and the functional covariate case, are investigated. In Section 6 the discussed methods are used to analyze a real dataset with a functional covariate. Some further discussion is provided in Section 7. The proofs and technical details are given in the Appendix.

Multivariate covariate X
Assume that (X_i, Y_{1i}, Y_{2i}), i = 1, ..., n, is a sample of i.i.d. triples, where X_i = (X_{i1}, ..., X_{id})^T is a d-dimensional continuous covariate with density f_X. Analogously as in Veraverbeke et al. (2011) [19] one can also consider a fixed design, but then one should think of f_X as a design density.
As E = R^d, the role of χ is now played by x = (x_1, ..., x_d)^T. A common system of weights {w_ni} is based on the quantity |B|^{-1/2} K_B(X_i − x), where K is a d-variate kernel, B is the bandwidth matrix with determinant |B|, and K_B(y) = K(B^{-1/2} y). For simplicity of presentation we will suppose that B = h_n^2 I_d, where I_d is the d-dimensional identity matrix. Then, for example, the Nadaraya-Watson weights are defined as

w_ni(x, h_n) = K((X_i − x)/h_n) / Σ_{j=1}^n K((X_j − x)/h_n).    (4)

The definition of local linear weights can be found in e.g. Ruppert and Wand (1994) [16].
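With the simplified bandwidth matrix B = h^2 I_d, the multivariate Nadaraya-Watson weights reduce to a product-kernel computation. The following sketch (our own illustrative code, with a product Epanechnikov kernel as one possible choice of d-variate kernel) shows the computation:

```python
import numpy as np

def nw_weights_multi(X, x, h):
    """Nadaraya-Watson weights for a d-dimensional covariate.

    Uses a multiplicative (product) Epanechnikov kernel and the
    simplified bandwidth matrix B = h^2 I_d, so K_B(y) = K(y / h).
    """
    U = (np.asarray(X) - x) / h              # (n, d) scaled differences
    K1 = np.where(np.abs(U) <= 1, 0.75 * (1.0 - U**2), 0.0)
    k = K1.prod(axis=1)                      # product kernel over coordinates
    return k / k.sum()
```

Observations outside the bandwidth cube around x receive weight zero, which already hints at the curse of dimensionality discussed below: as d grows, fewer observations fall inside that neighbourhood.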
In this paper we consider only pointwise convergence. Therefore, let χ = x be a fixed element in E = R^d, and define the normalized empirical copula processes

Z_xn(u_1, u_2) = √(n h_n^d) (C_xh(u_1, u_2) − C_x(u_1, u_2)) and Z̃_xn(u_1, u_2) = √(n h_n^d) (C̃_xh(u_1, u_2) − C_x(u_1, u_2)).    (5)

Denote by C_x^(1) and C_x^(2) the partial derivatives of the copula function C_x with respect to u_1 and u_2, respectively. In order to establish the weak convergence results for the processes in (5) we need some regularity conditions, which we discuss first.

Regularity assumptions
Regularity conditions needed for the following theorems can be formulated in an analogous way as in Veraverbeke et al. (2011) [19]. Technically speaking, the extension from the univariate to the multivariate covariate case is rather straightforward, and therefore we discuss it only very briefly. Generally speaking, one has to replace h_n with h_n^d and take into account that the covariate is now a vector. The complete list of conditions can be found in Appendix A.
As argued in Veraverbeke et al. (2011) [19], the assumptions about the weights are satisfied for the Nadaraya-Watson weighting scheme provided that f_X(x) > 0 and ∂f_X(z)/∂z is continuous in a neighbourhood of the point x. For local linear weights the latter assumption can even be omitted.
When dealing with the d > 1 covariate case, however, smoothing operations are needed over the d-dimensional space. This implies that the estimation procedure suffers from the curse of dimensionality, as any nonparametric smoothing technique would in this case. For a further discussion, see Section 7.

Results for the estimator C_xh
Theorem 1. Assume (6), (R1)-(R2) and (W1). Then the linear asymptotic representation (7) holds for the process Z_xn.

With the help of the asymptotic representation (7) it is usually possible to describe the limiting process of Z_xn. The choice of the weights has some impact on the variance-covariance structure of the limiting process, through the asymptotic behaviour of the quantity n h_n^d Σ_{i=1}^n w_ni^2(x, h_n). Thanks to assumption (W1) there typically exists, for commonly used weight systems, a finite positive constant V such that (8) holds; for the Nadaraya-Watson weights defined in (4), the explicit form of V is given in (9).

In order to derive the asymptotic bias of the normalized empirical process, we need to study the expectation of the leading term in the linear asymptotic representation (7). Suppose that there exists H such that n h_n^{4+d} → H^2, with H ≥ 0. Using a Taylor expansion and assumption (R1) one can then obtain the expansion (10), uniformly in (u_1, u_2), with D_K a vector and E_K a matrix of constants depending on the chosen system of weights {w_ni} and on the type of the design (see Assumptions (W3) in the Appendix). Here a dot indicates a derivative with respect to the covariate x, e.g. Ḟ_z(u_1) = ∂F_z(u_1)/∂z; the symbol (i) indicates a derivative with respect to u_i, e.g. C_x^(i); and combined notations follow the same rules. Finally, the product E_K B_x in (10) is a matrix product.

We are now ready to state the weak convergence result for the process Z_xn, where V is a constant depending on the asymptotic properties of the weights {w_ni}, as defined in (9), W_x is a bivariate Brownian bridge on [0, 1]^2 with covariance function

Cov(W_x(u_1, u_2), W_x(u_1', u_2')) = C_x(u_1 ∧ u_1', u_2 ∧ u_2') − C_x(u_1, u_2) C_x(u_1', u_2'),

and where R_x(u_1, u_2) is the (deterministic) mean function.

We now turn to the estimator C̃_xh. For this estimator we need to specify the relation between the three bandwidths that are used. In the following we suppose that, for j = 1, 2, the bandwidths g_{jn} and h_n are suitably linked. The next theorem establishes a linear asymptotic representation for the second normalized empirical copula process in (5), and subsequently the weak convergence result for C̃_xh. Moreover, if (9) holds, then Z̃_xn also converges in distribution to a Gaussian process Z_x, as in (12), but now with the (deterministic) mean function equal to R̃_x in (16). Similarly as in Veraverbeke et al. (2011) [19], one can see from Corollary 1 and Theorem 2 that both estimators of the conditional copula have the same asymptotic variance (provided the same bandwidth h_n is used), but that the bias of the estimator C̃_xh does not involve terms coming from the dependence of the marginal distributions on the covariate. See expressions (11), (12), (13) and (16).

Functional covariate X
In this section we study the weak convergence of the (properly normalized) empirical copula processes in the case of a functional covariate.
Assume now that (X_i, Y_{1i}, Y_{2i}) is a sample of i.i.d. triples, where the X_i's are random elements with values in a functional space E, equipped with a semimetric d.
In the following we will take E to be a separable Banach space endowed with a norm ‖·‖. As argued in Ferraty et al. (2007) [10], this setting is still very general, and separability avoids measurability problems.
Here too, we first introduce some notation and state some primary regularity conditions.

Notations and regularity conditions
Let us consider the Nadaraya-Watson weights

w_ni(χ, h_n) = K(d(X_i, χ)/h_n) / Σ_{j=1}^n K(d(X_j, χ)/h_n),    (17)

where K is a given (univariate) kernel. Another system of weights that naturally extends to the functional covariate case is that of the k-nearest neighbour weights; see Burba et al. (2009) [7].

Define φ^H_{χ,y_1,y_2}(s) as the conditional joint distribution function of (Y_1, Y_2)^T when X is forced to be at a distance s from the point χ, i.e. given ‖X − χ‖ = s, and define φ^C_{χ,u_1,u_2}(s) similarly for the conditional copula. A very important role is played by the so-called small ball probability function

φ_χ(h) = P(‖X − χ‖ ≤ h).

In this functional covariate case the appropriate normalization factor for the empirical processes turns out to be √(n φ_χ(h_n)). We consider the normalized empirical processes

Z_χn(u_1, u_2) = √(n φ_χ(h_n)) (C_χh(u_1, u_2) − C_χ(u_1, u_2)) and Z̃_χn(u_1, u_2) = √(n φ_χ(h_n)) (C̃_χh(u_1, u_2) − C_χ(u_1, u_2)),    (19)

in analogy with (5). Indeed, note that this normalization factor is consistent with that in Section 2, since for E = R^d one gets (see e.g. Ferraty and Vieu, 2002 [9], Chapter 13)

φ_x(h) = f_X(x) (π^{d/2} / Γ(d/2 + 1)) h^d + o(h^d),

where Γ(·) is the gamma function, and hence the order of the normalization factor √(n φ_χ(h_n)) in (19) coincides with that of √(n h_n^d) in (5). We now state the weak convergence results for the processes in (19).
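The kernel weights in (17) only require pairwise distances between curves, so they are straightforward to compute once a semimetric is fixed. The sketch below (our own illustrative code; the Riemann-sum L2 distance and the kernel K(u) = 1 − u^2 on [0, 1], as later used in Section 6, are assumed choices) makes this concrete:

```python
import numpy as np

def l2_dist(x1, x2, dt):
    """Riemann-sum approximation of the L2 distance between two curves
    observed on a common equispaced grid with spacing dt."""
    return np.sqrt(np.sum((x1 - x2) ** 2) * dt)

def nw_weights_functional(curves, chi, dt, h):
    """NW weights w_ni(chi, h) = K(d(X_i, chi)/h) / sum_j K(d(X_j, chi)/h),
    with the asymmetric kernel K(u) = 1 - u^2 on [0, 1], as in (17)."""
    d = np.array([l2_dist(c, chi, dt) for c in curves])
    u = d / h
    k = np.where(u <= 1.0, 1.0 - u**2, 0.0)
    return k / k.sum()
```

Curves at distance more than h from χ receive weight zero; choosing another semimetric only changes the `l2_dist` ingredient.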

Results for the estimator C_χh of type (1)
The following theorem is proved in Appendix B.

Theorem 3. Under the regularity conditions listed in Appendix B, the process Z_χn, defined in (19), converges in distribution to a Gaussian process Z_χ, as defined in (12), but now with V = √M_2 / M_1 and with a drift function R_χ involving the constant M_β. The quantities M_1, M_2 and M_β are defined in (B2), (B6) and (B5), respectively, in Appendix B.
If n φ_χ(h_n) h_n^{2β} → 0, then the asymptotic bias given by the function R_χ vanishes. On the other hand, if n φ_χ(h_n) h_n^{2β} → ∞, then the bias dominates the variance. Thus a bandwidth for which n φ_χ(h_n) h_n^{2β} stays bounded away from zero and infinity gives the optimal rate of convergence. But it should be said that the practical impact of such a result is rather limited given the current state of the art of the research area. The general problem is that a random element with values in a functional space is generally a very complex object, and it is not yet clear what reasonable assumptions about its distribution are.
To be more explicit, it is already well understood that the variability of kernel estimators with a functional covariate depends on the behaviour of the small ball probability function φ_χ(h) near zero. For many standard processes, see e.g. Ferraty and Vieu (2006) [11], Ferraty et al. (2007) [10], and Hall et al. (2009) [14], one gets φ_χ(h) = o(h^d), where d might be arbitrarily high. Roughly speaking this means that such a process is sparser near χ than a continuous multivariate covariate of arbitrarily high dimension near a point with a positive density. This somewhat clashes with the philosophy of functional data analysis, which states that switching from multivariate to functional variables helps to deal with the curse of dimensionality. In fact the small ball probability function φ_χ(h) is directly linked with the concentration properties of the functional variable X. Obviously this small ball probability depends on the choice of the semi-metric (and consequently the norm). Appropriate choices of semi-metrics can thus lead to an increase in the concentration of the functional variable X. The so-called projection-type semi-metrics are of this kind, and a general construction procedure can be found in Lemma 13.6 of Ferraty and Vieu (2006) [11].
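The concentration behaviour just described can be probed empirically: the natural sample analogue of φ_χ(h) is simply the fraction of observed curves within distance h of χ, for whatever semimetric is chosen. A minimal sketch (illustrative names, not from the paper):

```python
import numpy as np

def small_ball_prob(curves, chi, h, dist):
    """Empirical analogue of the small ball probability
    phi_chi(h) = P(d(X, chi) <= h): the fraction of the sample
    curves lying within distance h of the curve chi."""
    d = np.array([dist(c, chi) for c in curves])
    return float(np.mean(d <= h))
```

Plotting this quantity against h for several candidate semimetrics gives a rough, data-driven view of which choice concentrates the sample most around χ.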
Next, the lack of understanding of what is reasonable to assume about the distribution of X also affects our ability to check assumption (F3) in Appendix B about the behaviour of the function φ^H_{χ,y_1,y_2}(s) near zero. If this function is sufficiently smooth at s = 0, one would expect that β = 2 is a reasonable assumption. On the other hand, in nonparametric regression with a functional covariate, Ferraty et al. (2007) [10] and (2010) [8] explicitly assume (when translated to the context of estimation of distribution functions) that ∂φ^H_{χ,y_1,y_2}(s)/∂s at s = 0 is non-vanishing and finite, which corresponds to the case β = 1.
As it is usually rather difficult to find the highest possible β in (F3), even if the model is fully specified, researchers often assume a Lipschitz-type continuity in the covariate of the quantity of interest: suppose that there exists γ > 0 such that the quantity of interest is Hölder continuous of order γ in the covariate. Then we arrive at the conclusion of Theorem 3 with β replaced by γ.

Results for the estimator C̃_χh of type (3)
Similarly as in the case of a univariate covariate, it seems advisable to try to remove the effect of the covariate on the marginal distributions.
To guarantee a result analogous to Theorem 2, we need regularity assumptions, listed in Appendix B. Most of them simply guarantee that the assumptions made in the previous section hold, in some sense, uniformly in a neighbourhood of the point of interest.
Note that assumption (F5) in Appendix B is specific to a functional covariate, as for a univariate or multivariate covariate it is automatically satisfied. One may find this assumption rather restrictive, as the unit ball in a Banach space is totally bounded if and only if the space has finite dimension. On the other hand, there are many interesting sets of functions that satisfy this assumption. For instance, all sets of functions for which we can find finite covering or bracketing numbers (see e.g. van der Vaart and Wellner (1996) [18]) are totally bounded. Note that a condition similar to (F5) is also assumed in Ferraty et al. (2010) [8]. Their assumption (C7) (together with (C6)) implies that for a given ε > 0 and all sufficiently large n there exists a finite cover, such that the diameter of each element of the cover is at most ε, for each of the sets involved. The next theorem states the weak convergence result for the second normalized empirical process in (19).
Theorem 4. The process Z̃_χn, defined in (19), converges in distribution to a Gaussian process Z_χ, as defined in (12), with V = √M_2 / M_1 and the drift function R_χ given by (20).

Remark 1. Comparing Theorems 3 and 4, one sees that the asymptotic variances of both estimators are the same. The comparison of the biases is less straightforward here than for a univariate or multivariate covariate. To get some insight into the problem, suppose that the partial derivatives C_χ^(j) admit expansions in which the functions ϑ_{C^(j) F_j}(u_1, u_2), ϑ_{C^(j,j) F_j^2} and ϑ_{C^(1,2) F_1 F_2} are continuous. Then one can derive an explicit expression for the function R_χ of (20). Thus, similarly as for a univariate or multivariate covariate, one sees that the bias structure of the estimator C̃_χh is much simpler, as it does not include the several terms coming from the effect of the covariate on the marginal distributions. That is why we generally recommend trying to reduce the effect of the covariate on the marginal distributions.
Remark 2. We can get rid of assumption (F4) if we slightly undersmooth. More precisely, instead of (F4) suppose that:

(F4') There exists η > 0 such that the bandwidths h_n, g_{n1} and g_{n2} satisfy the corresponding rate condition (21).

Then it can be proved that Theorem 4 holds with R_χ ≡ 0. Note however that (21) rules out bandwidths of the optimal order; hence no optimal bandwidth, as in (B3), can be used.

Conditional association measures derived from the conditional copulas
Similar to the unconditional case, we can measure the association between Y_1 and Y_2, given X = χ, by the conditional Kendall's tau

τ(χ) = 4 ∫∫_{[0,1]^2} C_χ(u_1, u_2) dC_χ(u_1, u_2) − 1,

and the conditional Spearman's rho

ρ(χ) = 12 ∫∫_{[0,1]^2} C_χ(u_1, u_2) du_1 du_2 − 3,

both expressed with the help of the conditional copula C_χ.
With the nonparametric estimators of the conditional copula as defined in Section 1, we can estimate the conditional Kendall's tau by the weighted sample version in (22). The pairs (Y_{1i}, Y_{2i}) in (22) can be replaced with (Ũ_{1i}, Ũ_{2i}) defined in (2) if the relationship could be blurred by the effect of the covariate on the marginal distributions. We will denote this estimator by τ̃_n(χ). The use of the conditional Kendall's tau estimators τ_n(χ) and τ̃_n(χ) will be illustrated in the simulation section (Section 5) and in the real data application (Section 6).
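One natural weighted sample analogue of τ(χ) = 4 E[...] − 1, consistent with the description above, replaces the uniform pair weights 1/n^2 by the kernel weights and counts concordant pairs. This is an illustrative sketch (our own naming; strict inequalities are one of several possible tie conventions, and it induces a small finite-sample attenuation at τ = ±1):

```python
import numpy as np

def cond_kendalls_tau(Y1, Y2, w):
    """Weighted sample version of tau(chi) = 4 * E - 1: a double sum over
    ordered pairs, with kernel weights w replacing the uniform 1/n weights."""
    conc = (Y1[:, None] < Y1[None, :]) & (Y2[:, None] < Y2[None, :])
    return 4.0 * float(np.sum(w[:, None] * w[None, :] * conc)) - 1.0
```

With uniform weights w_i = 1/n and perfectly concordant data this returns (n − 2)/n rather than exactly 1, which is the usual small-sample effect of the plug-in double sum.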
In a similar fashion, a nonparametric estimator for the conditional Spearman's rho is obtained by replacing C_χ with one of its estimators.

Simulation study
To complement the theoretical results, we provide here a simulation study to illustrate the finite-sample performance of the estimator C_χh of (1) and the estimator C̃_χh of (3). We also include the 'benchmark' estimator, which corresponds to the estimator C̃_χh calculated from the (unobserved) pairs (U_{1i}, U_{2i})^T. This estimator corresponds to situations in which the conditional marginal distributions are known and/or the covariate does not affect the marginal distributions.

For brevity of presentation, while addressing various aspects of the studied problem, we focus on estimation of the conditional copula at a fixed covariate value for the multivariate covariate case, and on estimation of the conditional Kendall's tau function for the functional covariate case.

Multivariate covariate
In this section we are interested in estimation of a conditional copula for a two- or three-dimensional covariate at a given point x = (x_1, ..., x_d)^T. The performance of the estimators is evaluated using the average (over all simulations) of the integrated squared error

ISE = ∫∫_{[0,1]^2} (Ĉ_xh(u_1, u_2) − C_x(u_1, u_2))^2 du_1 du_2,

where Ĉ_xh stands for an estimator of the conditional copula.
The model for the marginals is given by Y_1 = μ_1(X) + ε_1 and Y_2 = μ_2(X) + ε_2, where the covariates X = (X_1, ..., X_d)^T are supposed to be independent standard normal variables (for d = 2 or 3). Further, ε_1 and ε_2 are standard normal random variables with the joint conditional distribution function for X = x (with x = (x_1, ..., x_d)) given by C_{θ(x)}(Φ(y_1), Φ(y_2)), where Φ is the distribution function of a standard normal random variable and C_{θ(x)} is a Frank copula with the copula parameter depending on the point x, given by θ(x) = 5 + 7

To calculate the estimators we use B = h_n^2 I_d and a multiplicative (product) kernel. The local linear weights of Ruppert and Wand (1994) [16] are used as an alternative weighting scheme. We estimate the conditional copula at the point x = (1, 1) for d = 2, and at the point x = (1, 1, 1) for d = 3. Results based on 1 000 simulated samples of size n = 300 are shown in Figures 1 and 2, respectively, where we present (from left to right) the average integrated squared bias (AISB), the average integrated variance (AIV) and the average integrated squared error (AISE), plotted as functions of the bandwidth h. The dotted-dashed curve shows the result for the estimator C_xh of (1), the solid curve for the estimator C̃_xh with g_1 = g_2 = 2h, and the dashed curve stands for the benchmark estimator (where we use the information on the marginals to transform them). Using the same line types, we depict as horizontal lines the values of AISB, AIV and AISE when a plug-in bandwidth choice (mimicking the procedure described in Section 2.3 of Gijbels et al. (2011) [12]) is applied for each of the three displayed estimators.
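Simulating from such a model requires drawing pairs from a Frank copula, which can be done by conditional inversion: draw U_1 uniformly, then invert the conditional distribution ∂C_θ/∂u_1 at an independent uniform level. The sketch below shows this standard inversion (our own illustrative code; the margins can then be made standard normal via Φ^{-1} as in the model above):

```python
import numpy as np

def sample_frank(n, theta, rng):
    """Draw n pairs (U1, U2) from a Frank copula (theta != 0) by
    conditional inversion: U2 solves dC/du1(U1, U2) = P for uniform P."""
    u1 = rng.uniform(size=n)
    p = rng.uniform(size=n)
    a = np.exp(-theta * u1)
    # closed-form inverse of the conditional cdf of the Frank copula
    u2 = -np.log1p(p * (np.exp(-theta) - 1.0) / (a - p * (a - 1.0))) / theta
    return u1, u2
```

Positive θ produces positive dependence, so a θ(x) of the form used in the simulation models varies the strength of association with the covariate.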
Note that in the considered simulation models the marginals clearly depend on the covariates (through regression models), and hence in this situation adjusting for the effect of the covariates on the marginals is quite crucial for obtaining a well-performing estimator. Once the effect of the covariates is removed from the marginals, the curves of the integrated squared error of the estimator C̃_xh and of the benchmark estimator become flat, and even large bandwidths still give very reasonable results. So by removing the effect of the covariates on the marginals, the choice of the bandwidth seems to have a lesser impact for the estimator C̃_xh than for the estimator C_xh.
Note that the estimator C̃_xh is quite close to the benchmark estimator. The results of C̃_xh could be further slightly improved by using more specific models, instead of the general approach in (2), to adjust the marginals for the effect of the covariate. For instance, additive models would be completely appropriate for this purpose in this setting. See also Section 7.
The results for d = 2 and d = 3 are similar, with somewhat larger errors in the d = 3 case, and with the estimator C̃_xh not as close to the benchmark estimator as for d = 2. With increasing dimension of the covariate it is more difficult to adjust the marginals for the effect of the covariate.

Functional covariate
In this section we consider the following general model: Y_1 = μ_1(X) + ε_1 and Y_2 = μ_2(X) + ε_2, where ε_1 and ε_2 are standard normal random variables with the joint conditional distribution function of (ε_1, ε_2)^T for X = χ given by C_{θ(χ)}(Φ(·), Φ(·)), where C_{θ(χ)} is a Frank copula with the parameter θ(χ) depending on the functional covariate X(t), which is observed at the equispaced points 0 = t_0 < t_1 < ... < t_100 = π. Further elements of the different simulation models are listed in Table 1: for each of Models A-D it specifies the functional covariate, the copula parameter function, the regression functions μ_1(X) and μ_2(X) (for instance μ_1(X) = 2 sin((1/2) X_2) and μ_2(X) = 0), and the real-valued covariates X_1 and X_2. Herein, the random variables A_1, A_2, A_3 and A_4 are independent random variables uniformly distributed on [0, 1].

The aim of this section is also to provide illustrations of various aspects of the application in Section 6. We therefore focus on estimation of the conditional Kendall's tau at all points of the sample. We measure the performance of an estimator τ̂(χ) of τ(χ) via the 'integrated squared error'

ISE = ∫ (τ̂(χ) − τ(χ))^2 dF_n(χ),    (23)

where F_n stands for the empirical distribution of the functional covariate in a given sample. The quantity ISE in (23) is then further averaged across all 1 000 simulations. Note that ISE can be decomposed into the 'squared integrated bias' and the 'integrated variance'. An estimator for the conditional Kendall's tau is given in (22).

Simulation results for Model A
Simulation model A is inspired by the data generation process used in Ferraty et al. (2010) [8]. In a typical sample from this model, the conditional Kendall's tau ranges from 0.1 to 0.6, with an average value of 0.4. We consider four estimators: the benchmark estimator; the estimator C_χh of (1); the estimator based on the adjusted observations (Ũ_{1i}, Ũ_{2i}) (denoted unif.); and the estimator based on residuals from a nonparametric regression fit (denoted nonp.regr.). Distances between curves are measured either by the standard L_2-distance d_2 or by the L_2-distance between first derivatives, d_2^(1). Note that the latter seems to be more appropriate here, taking into account the structure of Model A.
For the nonparametric regression fit needed for the estimator nonp.regr., we use the R functions funopare.kernel.cv (for the NW weight system) and funopare.knn.gcv (for the k-nearest neighbour weights) that are available at the website http://www.math.univ-toulouse.fr/staph/npfda/. Finally, the bandwidths returned by the function funopare.kernel.cv are used as g_{n1} and g_{n2} in the calculation of the estimator unif. based on (Ũ_{1i}, Ũ_{2i}). In the top (respectively bottom) panel of Figure 3, the simulation results for sample size n = 200, using the distance function d_2^(1) based on the first derivative, are presented for the NW weights (respectively the k-nearest neighbour weights, where α = k/n). The results for the two types of weights are very similar. Note also the similarity with the results for a multivariate covariate. The estimators unif. and nonp.regr. are both very close to the benchmark estimator. The estimator nonp.regr. is doing slightly better than the estimator unif., as nonparametric regression is, for this model, the appropriate method to adjust the marginals for the effect of the covariate.
The results for the standard L_2-distance d_2 are summarized in Figure 4. In this case, both estimators unif. and nonp.regr. are doing considerably worse than the benchmark estimator. The reason is that the distance d_2 is not as well suited as d_2^(1) for adjusting the marginals for the effect of the covariate. A closer inspection of the results also reveals that the benchmark estimator is doing worse for the distance d_2 than for the (more appropriate) d_2^(1).
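The derivative-based semimetric d_2^(1) can be sketched with finite differences on the common observation grid; this also makes clear why it outperforms d_2 here: it is blind to vertical shifts of the curves, which carry no information in this model. An illustrative implementation (our own names; finite differences are one of several ways to approximate the derivative):

```python
import numpy as np

def deriv_l2_distance(x1, x2, t, order=1):
    """Semimetric of type d2^(q): L2 distance between the q-th
    (finite-difference) derivatives of two curves observed on a common
    equispaced grid t. order=0 recovers the plain L2 distance d2."""
    dt = t[1] - t[0]
    d1, d2 = np.asarray(x1, float), np.asarray(x2, float)
    for _ in range(order):
        d1 = np.diff(d1) / dt
        d2 = np.diff(d2) / dt
    return float(np.sqrt(np.sum((d1 - d2) ** 2) * dt))
```

Two curves differing only by a constant have d_2^(1) distance (numerically) zero but positive d_2 distance, which is exactly the kind of nuisance variation the derivative-based semimetric filters out.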

Conditioning on a functional covariate or on a summarizing real-valued covariate
When dealing with a complex covariate, one might wonder whether the covariate could be replaced with a simpler quantity that is easier to handle. The success of this strategy depends on how well this simpler quantity substitutes for the complex covariate. To illustrate this we consider Models B and C, which go back to Model A, but where we now define two scalar summaries X_1 and X_2 of the functional covariate X (from Model A). For a real-valued covariate, one can simply use the approach of Gijbels et al. (2011) [12] and Veraverbeke et al. (2011) [19].
When calculating the estimator of the conditional Kendall's tau based on either X_1 or X_2, we take Nadaraya-Watson (or k-nearest neighbour) weights based on the same kernel as for the other estimators. For brevity we only present results for the C̃-type estimators, and for nearest neighbour weights.
Note that in Models A and B the dependence of the copula parameter on the functional covariate is fully described through X_2 (but not at all through X_1), whereas this is not the case for the dependence of the marginals on the functional covariate. This is in contrast with Models C and D, where the dependence of both the marginals and the copula parameter is fully captured through the real-valued covariate X_2. These models can thus be used to investigate how much we lose if the conditioning is done upon a functional covariate whereas it could have been done simply via a real-valued covariate. Also for Model D, the conditional Kendall's tau ranges from 0.1 to 0.6, with an average value of 0.4.
The results for Model B are presented in Figure 5. The dotted-dashed curve is the result for the estimator based on X_1 and the dotted curve for the estimator based on X_2. As in Figures 3 and 4, the solid curve represents the estimator C̃_χh and the dashed curve stands for the benchmark estimator (both based on d_2^(1)). The X_1-based estimator performs very poorly, due to the enormous bias of this estimator. The X_2-based estimator is doing much better, but does not really have a satisfactory performance either. In this model, neither of the two real-valued covariates can fully describe the dependence structure. Figure 6 shows the results for Model C. Note that the X_2-based estimator is slightly preferable to the unif. estimator (as well as to the benchmark estimator) only for small α. With increasing α the differences between the estimators diminish. The reason behind this is that for larger values of α, covariates with similar values of X_2 are also close when the distance between the functional covariates is measured by the chosen semi-metric.
In contrast to this, for Model D, similar values of X_2 can correspond to curves of the functional covariate that are far apart in the chosen semi-metric. The results for Model D are shown in Figure 7. Note that the X_2-based estimator is doing even slightly better than the benchmark estimator. Although in this case conditioning on the functional covariate is not necessary, we see that by doing so the conditional Kendall's tau estimator still behaves reasonably well, and little harm has been done by considering the functional covariate instead of the real-valued covariate X_2.

Application to real data
The tecator dataset contains 215 spectra of light absorbance as functions of the wavelength, observed on finely chopped pieces of meat. These data were first studied by Borggaard and Thodberg (1992) [6]. The original data come from a quality control problem in the food industry and can be found at the website http://lib.stat.cmu.edu/datasets/tecator. For each finely chopped pure meat sample, a 100-channel spectrum of absorbances (−log10 of the transmittance measured by a spectrometer) and the contents of moisture (water), fat and protein were measured. The 215 spectral curves (100 discretely observed values each, over a common grid of wavelength points) are presented in Figure 8(a).
To each spectral curve corresponds a three-dimensional vector: the percentages of fat, protein and water in each piece of meat. In this analysis we concentrate on the relationship between fat and protein.
For simplicity, we summarize the degree of dependence by the conditional Kendall's tau (see Section 4). To calculate the estimator in (22) we use Nadaraya-Watson weights defined by (17). The kernel function is chosen to be K(u) = 1 − u^2 for u ∈ [0, 1] and zero otherwise. For a given χ, the bandwidth h_n is taken such that the ball of radius h_n contains 60 observations (note that this corresponds to a k-nearest neighbour type of bandwidth; see Burba et al. (2009) [7]). For calculating (Ũ_{1i}, Ũ_{2i}) defined in (2), we set g_{n1} = g_{n2} such that the ball of radius g_{n1} contains 30 observations.
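The k-nearest-neighbour type of bandwidth just described amounts to taking, for each χ, the k-th smallest distance to the observed curves. A minimal sketch (illustrative naming, not from the paper):

```python
import numpy as np

def knn_bandwidth(distances, k):
    """Bandwidth h_n of k-nearest-neighbour type: the smallest radius such
    that the ball around chi contains k observations, i.e. the k-th
    smallest of the distances d(X_i, chi)."""
    return float(np.sort(np.asarray(distances, float))[k - 1])
```

With k = 60 (respectively 30) this reproduces the bandwidth rule used for h_n (respectively g_{n1}) above; the bandwidth then adapts automatically to the local sparsity of the curves around χ.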
We consider two distance functions on the covariate space, denoted by d2 and d2^(2). While the first one is the standard L2-norm of the difference between two spectral curves, the second one is the L2-norm of the difference of the second derivatives of two spectral curves. We include the distance function d2^(2) as it is often used in the functional data analysis of this data set; see e.g. Ferraty and Vieu (2002) [9], (2006) [11], Ferraty et al. (2007) [10], (2010) [8] and Burba et al. (2009) [7], among others.
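For curves observed on a common grid of wavelength points, the two semi-metrics can be sketched as follows, approximating the second derivatives by finite differences; the function names and discretization choices are ours, not the paper's.

```python
import numpy as np

def l2_dist(x1, x2, grid):
    """Approximate L2-norm of the difference of two discretized curves,
    using the trapezoidal rule on a (possibly non-uniform) grid."""
    diff2 = (np.asarray(x1) - np.asarray(x2)) ** 2
    return np.sqrt(np.sum(np.diff(grid) * (diff2[:-1] + diff2[1:]) / 2.0))

def second_derivative(x, grid):
    """Second derivative of a discretely observed curve via finite differences."""
    return np.gradient(np.gradient(x, grid), grid)

def l2_dist_d2(x1, x2, grid):
    """Semi-metric d2^(2): L2-norm of the difference of second derivatives."""
    return l2_dist(second_derivative(x1, grid),
                   second_derivative(x2, grid), grid)
```

Note that d2^(2) is only a semi-metric: curves that differ by an affine function of the wavelength have identical second derivatives and hence distance zero.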
In Figure 9(a) a scatterplot of the percentages of fat and of protein for the 215 meat samples is plotted. This reveals that there is a strong negative relationship between the protein and fat percentages, with Kendall's tau equal to −0.69. In Figure 9(b) we plot (n/(n+1) F̂n1(Y1i), n/(n+1) F̂n2(Y2i)), where F̂n1 (F̂n2) are the (unconditional) empirical distribution functions of the Y1i's (Y2i's), i.e. the original data transformed in an unconditional way to uniform margins. Figures 9(a) and (b) reveal a similar strong negative relationship between protein and fat percentage. A question that arises is: how different is this relationship for different (types of) spectral curves? Or also: how does this relationship change when conditioning on spectral curves that are close in the sense of one of the two considered semi-metrics? Answering these questions can help in identifying clusters of food with a similar dependence between fat and protein.
We first consider the distance function d2 (based on the standard L2-norm). For each of the spectral curves Xi we estimate the conditional Kendall's tau τ̂n(Xi) (respectively τ̃n(Xi)) by formula (22). Further, we take the minimal envelope (that is, the pointwise minimum) of the spectral curves as the reference curve. This seems to be an appropriate reference curve in this example, given the somewhat layered appearance of the spectral curves. In Figure 10(a) the conditional Kendall's tau estimates τ̂n(χi) and τ̃n(χi) are depicted as a function of the distance from the minimal envelope (reference) curve. The dashed horizontal line represents the standard (unconditional) Kendall's tau (−0.69) measuring the global association between fat and protein.
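The minimal-envelope reference curve and the distances used on the x-axis of a plot like Figure 10(a) can be computed along these lines (an illustrative sketch, assuming curves stored row-wise on an equispaced grid with spacing dx):

```python
import numpy as np

def minimal_envelope(curves):
    """Pointwise minimum over a set of discretized curves (rows = curves)."""
    return np.min(np.asarray(curves), axis=0)

def dist_to_reference(curves, ref, dx):
    """Discretized L2-distance of each curve to a reference curve,
    assuming an equispaced grid with spacing dx."""
    curves = np.asarray(curves)
    return np.sqrt(np.sum((curves - ref) ** 2, axis=1) * dx)
```

Plotting the conditional Kendall's tau estimates against these distances then orders the curves from the "lowest" spectra outwards, as in Figure 10(a).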
Note that both estimates τ̂n and τ̃n are very close (although τ̃n is for most of the data points slightly more negative), which indicates that the marginal distributions are almost unaffected by the covariate when the L2-norm is considered. This is also confirmed by Figure 9(c), where the adjusted observations (Ũ1i, Ũ2i) (defined in (2)) are plotted. The relationships of the original observations (Figures 9(a) and (b)) and of the adjusted data (conditionally transformed margins) in Figure 9(c) seem to follow a very similar pattern. Finally, the main message of this analysis is that (see Figure 10(a)) the conditional Kendall's tau drops (from about −0.6 to about −0.8) when moving from the minimal envelope to the curves with distances slightly above one, and then comes back to about −0.65. When focusing, for example, on the τ̂n estimates one can see that the conditional Kendall's tau ranges from −0.80 to −0.55, where the curves that correspond to these extreme values are indicated in Figure 8(a) as bold solid and dotted-dashed curves. One can think of these as spectra for meat in which the (negative) relationship between fat and protein is most (respectively least) pronounced.
Now, let us switch to the distance function d2^(2). The estimated conditional Kendall's tau values now cover a considerably wider range than for d2. This indicates that the second derivative of a spectral curve explains a significant part of the dependence between protein and fat. In Figure 8(b) we depict the (discrete) second-order derivatives of the original spectra. In this figure we indicate the two curves for which the estimator τ̃n achieves its minimal and maximal value (respectively −0.51 and 0.12). One can think of these as (derivative) spectra for meat in which the relationship between fat and protein is most negative, respectively most positive.
Further analysis reveals that the adjusted estimator τ̃n is for approximately 74 percent of the observations higher than τ̂n, with the difference being as high as 0.45. It is also interesting to look at Figure 9(d). When comparing Figures 9(c) and (d) one can see that, when adjusting for the effect of the covariate on the marginal distributions using d2^(2), there is no longer such an obvious pattern of negative relationship between fat and protein as in the case when d2 is used. This finding points in the same direction: the second derivative of a spectral curve explains a significant part of the dependence between protein and fat.
In conclusion, the L2-distance based on the second derivative of the spectral curves proves to be more informative than the standard L2-distance, not only when fitting marginal distributions of either fat or protein (cf. previous analyses in the literature), but also when one is interested in the relationship between fat and protein. Focusing on the spectral curves that are close in the d2-sense results in estimates of the conditional Kendall's tau ranging from −0.8 to −0.5, whereas for curves that are close in the d2^(2)-sense the conditional Kendall's tau estimates range for τ̂n from −0.61 to 0 (with average value −0.33) and for τ̃n from −0.51 to 0.12 (with average value −0.23).
A referee pointed out that the need to choose a reference (spectral) curve in the above analysis might, in other examples, be less evident. One can avoid the choice of a reference curve by simply presenting estimates of the conditional Kendall's tau for given X = Xi (for each of the i = 1, . . . , 215 spectral curves). For the two distance functions (d2 and d2^(2)) this leads to two estimated values for each given spectral curve. Plotting these two numbers against each other for all spectral curves results in Figure 11. The left panel shows the results for the estimator τ̂n and the right one for the estimator τ̃n. For a visual impression we also include the Lowess smoother on each scatterplot.

Further discussion
In this paper we introduced estimators of a conditional copula function when the covariate is either multivariate or functional. From these, estimators of conditional association measures can then be easily obtained.
The proposed estimators are defined in terms of kernel weights, requiring the choice of a bandwidth parameter. A study of theoretically optimal bandwidth choices would start from the theoretical bias and variance properties of the estimators. These would then lead to a discussion of practical bandwidth choices, such as plug-in type bandwidths. In Section 5 we illustrate the impact of the choice of the bandwidth parameter on the performance of the estimators in the current context. We also implemented a plug-in type bandwidth selector. However, a detailed study of optimal bandwidth choices in (conditional) copula estimation is so far mostly lacking.
When dealing with multivariate covariates of dimension d, one needs to smooth in d dimensions. This involves working with local neighbourhoods in higher dimensions, and hence one cannot avoid facing the curse of dimensionality (large sample sizes are needed in high dimensions). The presented methods lead to good results for moderate sample sizes (a few hundred) for dimensions up to 3. In multivariate nonparametric regression the curse of dimensionality is dealt with by restricting the class of models to, for example, additive models or single-index models. This could be done in the current setting by assuming that the multivariate covariate influences Y1 as well as Y2 through an additive regression model, or via a single-index model. Modelling the conditional copula function could then be done in a semiparametric way (see e.g. Abegaz et al. (2012) [1]), restricting the copula parameter function to be modelled also in an additive way. This is a subject of future research.
In the case of a functional covariate, there is also the issue of the choice of the distance function (or norm) used to measure the distance between two curves. In the application in Section 6 we carried out the analysis for two different distance functions, revealing that: (i) a comparison between the analyses based on different distances can lead to interesting findings; and (ii) some distance functions might appear more natural than others in a given study.

Corollary 1. If (9), n hn^(4+d) → H^2, (W3) and the assumptions of Theorem 1 hold, then the process Cxn^(E) converges in distribution to a Gaussian process Zx.

2.3. Results for the estimator Ĉxh of type (3)

Fig 3. Conditional Kendall's tau estimators for a functional covariate when the L2-norm of the first derivative is used.

Fig 4. Conditional Kendall's tau estimators for a functional covariate when the L2-norm is used.

Fig 5. Conditional Kendall's tau estimators for Model B, when conditioning on the scalars X1 or X2, or on the functional covariate.

Fig 8. The spectrometric curves data (a), and the second derivatives of these curves (b), with indication of two particular curves, with the smallest (respectively largest) estimated conditional Kendall's tau: dotted-dashed (respectively solid) curves.

Figure 9(b). The original data transformed in an unconditional way to uniform margins: the points (n/(n+1) F̂n1(Y1i), n/(n+1) F̂n2(Y2i)), where F̂n1 (F̂n2) are the (unconditional) empirical distribution functions of the Y1i's (Y2i's).
Figure 9(d) depicts the adjusted observations (Ũ1i, Ũ2i) based on the distance function d2^(2).

Fig 11. For each given spectral curve Xi, left panel: estimates of τ̂n(Xi) based on the distance function d2 plotted versus τ̂n(Xi) based on the distance function d2^(2); right panel: similar plot but for the estimator τ̃n(·).
(see e.g. Berlinet et al. (2011) [5]). On the other hand, there is no unique generalization of local linear weights to the functional covariate case. For possible suggestions see e.g. Baíllo and Grané (2009) [3], Barrientos-Marin et al. (2010) [4] and Berlinet et al. (2011) [5]. For simplicity we concentrate only on Nadaraya-Watson weights in the following, but with some further effort it might be shown that results analogous to those of Ferraty et al. (2007) [10] hold for k-nearest neighbour weights as defined in Burba et al. (2009) [7]. Analogously as in Ferraty et al. (2007) [10] define

Fig 7. Conditional Kendall's tau estimators, for Model D, when conditioning on the scalars X1 or X2, or on the functional covariate.