Partial Martingale Difference Correlation

We introduce the partial martingale difference correlation, a scalar-valued measure of conditional mean dependence of Y given X, adjusting for the nonlinear dependence on Z, where X, Y and Z are random vectors of arbitrary dimensions. At the population level, partial martingale difference correlation is a natural extension of the partial distance correlation developed recently by Székely and Rizzo [14], which characterizes the dependence of Y and X after controlling for the nonlinear effect of Z. It extends the martingale difference correlation first introduced in Shao and Zhang [10] just as partial distance correlation extends the distance correlation in Székely, Rizzo and Bakirov [13]. Sample partial martingale difference correlation is also defined, building on some new results on equivalent expressions of sample martingale difference correlation. Numerical results demonstrate the effectiveness of these new dependence measures in the context of variable selection and dependence testing.


Introduction
Measuring and testing (in)dependence and partial (in)dependence is important in many branches of statistics. To measure the dependence of two random vectors X ∈ R^p and Y ∈ R^q, Székely, Rizzo and Bakirov [13] proposed distance covariance (dCov) and distance correlation (dCor), which have attracted a lot of attention lately; see related work by Székely and Rizzo [12], Li, Zhong and Zhu [7], Kong et al. [4], Lyons [8], Sejdinovic et al. [9], Sheng and Yin [11], Dueck et al. [2], Shao and Zhang [10], and Székely and Rizzo [14], among others, for further extensions and applications of these concepts. In particular, Shao and Zhang [10] proposed the notion of martingale difference divergence (MDD, hereafter) and martingale difference correlation (MDC, hereafter) to measure the conditional mean (in)dependence of Y given X (Y is said to be conditionally mean independent of X provided that E(Y|X) = E(Y) almost surely). Conditional mean dependence plays an important role in statistics. As Cook and Li [1] stated, "in many situations regression analysis is mostly concerned with inferring about the conditional mean of the response given the predictors, and less concerned with the other aspects of the conditional distribution". In practice, it can occur that Z is a variable known a priori to contribute to the variation of Y, and our main interest is to know whether X contributes to the conditional mean of Y (i.e., whether Y is conditionally mean dependent on X) after adjusting for the (possibly nonlinear) effect of Z.
In this paper, our goal is to develop a scalar-valued measure of conditional mean independence of Y given X, controlling for a third random vector Z, where X, Y and Z are of arbitrary, not necessarily equal, dimensions. Our development follows that in Székely and Rizzo [14], where the partial distance covariance and partial distance correlation coefficient (pdCov and pdCor, hereafter) are developed to measure the dependence of Y and X after removing their respective dependence on Z ∈ R^r. Owing to this connection, we name our measures partial MDD and partial MDC (pMDD and pMDC, hereafter).
The main contribution of this paper is two-fold. (1) We discover an equivalent expression for MDD at both the population and sample levels, along with an important connection to the distance covariance. The new expression makes it easier to prove a fundamental representation result concerning the sample MDD. A natural extension of MDD that allows random vectors for both Y and X is also presented. (In Shao and Zhang [10], Y is restricted to be a one-dimensional random variable.) (2) We propose partial MDD and MDC as extensions of partial dCov and partial dCor at both the population and sample levels. Our definition of partial MDD differs from that of partial dCov in that the roles of Y and X are asymmetric in quantifying the conditional mean dependence of Y on X controlling for Z. Furthermore, we provide an unbiased estimator of the squared MDD using the U-centering idea (Székely and Rizzo [14]) and find equivalent expressions for partial MDC at both the population and sample levels. Numerical results are provided in Section 5 to show that a permutation-based test of zero partial MDC has accurate size and respectable power in finite samples. A data illustration is also provided to demonstrate the effectiveness of the pMDC-based forward variable selection approach as compared with its pdCor-based counterpart. Section 6 concludes, and technical details are gathered in the Appendix.

Essentials of dCov and dCor
As proposed by Székely, Rizzo and Bakirov [13], the (population) dCov of two random vectors X ∈ R^p and Y ∈ R^q with finite first moments is V(X, Y), the non-negative square root of

V^2(X, Y) = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|\phi_{X,Y}(s,t) - \phi_X(s)\phi_Y(t)|^2}{|s|_p^{1+p}\, |t|_q^{1+q}} \, ds \, dt,   (2.1)

where φ_X, φ_Y, and φ_{X,Y} are the individual and joint characteristic functions of X and Y, and c_p = π^{(1+p)/2}/Γ((1 + p)/2). Throughout the paper, |·|_p and |·|_q are the (possibly complex) Euclidean norms defined by, for example, |x|_p^2 = x^H x, where x^H denotes the conjugate transpose of x ∈ C^p. In the special case of C^1, we simply denote the modulus as |x|. The dCov characterizes independence in the sense that V(X, Y) = 0 if and only if X and Y are independent. When X and Y have finite second moments, it can be shown that

V^2(X, Y) = E|X - X'|_p |Y - Y'|_q + E|X - X'|_p\, E|Y - Y'|_q - 2E|X - X'|_p |Y - Y''|_q,   (2.2)

where (X', Y') and (X'', Y'') are iid copies of (X, Y). (See Székely and Rizzo [12], Theorems 7 and 8.) The (population) dCor of X and Y is R(X, Y), defined as the nonnegative number satisfying

R^2(X, Y) = \frac{V^2(X, Y)}{\sqrt{V^2(X, X)\, V^2(Y, Y)}},

provided the denominator is positive, and zero otherwise. Both dCov and dCor have readily computable sample analogues, which may be used in general tests for independence.
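At the sample level, a standard route, which we sketch here (the helper names are ours, not from any package), is to double-center the pairwise distance matrices of the two samples and average their entrywise product:

```python
import numpy as np

def dist_matrix(x):
    # Pairwise Euclidean distance matrix of the rows of x (an n x p array).
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def double_center(a):
    # Subtract row and column means and add back the grand mean.
    return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

def dcov_sq(x, y):
    # Sample squared distance covariance V_n^2(X, Y).
    A = double_center(dist_matrix(x))
    B = double_center(dist_matrix(y))
    return (A * B).mean()

def dcor(x, y):
    # Sample distance correlation R_n(X, Y); zero when the denominator vanishes.
    denom = np.sqrt(dcov_sq(x, x) * dcov_sq(y, y))
    return np.sqrt(dcov_sq(x, y) / denom) if denom > 0 else 0.0
```

For instance, `dcor(x, x)` equals 1 for any non-degenerate sample, and R_n always lies in [0, 1].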

Some properties of MDD and MDC
Martingale difference divergence (MDD) and martingale difference correlation (MDC) are intended to measure departure from the relationship E(Y | X) = E(Y) almost surely, for Y ∈ R^q and X ∈ R^p. Shao and Zhang [10] proposed these measures in the case q = 1, by extending the distance covariance and distance correlation proposed in Székely, Rizzo and Bakirov [13]. The following brief review of MDD and MDC also generalizes Shao and Zhang [10] by allowing q > 1.
The martingale difference divergence MDD(Y|X) for real random vectors X ∈ R^p and Y ∈ R^q is defined to be the nonnegative number satisfying

MDD(Y|X)^2 = \frac{1}{c_p} \int_{\mathbb{R}^p} \frac{\big|E\big[(Y - E(Y))\, e^{i\langle s, X\rangle}\big]\big|_q^2}{|s|_p^{1+p}} \, ds.

Compared to the expression for the squared distance covariance in (2.1), the squared MDD uses the same form of weight function and thus forms a natural extension. In Shao and Zhang [10], it has been shown that MDD and MDC inherit a number of useful properties of dCov and dCor, including the characterization that MDD(Y|X) = 0 if and only if Y is conditionally mean independent of X. In the case q = 1 these follow from Theorem 1 of Shao and Zhang [10], and the extension to the case q > 1 is straightforward.

The martingale difference correlation MDC(Y |X) is the nonnegative number satisfying
when the denominator is positive, and zero otherwise. This coincides with the definition in Shao and Zhang [10] for the case q = 1. Following the same argument as in the proof of Theorem 1 of Shao and Zhang [10], it can be shown that the analogous properties continue to hold for q > 1; since there is no additional novelty, we omit the details.

Connection between MDD and dCov
In this subsection, we provide an alternative formulation of MDD using the Laplace operator, which relates it more closely to the formula for distance covariance (Székely, Rizzo and Bakirov [13]). Furthermore, this new formulation makes it easier to prove a fundamental representation result concerning the empirical (sample) version of MDD. Denote the gradient of a (possibly complex) function f of a real vector x ∈ R^p as ∇_x f and the Hessian as ∇_x^2 f. Define the Laplace operator (Laplacian) of f to be Δ_x f = Σ_{k=1}^p ∂^2 f/∂x_k^2. The proof of Proposition 3.2 is in the Appendix. It follows that MDD(Y|X)^2 admits an integral representation over s ∈ R^p that has a very similar form to the squared distance covariance (2.1). The difference is that V(X, Y)^2 is weighted everywhere in both s and t, whereas MDD(Y|X)^2 is weighted everywhere in s but depends only on particular behavior local to t = 0. This is conceptually sensible, because only first-moment information about Y is being used in MDD(Y|X)^2. Indeed, when p = q = 1 and X and Y have sufficiently many moments, this local behavior at t = 0 can be related to the ordinary (squared) covariance. The new formulation (3.2) leads to a straightforward proof of a fundamental representation for sample MDD. Let φ^n_{X,Y}(s, t), φ^n_X(s), φ^n_Y(t) be the empirical characteristic functions, joint and marginal, based on averaging over a sample (x_1, y_1), ..., (x_n, y_n) of size n. Let MDD_n(Y|X)^2 be the empirical squared martingale difference divergence based on these empirical characteristic functions. This agrees with the definition in Shao and Zhang [10], according to Theorem 2 therein, along with the previous proposition (applied to the empirical characteristic functions).
where A and B• are the double-centered versions of the matrices with elements given in (3.4), and where (X', Y') and (X'', Y'') are iid copies of (X, Y). This compares with (2.2), thus making another connection between MDD and dCov.
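Read concretely, this sample representation says that MDD_n(Y|X)^2 is the average entrywise product of the double-centered matrix of distances |x_i - x_j|_p and the double-centered matrix of half squared distances |y_i - y_j|_q^2 / 2. A minimal sketch of that computation (helper names are ours):

```python
import numpy as np

def double_center(a):
    # Subtract row and column means and add back the grand mean.
    return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

def mdd_sq(x, y):
    # Biased sample MDD_n(Y|X)^2: average of the entrywise product of the
    # double-centered a_ij = |x_i - x_j|_p and b_ij = |y_i - y_j|_q^2 / 2.
    dx = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    dy2 = ((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=2) / 2.0
    return (double_center(dx) * double_center(dy2)).mean()
```

If Y is constant, the b-matrix double-centers to zero and MDD_n(Y|X)^2 = 0, matching the fact that a constant is trivially conditionally mean independent of X.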

Unbiased estimation of MDD
In general, MDD_n(Y|X)^2 is a biased estimator of MDD(Y|X)^2. When developing the partial distance covariance, Székely and Rizzo [14] introduced U-centering, which seems essential and leads to an unbiased estimator of the squared distance covariance.
Let A = (a_ij) be a symmetric, real-valued n × n matrix with zero diagonal, with n > 3. Define the U-centered matrix Ã as having (i, j)-th entry

Ã_ij = a_ij - \frac{1}{n-2}\sum_{l=1}^n a_{il} - \frac{1}{n-2}\sum_{k=1}^n a_{kj} + \frac{1}{(n-1)(n-2)}\sum_{k,l=1}^n a_{kl}   for i ≠ j, and Ã_ii = 0.   (3.5)

Let S_n denote the linear span of all n × n distance matrices of samples {x_1, ..., x_n}. Any A ∈ S_n is a real-valued, symmetric matrix with zero diagonal. Let H_n = {Ã | A ∈ S_n}. Following Székely and Rizzo [14], we define the inner product of Ã and B̃ in H_n as

⟨Ã, B̃⟩ = \frac{1}{n(n-3)} \sum_{i \neq j} Ã_{ij} B̃_{ij},   (3.6)

and |Ã| = ⟨Ã, Ã⟩^{1/2} as the norm of Ã. Theorem 1 in Székely and Rizzo [14] shows that the linear span of all matrices in H_n is a Hilbert space with the inner product defined in (3.6). This inner product is useful because it defines an unbiased estimator of the squared population dCov (see Proposition 1 of Székely and Rizzo [14]). Below we introduce an unbiased estimator of MDD(Y|X)^2, which is crucial for the development of partial MDD and partial MDC in Section 4.
Given a random sample (x_1, y_1), ..., (x_n, y_n) from the joint distribution of (X, Y), where n > 3, define the entries a_ij = |x_i - x_j|_p and b•_ij = ½|y_i - y_j|²_q. Define Ã, B̃• and B̃* to be the U-centered matrices based on (a_ij), (b•_ij), and (b*_ij). (Even though (b*_ij) generally does not have a zero diagonal, the matrix B̃* may still be formally defined as in (3.5). Obviously (b•_ij) has a zero diagonal, so it better fits the context defined by Székely and Rizzo.) Below we assert that the U-centered inner product with B̃• yields an unbiased estimator. Perhaps surprisingly, B̃• can be replaced with B̃*, and the result remains true; this is the content of Proposition 3.5. Based on the U-centering, we introduce the unbiased estimator of MDD(Y|X)^2 as ⟨Ã, B̃•⟩, and the sample MDC is defined by normalizing with the corresponding norms.
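The U-centering (3.5), the inner product (3.6), and the resulting unbiased estimator ⟨Ã, B̃•⟩ translate directly into code; the sketch below (our function names) covers the zero-diagonal case, i.e. B̃• rather than B̃*:

```python
import numpy as np

def u_center(a):
    # U-centering (3.5): off-diagonal entries lose row/column sums scaled by
    # 1/(n-2) and regain the grand sum scaled by 1/((n-1)(n-2)); diagonal -> 0.
    n = a.shape[0]
    out = (a - a.sum(axis=1, keepdims=True) / (n - 2)
             - a.sum(axis=0, keepdims=True) / (n - 2)
             + a.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(out, 0.0)
    return out

def u_inner(a, b):
    # Inner product (3.6) on U-centered matrices.
    n = a.shape[0]
    return (a * b).sum() / (n * (n - 3))

def mdd_sq_unbiased(x, y):
    # Unbiased estimator of MDD(Y|X)^2: U-centered inner product of
    # a_ij = |x_i - x_j|_p and b_ij = |y_i - y_j|_q^2 / 2.
    a = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    b = ((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=2) / 2.0
    return u_inner(u_center(a), u_center(b))
```

A useful sanity check: every row and column of the U-centered version of a symmetric zero-diagonal matrix sums to zero.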

Partial MDD and partial MDC
Following the development in Székely and Rizzo [14], we shall present population and sample versions of partial MDD and partial MDC below.

Population pMDD and pMDC
provided that the integrals exist. Correspondingly, let A_X and B_Y denote the double-centered versions of a(x, x') = |x - x'|_p and b(y, y') = |y - y'|_q, respectively. Székely and Rizzo [14] define the population distance covariance through these double-centered quantities and indicate that this is equivalent to the original definition (2.1). Let b•(y, y') = ½|y - y'|²_q for y, y' ∈ R^q, and denote its double-centered version with respect to Y by B•_Y, which can be written out after a straightforward calculation. Here E_Y denotes the expectation with respect to the random vector Y. Then, in view of (3.1), it is not hard to show that MDD(Y|X)^2 admits the analogous representation in terms of A_X and B•_Y. To define the partial MDD at the population level, consider the usual L^2 space of random variables with finite second moments, with inner product ⟨U, V⟩ = E(UV). Then it is easy to see that the relevant double-centered quantities belong to this space (cf. Székely and Rizzo [14]). Thus, in a sense, one way to measure the additional contribution of X to the conditional mean of Y, controlling for Z, is to measure E(U|W).
where D_W is the random double-centered version of d(w, w') = |w - w'|_{p+r} with respect to W. The population partial martingale difference correlation is defined as the corresponding normalized quantity. Since there is no random sample corresponding to U, there is no direct plug-in estimate for B•_U; here we use P_{Z⊥}(Y) as a surrogate. Further note the special case β = 0. The problem with this alternative definition is that its interpretation is not as intuitive and straightforward as that of the one defined above.
Analogous to Theorem 3 in Székely and Rizzo [14], we can also provide an alternative definition of pMDC(Y|X; Z), given below.
Remark 4.2. It has been noted in Section 4.2 of Székely and Rizzo [14] that the partial distance correlation R*(X, Y; Z) = 0 is not equivalent to conditional independence between X and Y given Z, although both the conditional dependence measure and the partial dependence measure capture overlapping aspects of dependence. In our setting, a natural notion that corresponds to conditional independence of X and Y given Z is the so-called conditional mean independence of Y given X conditioning on Z, i.e., E(Y | X, Z) = E(Y | Z) almost surely. That is, conditioning on Z, the variable X does not contribute to the (conditional) mean of Y. It can be expected that pMDC(Y|X; Z) = 0 is not equivalent to conditional mean independence of Y given X conditioning on Z. In particular, we revisit the example given in Section 4.2 of Székely and Rizzo [14]. Let Z_1, Z_2, Z_3 be iid standard normal random variables, X = Z_1 + Z_3, Y = Z_2 + Z_3 and Z = Z_3. Then X and Y are conditionally independent given Z, which implies that Y is conditionally mean independent of X given Z, but pMDC(Y|X; Z) = 0.04805 (0.0004) ≠ 0 based on numerical simulations with sample size 1000 and 10000 replications. On the other hand, it is also possible that pMDD(Y|X; Z) = 0 and yet Y is not conditionally mean independent of X conditioning on Z. For example, letting X, Y ∼ i.i.d. Bernoulli(0.5) and Z | X, Y ∼ Bernoulli(c · min(X, Y)), it is possible to numerically determine a zero value of pMDD(Y|X; Z) at c ≈ 0.5857839, for which E(Y | X, Z) ≠ E(Y | Z). Thus the mean of Y depends on X, even after conditioning on Z.
Recently, conditional distance correlation (Wang et al. [15]) was proposed to measure the dependence between Y and X conditioning on Z, and its extension to measure the conditional mean dependence of Y given X conditioning on Z would be very interesting. It is worth noting that our sample pMDC and pMDD can be easily calculated without any choice of a bandwidth parameter, whereas a scalar-valued metric that quantifies the conditional mean dependence of Y given X conditioning on Z presumably has to involve a bandwidth parameter, the choice of which can be difficult.

Sample pMDD and pMDC
Given the sample (x_i^T, y_i^T, z_i^T)_{i=1}^n, we want to define the sample partial MDD and MDC, denoted pMDD_n(Y|X; Z) and pMDC_n(Y|X; Z), as sample analogues of the population partial MDD and partial MDC. Let B̃•, C̃ and D̃ denote the corresponding U-centered sample matrices, the sample counterparts of the population quantities.

Definition 4.2. Given a random sample from the joint distribution of (X^T, Y^T, Z^T), the sample partial martingale difference divergence of Y given X, controlling for the effect of Z, is defined as the sample analogue of the population quantity in Definition 4.1.

Remark 4.3. In Shao and Zhang [10], the conditional quantile dependence of univariate Y given X at the τth quantile level has been quantified using MDD and MDC, by noting that it can be cast in terms of the conditional mean (in)dependence of the transformed variable V_τ on X. Similarly, we can measure the so-called partial τth quantile dependence of Y given X, controlling for the effect of Z, by using pMDD(V_τ|X; Z) or pMDC(V_τ|X; Z). Their sample versions can be defined accordingly, and the details are omitted. It is worth noting that in a recent paper by Li et al. [6], quantile partial correlation was introduced to measure the conditional quantile dependence of a random variable Y given X in a linear way, controlling for the linear effect of Z. In contrast, our quantile partial MDC measures the nonlinear dependence of Y at the τth quantile level on X after adjusting for the nonlinear effect of Z. Our sample pMDC can be readily calculated without fitting quantile regression models.
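One plausible reading of the sample construction, mirroring the pdCov/pdCor projection idea of Székely and Rizzo [14] rather than quoting the paper's exact formula (Proposition 4.2), is: U-center B• (half squared distances of Y), C (distances of Z) and D (distances of W = (X, Z)); project B̃• and D̃ onto the orthogonal complement of C̃ under the inner product (3.6); then take the cosine of the two residuals. A sketch under that reading:

```python
import numpy as np

def u_center(a):
    # U-centering (3.5) of a symmetric matrix; diagonal set to zero.
    n = a.shape[0]
    out = (a - a.sum(axis=1, keepdims=True) / (n - 2)
             - a.sum(axis=0, keepdims=True) / (n - 2)
             + a.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(out, 0.0)
    return out

def u_inner(a, b):
    # Inner product (3.6) on U-centered matrices.
    n = a.shape[0]
    return (a * b).sum() / (n * (n - 3))

def pdist(x):
    # Pairwise Euclidean distance matrix.
    return np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))

def pmdc(x, y, z):
    # Sketch of sample partial MDC: cosine of the residuals of B (response
    # matrix) and D (distances of W = (X, Z)) after projecting out C
    # (distances of Z) in the U-centered inner-product space.
    B = u_center(((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=2) / 2.0)
    C = u_center(pdist(z))
    D = u_center(pdist(np.hstack([x, z])))
    cc = u_inner(C, C)
    Pb = B - (u_inner(B, C) / cc) * C if cc > 0 else B
    Pd = D - (u_inner(D, C) / cc) * C if cc > 0 else D
    denom = np.sqrt(u_inner(Pb, Pb) * u_inner(Pd, Pd))
    return u_inner(Pb, Pd) / denom if denom > 0 else 0.0
```

Because the quantity is a cosine under a positive semidefinite bilinear form, its absolute value is bounded by 1, although (as discussed below) it can be negative.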
We also notice that both the population and sample pMDD (pMDC) can be negative, just like the pdCov (pdCor) proposed in Székely and Rizzo [14]. We illustrate this through an example in which, after straightforward but laborious computations, it can be shown that pMDD(Y|X; Z) = (4 - 3√2)/144 ≈ -0.001685. Therefore, the sample counterpart may also be negative, since it is consistent.

Numerical studies
In the first three examples, we examine tests of the null hypothesis of zero pMDD. We compare our test with the partial distance covariance test (pdCov) of Székely and Rizzo [14] and the partial correlation test (pcor), which is implemented as a t test as described in Legendre [5]. Both the pMDD test and the pdCov test are implemented as permutation tests, in which we permute the sample X in order to approximate the sampling distribution of the test statistic under the null. Specifically, in each permutation test, we generate R = 999 replicates by permuting the sample X and calculate the observed test statistic T_0 with the original data and the test statistic T^(i) corresponding to the i-th permutation. The estimated p-value is computed as

\hat{p} = \frac{1 + \sum_{i=1}^{R} 1\{T^{(i)} \ge T_0\}}{1 + R},

where 1{·} is the indicator function. The significance level is α, and we reject the null if \hat{p} ≤ α. The type I error rate and power are estimated via 10,000 Monte Carlo replications.
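The permutation scheme is straightforward to code; here `stat` is a placeholder for whichever statistic is being tested (we do not reimplement pMDD or pdCov in this sketch):

```python
import numpy as np

def perm_pvalue(stat, x, y, z, R=999, seed=0):
    # Permutation test: permute the sample X, recompute the statistic, and
    # return p = (1 + #{T^(i) >= T_0}) / (1 + R).
    rng = np.random.default_rng(seed)
    t0 = stat(x, y, z)
    exceed = 0
    for _ in range(R):
        xp = x[rng.permutation(len(x))]
        if stat(xp, y, z) >= t0:
            exceed += 1
    return (1 + exceed) / (1 + R)
```

The "+1" terms ensure the p-value is never exactly zero, which keeps the permutation test valid at finite R.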
Example 5.1.The settings of this example are adapted from Examples 2-5 in Székely and Rizzo [14].
1.a: Let X, Y and Z be three independent random variables, each of which follows a standard normal distribution.
1.b: Replace X in 1.a with an independent standard lognormal random variable.
1.c: X, Y and Z are generated from a multivariate normal distribution with marginal distributions as standard normal, and the pairwise correlations as ρ(X, Y ) = ρ(Y, Z) = ρ(Z, X) = 0.5.
1.d: Replace X in 1.c by a standard lognormal random variable such that the pairwise correlations are ρ(log X, Y) = ρ(Y, Z) = ρ(Z, log X) = 0.5.

Table 1 shows that the partial correlation test's (pcor) size is inflated when the sample size is relatively small, for both normal and non-normal cases, and the sizes for pdCov and pMDD are reasonably close to the nominal level α. This is consistent with the findings in Székely and Rizzo [14]. For the power comparison in 1.c and 1.d, Table 1 also shows that pdCov has the highest power, and pMDD's power is only slightly lower than pdCov's. Both tests have superior power performance over pcor.
Example 5.2.This example examines the case of negative correlation.
2.a: X, Y and Z are generated from a multivariate normal distribution with marginal distributions as standard normal and the pairwise correlations as ρ(X, Y ) = ρ(Y, Z) = ρ(Z, X) = −0.48.
2.b: Replace X in 2.a with a standard lognormal random variable such that the pairwise correlations are ρ(log X, Y) = ρ(Y, Z) = ρ(Z, log X) = −0.48.

2.c: X, Y and Z are generated from a multivariate t distribution with marginal distributions as Student's t with three degrees of freedom and the same negative pairwise correlations.

From Table 2 we observe that overall pcor has the highest power; pMDD consistently outperforms pdCov in all configurations, and it is comparable to pcor when the sample size is large.

Example 5.3. Generate X = Z² + ε_1 and Y = 2Z + ε_2 X, where Z, ε_1 and ε_2 are iid standard normals. Numerical approximations based on n = 1000 and 10000 replications are 0.0253 (0.00014) for pdCor and −0.057 (0.0001) for pMDC, which indicates that after controlling for the effect of Z, Y and X are still dependent, with a positive pdCor but a negative pMDC. It can be seen from Table 3 that for n = 100, pdCov shows a substantial amount of rejections, whereas the rejection rate of pMDD is below the nominal level, which is consistent with the fact that pMDC(Y|X; Z) < 0 but pdCor(Y, X|Z) > 0. The population partial correlation can be easily calculated; namely, pcor(Y, X|Z) = 0. From Table 3, however, we see a big size distortion for the pcor-based test. This is presumably because the joint distribution of (X, Y, Z) is not Gaussian.
Example 5.4. We consider the same prostate cancer data example used in Székely and Rizzo [14]. The response variable, lpsa, is the log of the level of prostate-specific antigen. Our goal is to predict the response based on one or more of a total of eight predictor variables. For comparison purposes, we first standardize each variable as in Székely and Rizzo [14] and Hastie et al. [3] and use the 67 training observations for variable selection. The prediction error is then reported using the 30 test observations.
The pMDC-based variable selection is implemented as a simple forward-selection procedure combining both partial MDC and MDC, as described in Székely and Rizzo [14] for pdCor. In the first step, we calculate MDC(y|x_i) for each predictor and select the x_j that has the largest martingale difference correlation. Then we compute pMDC(y|x_i; x_j) for all the variables x_i ≠ x_j and select the x_i for which pMDC(y|x_i; x_j) is the largest. Define the vector w to be the variables that have already been included in the model through previous steps. We continue the procedure by selecting the next variable to be the one that has the largest pMDC(y|x_i; w). The stopping rule for the variable selection procedure is at the 5% significance level, implemented as a permutation test. The models selected by pMDC, pdCor, the best subset method (BIC-based) and the LASSO are listed below:
pMDC: lpsa ∼ lcavol + lweight + pgg45 + svi + lbph;
pdCor: lpsa ∼ lcavol + lweight + svi + gleason + lbph;
best subsets: lpsa ∼ lcavol + lweight;
LASSO: lpsa ∼ lcavol + lweight + svi + lbph + pgg45.
The order for forward selection with Mallow's C_p is lcavol, lweight, svi, lbph, pgg45, age, lcp, gleason. We can see that the selection results are to some extent similar for all the methods listed above, as all select the same top two variables. The top five variables selected by pMDC are the same as those selected by both the LASSO and forward selection with C_p, except that pgg45 comes third for pMDC while it comes fifth for the LASSO and forward selection. Note that pgg45 is the percent of Gleason scores 4 or 5, which is highly correlated with the variable gleason, the Gleason score; the latter is selected to enter the model by pdCor. Also, from the scatter plot (Figure 1) we can see a strong non-linear relationship between lpsa and pgg45, which could contribute to the mean of the response in a non-linear way. The above results suggest that the additional variable pgg45 can contribute to the conditional mean of the response in a non-linear
way, which may not be detected by the LASSO under linear model assumptions. In general, pdCor selects the variable that has the strongest dependence after controlling for the effect of the previously selected variables, but this overall dependence differs from conditional mean dependence. The variable that has the largest pdCor may not be the one that contributes the most to the conditional mean of Y. Therefore, pdCor may obscure a variable that makes the largest additional contribution to the conditional mean but exhibits less overall dependence. The ranking of the variables delivered by pMDC seems to make more intuitive sense.
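The forward-selection loop described above can be sketched generically. Here `score(x_cand, y, Z)` is a stand-in for pMDC(y|x_cand; Z) (with Z = None at the first step, where plain MDC plays that role), and the 5% permutation-test stopping rule is replaced by a simple cap on the number of variables:

```python
import numpy as np

def forward_select(score, X, y, max_vars=None):
    # Greedy forward selection: at each step, pick the candidate column with
    # the largest partial dependence score given the already-selected columns.
    n, d = X.shape
    selected, remaining = [], list(range(d))
    while remaining and (max_vars is None or len(selected) < max_vars):
        Z = X[:, selected] if selected else None
        best = max(remaining, key=lambda j: score(X[:, [j]], y, Z))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the paper's procedure, the loop would instead terminate when the permutation test for the best remaining candidate fails to reject at the 5% level.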
In addition, we notice that both pdCor and LASSO select svi as the third variable to enter the model, whereas pMDC selects pgg45. To demonstrate the importance of the selection order, we fit the GAM again with only the first three variables selected by the different methods and report the adjusted R²s as follows:
pMDC: lpsa ∼ lcavol + lweight + pgg45 (R² = 68.2%);
pdCor & LASSO: lpsa ∼ lcavol + lweight + svi (R² = 67%).
Again, partial-MDC-based variable selection seems to deliver a more sensible order than its pdCor-based counterpart. As all the models under consideration are for the conditional mean, pMDC can be more efficient in selecting the variables that enter a conditional mean model. Furthermore, we use the fitted GAM models mentioned above to forecast the 30 test observations and report the mean squared error (MSE) and mean absolute error (MAE) of the predictions in Table 4. From the results we can see that pdCor has the smallest MSE, while pMDC and LASSO have the smallest MAE.
In the sequel, we further look into the variables that contribute to the conditional quantile of the response variable. This is based on quantile pMDC variable selection, which is the same as pMDC variable selection except that we first apply a transformation to Y, i.e., U_i = τ − 1(Y_i − Q_τ(Y) ≤ 0), where Q_τ(Y) is the τ-th sample quantile of the response and τ ∈ {0.25, 0.5, 0.75}; see Remark 4.3. The models selected by pMDC and τ-th quantile pMDC are as follows:
pMDC: lpsa ∼ lcavol + lweight + pgg45 + svi + lbph;
0.25-pMDC: lpsa ∼ lcavol + lweight + pgg45 + lbph + gleason;
0.5-pMDC: lpsa ∼ lweight + lcavol + pgg45 + svi;
0.75-pMDC: lpsa ∼ lcavol + svi.
It can be seen that different variables are selected for the conditional mean model and the conditional quantile models at different quantile levels. The conditional quantile of Y seems to depend on different sets of covariates at different quantile levels.
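The quantile transformation used here is a one-liner; a sketch (the function name is ours):

```python
import numpy as np

def quantile_indicator(y, tau):
    # U_i = tau - 1{Y_i - Q_tau(Y) <= 0}, with Q_tau(Y) the tau-th sample
    # quantile; feeding U into the pMDC machinery in place of Y measures
    # partial quantile dependence at level tau (Remark 4.3).
    q = np.quantile(y, tau)
    return tau - (y <= q).astype(float)
```

The transformed sample takes only the two values τ and τ − 1 and has (near-)zero mean, matching E(V_τ) = 0 at the population level.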

Discussion
In this paper, we propose an extension of the martingale difference correlation introduced in Shao and Zhang [10] to partial martingale difference correlation, following the Hilbert-space approach of Székely and Rizzo [14]. In the course of this extension, we provide an equivalent expression for MDD at the population and sample levels, which facilitates our definition of partial martingale difference correlation. Although the definition is not unique, the proposed partial MDC has a natural interpretation, admits a neat equivalent expression, and can be easily estimated using existing U-centering-based unbiased estimates of MDD and distance covariance. Our numerical simulations show that the test of zero partial MDD has power comparable to the test of zero partial distance covariance. A data illustration demonstrates that selecting variables that enter a conditional mean model can be more effective using the pMDC-based forward selection approach than its pdCor-based counterpart.
Similarly, we generalize the expression for the distance inner product to apply to all matrices A and B (though, of course, it is not a true inner product on the space of all matrices).
Let vec(A) be the usual vectorization of matrix A formed by stacking its columns.

If A is an n × n real matrix with all zeros on its diagonal, and B is any n × n real matrix, then claims (i)-(iii) below hold.

To prove (i): Clearly vec(Ḃ) is a linear transformation of vec(B). Let S be the matrix of this transformation: vec(Ḃ) = S vec(B). An explicit form for S can be written down, from which it is clear that S is symmetric. Also, let F denote the matrix of the (linear) operator that sets the diagonal of a matrix to zero; that is, F vec(B) = vec(B_{−D}), where B_{−D} is B with its diagonal set equal to zero. It is obvious that F is a diagonal matrix with ones and zeros on the diagonal, hence symmetric and idempotent.
It follows that vec(B̃) = F S vec(B). The distance inner product may then be written in terms of these vectorizations, and the claimed identity follows because A already has a zero diagonal (so F vec(A) = vec(A)), using the symmetry of S and F.
To prove (ii): By linearity, it suffices to prove the proposition for the matrices which are all zero except for a one in the i-th diagonal position. For such a matrix, the centered version with the diagonal removed is clearly a diagonal matrix, so removing the diagonal once more annihilates it, and the proof is complete.
To prove (iii): By linearity, it suffices to prove the proposition for elementary matrices. For such matrices, setting the diagonal to zero gives the form in equation (A.1). The claim then follows according to (ii), and the proof is complete.

A.2. Proofs of propositions
Proof of Proposition 3.2. For any fixed s ∈ R^q, consider the function of t defined below, which is at least twice differentiable under the assumption that V has finite second moments. Computing its gradient and Hessian, substituting into (A.2), and canceling terms gives a sum of terms of the form (α_kl β_km + α_lk β_mk) (A.3). Now, from the fundamental lemma of Székely, Rizzo and Bakirov [13], the required weighted-integral identity holds. Applying both of these operations (the integral and the Laplacian) to (A.3), and using their linearity, gives the result, and the proof is complete.
Proof of Proposition 4.1. We consider the following cases. • Case 1: If Z is constant a.s., then the stated identities follow directly from the definitions.

Definition 4.1.
The population partial MDD of Y given X, after controlling for the effect of Z, i.e., pMDD(Y|X; Z), is defined as

Remark 4.1.
Alternatively, we could define pMDD(Y|X; Z) as the difference between MDD(Y|W)² and MDD(Y|Z)², where the former measures the relationship E(Y − E(Y) | W) = 0 whereas the latter measures the relationship E(Y − E(Y) | Z) = 0.

Proposition 4.1.
The following definition of population partial MDC is equivalent to Definition 4.1.
B•_ij = ½|y_i − y_j|²_q, C_ij = |z_i − z_j|_r, and D_ij = |w_i − w_j|_{p+r}, respectively. We use B̃•, C̃ and D̃ to denote the U-centered versions of B•, C and D, respectively. The sample analogues of B•_Y, β and the related population quantities then follow.

Proposition 4.2.
An equivalent computing formula for pMDC_n(Y|X; Z) in Definition 4.2 is available, with pMDC_n(Y|X; Z) = 0 whenever the norms of the projected matrices vanish. Here Q_τ(Y) and Q_τ(Y|X) denote the unconditional and conditional τth quantiles of Y.

Table 1
Type I error rate and power at nominal significance level α for Example 5.1. Cases 1.a and 1.b demonstrate the size of the different tests at nominal level α, whereas cases 1.c and 1.d show the power.

Table 2
Power at nominal significance level α for Example 5.2

Table 4
Prediction errors. After selecting the variables, we fit a generalized additive model (GAM) to evaluate and compare the different model fits, using the package mgcv in R. The adjusted R²s from the GAM models are reported above.