Generalized Hoeffding-Sobol Decomposition for Dependent Variables - Application to Sensitivity Analysis

In this paper, we consider a regression model built on dependent variables. This regression models an input-output relationship. Under boundedness assumptions on the joint distribution function of the input variables, we show that a generalized Hoeffding-Sobol decomposition is available. This leads to new indices measuring the sensitivity of the output with respect to the input variables. We also study and discuss the estimation of these new indices.


Introduction
Sensitivity analysis (SA) aims to identify the variables that contribute most to the variability of a nonlinear regression model. Global SA is a stochastic approach whose objective is to determine a global criterion based on the density of the joint probability distribution function of the output and the inputs of the regression model. The most usual quantification is the variance-based method, widely studied in the SA literature. The Hoeffding decomposition [9] (see also Owen [21]) states that the variance of the output can be uniquely decomposed into summands of increasing dimensions under orthogonality constraints. Following this approach, Sobol [26] introduced variability measures, the so-called Sobol sensitivity indices. These indices quantify the contribution of each input to the system.
Different methods have been used to estimate Sobol indices. A Monte Carlo algorithm was proposed by Sobol [27], and has later been improved by the quasi-Monte Carlo technique of Owen [22]. FAST methods are also widely used to estimate Sobol indices. Introduced earlier by Cukier et al. [3, 4], they are well known to reduce the computational cost of multidimensional integrals thanks to Fourier transformations. Later, Tarantola et al. [29] adapted Random Balance Designs (RBD) to the FAST method for SA (see also recent advances on the subject by Tissot et al. [30]).
However, these indices are constructed under the hypothesis that the input variables are independent, which is unrealistic for many real-life phenomena. In the literature, only a few methods and estimation procedures have been proposed to handle models with dependent inputs. Several authors have proposed sampling techniques to compute the marginal contribution of inputs to the outcome variance (see the introduction in Mara et al. [17] and references therein). As underlined in Mara et al. [17], if inputs are not independent, the amount of the response variance due to a given factor may be influenced by its dependence on other inputs.
imsart-ejs ver. 2011/12/06, file: ps-template.tex, date: March 13, 2012

Therefore, classical Sobol indices and FAST approaches for dependent variables are difficult to interpret (see, for example, Da Veiga's illustration [5], p. 133). Xu and Gertner [32] proposed to decompose the partial variance of an input into a correlated part and an uncorrelated one. Such an approach makes it possible to exhibit inputs that have an impact on the output only through their strong correlation with other inputs. However, they only investigated linear models with linear dependencies.
Later, Li et al. [15] extended this approach to more general models, using the concept of High Dimensional Model Representation (HDMR [14]). HDMR is based on a hierarchy of component functions of increasing dimensions (a truncation of the Sobol decomposition in the case of independent variables). The component functions are then approximated by expansions in terms of suitable basis functions (e.g., polynomials, splines, ...). This meta-modeling approach allows the splitting of the response variance into a correlative contribution and a structural one for a set of inputs. Mara et al. [17] proposed to decorrelate the inputs with the Gram-Schmidt procedure, and then to perform the ANOVA-HDMR of Li et al. [15] on these new inputs. The resulting indices can be interpreted as fully correlated, partially correlated and independent contributions of the inputs to the output. Nevertheless, this method does not provide a unique orthogonal set of inputs, as it depends on the order of the inputs in the original set. Thus, a large number of sets has to be generated for the interpretation of the resulting indices. As a different approach, Borgonovo et al. [1, 2] initiated the construction of a new generalized moment-free sensitivity index. Based on geometrical considerations, these indices measure the shift area between the outcome density and the same density conditioned on a parameter. Thanks to the properties of these new indices, a methodology is given to obtain them analytically in test cases.
Notice that none of these works gives an exact and unambiguous definition of the functional ANOVA for correlated inputs, as provided by the Hoeffding-Sobol decomposition when the inputs are independent. Consequently, the exact form of the model has not been exploited to provide general variance-based sensitivity measures in the dependent setting.
In a pioneering work, Hooker [10], inspired by Stone [28], shed new light on hierarchically orthogonal function decomposition. We revisit and extend here the work of Hooker. We obtain a hierarchical functional decomposition under a general assumption on the input distribution. Furthermore, we also show the uniqueness of the decomposition, leading to the definition of new sensitivity indices. Under suitable conditions on the joint distribution function of the input variables, we give a hierarchically orthogonal functional decomposition (HOFD) of the model. The summands of this decomposition are functions depending only on a subset of the input variables, and are hierarchically uncorrelated. This means that two of these components are orthogonal whenever all the variables involved in one of the summands also appear in the other. This decomposition leads to the construction of generalized sensitivity indices well tailored to perform global SA when the input variables are dependent. In the case of independent inputs, this decomposition is nothing more than the Hoeffding one. Furthermore, our generalized sensitivity indices are in this case the classical Sobol ones. In the general case, the computation of the summands of the HOFD involves a minimization problem under constraints (see Proposition 1). A statistical procedure to approach the solution of this constrained optimization problem will be investigated in a forthcoming paper. Here, we focus on the particular case where the inputs are independent pairs of dependent variables (IPDV). Firstly, in the simplest case of a single pair of dependent variables, the HOFD may be performed by solving a functional linear system of equations involving suitable projection operators (see Procedure 1). In the more general IPDV case, the HOFD is then obtained in two steps (see Procedure 2). The first step is a classical Hoeffding-Sobol decomposition of the output on the input pairs, as developed in Jacques et al. [11]. The second step is the HOFD of each pair. In practical situations, the nonparametric regression function of the model is generally not exactly known. In this case, one only has at hand some realizations of the model, and has to estimate the HOFD from this information. Here, we study this statistical problem in the IPDV case. We build estimators of the generalized sensitivity indices and study their properties numerically. One of the main conclusions is that the generalized indices sum to one. This is not true for classical Sobol indices in the setting of dependent variables. The paper is organized as follows.
In Section 2, we give and discuss general results on the HOFD. The main result is Theorem 1. We show there that a HOFD is available under a boundedness-type assumption (C.2) on the density of the joint distribution function of the inputs. Further, we introduce the generalized indices. In Section 3, we give examples of multivariate distributions to which Theorem 1 applies. We also state a sufficient condition for (C.2), and necessary and sufficient conditions in the IPDV case. Section 4 is devoted to the estimation procedures for the components of the HOFD and for the new sensitivity indices. Section 5 presents numerical applications. Through three toy functions, we estimate the generalized indices and compare their performances with the analytical values. In Section 6, we give conclusions and discuss future work. Technical proofs and further details are postponed to the Appendix.

Generalized Hoeffding decomposition - Application to SA
To begin with, let us introduce some notation. We briefly recall the usual functional ANOVA decomposition and the Sobol indices. We then state a generalization of this decomposition, allowing us to deal with correlated inputs.

Notation and first assumptions
We denote by ⊂ strict inclusion, that is, A ⊂ B means A ∩ B = A and A ≠ B, whereas we use ⊆ when equality is possible.
Let (Ω, A, P) be a probability space and let Y be the output of a deterministic model η. Suppose that η is a function of a random vector X = (X_1, ..., X_p) ∈ R^p, p ≥ 1, and that P_X is the pushforward measure of P by X. Let ν be a σ-finite measure on (R^p, B(R^p)). Assume that P_X ≪ ν and let p_X be the density of P_X with respect to ν, that is, p_X = dP_X/dν.
Also, assume that η ∈ L²_R(R^p, B(R^p), P_X). The inner product of this Hilbert space is

⟨f, g⟩ = E(f(X) g(X)) = ∫ f(x) g(x) p_X(x) dν(x),

where E(·) denotes the expectation. The corresponding norm will be classically denoted by ‖·‖, with ‖f‖ = ⟨f, f⟩^{1/2}. Let P_p := {1, ..., p} and let S be the collection of all subsets of P_p. Define S^- := S \ {P_p} as the collection of all subsets of P_p except P_p itself.
Further, let X_u := (X_l)_{l∈u}, u ∈ S \ {∅}. We introduce the subspaces (H_u)_{u∈S}, (H^0_u)_{u∈S} and H^0 of L²_R(R^p, B(R^p), P_X). H_u is the set of all measurable and square integrable functions depending only on X_u. H_∅ is the set of constants and is identical to H^0_∅. For u ∈ S \ {∅}, H^0_u and H^0 are defined as

H^0_u := { h_u ∈ H_u : ⟨h_u, h_v⟩ = 0, ∀ v ⊂ u, ∀ h_v ∈ H_v },
H^0 := { h = Σ_{u∈S} h_u, h_u ∈ H^0_u }.

At this stage, we do not make assumptions on the support of X. For u ∈ S \ {∅}, the support of X_u is denoted by X_u.

Sobol sensitivity indices
In this section, we recall the classical Hoeffding-Sobol decomposition and the Sobol sensitivity indices when the inputs are independent, that is, when p_X = p_{X_1} ⊗ ... ⊗ p_{X_p}. The usual presentation is done for X ∼ U([0, 1]^p) [26], but the Hoeffding decomposition remains true in the general case [31].
Let x = (x_1, ..., x_p) ∈ R^p and assume that η ∈ L²(R^p, P_X). The decomposition consists in writing η(x) = η(x_1, ..., x_p) as a sum of functions of increasing dimension:

η(x) = η_0 + Σ_{i=1}^p η_i(x_i) + Σ_{1≤i<j≤p} η_{ij}(x_i, x_j) + ... + η_{1,...,p}(x_1, ..., x_p) = Σ_{u∈S} η_u(x_u).   (1)

The expansion (1) exists and is unique under the orthogonality constraints E(η_u(X_u) η_v(X_v)) = 0 for u ≠ v. Equation (1) tells us that the model function Y = η(X) can be expanded in a functional ANOVA. The independence of the inputs and the orthogonality properties ensure the global variance decomposition of the output as

V(Y) = Σ_{u∈S\{∅}} V(η_u(X_u)).

Moreover, by integration, each term η_u has an explicit expression, given by

η_u(X_u) = Σ_{v⊆u} (−1)^{|u|−|v|} E(Y | X_v).

Hence, the contribution of a group of variables X_u to the model can be quantified in the fluctuations of Y. The Sobol indices are defined by

S_u = V(η_u(X_u)) / V(Y).

Furthermore, Σ_{u∈S\{∅}} S_u = 1. However, the main assumption is that the input parameters are independent. This is unrealistic in many cases. The use of the expressions set up above is not excluded in case of input dependence, but they could lead to an unclear and sometimes wrong interpretation. Also, the techniques used to estimate them may bias the final results, because most of them are built on the hypothesis of independence. For these reasons, the objective of the upcoming work is to show that the construction of sensitivity indices under dependence can be carried out within a sound mathematical framework.
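In the independent case, the first-order indices above can be estimated by the classical pick-freeze identity S_i = Cov(Y, Y^i)/V(Y), where Y^i reuses X_i and redraws the other inputs. A minimal sketch (the toy model Y = X_1 + 2X_2 and all parameter choices are ours, for illustration only, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x1, x2):
    # Hypothetical toy model with independent standard normal inputs.
    return x1 + 2.0 * x2

n = 200_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
x1p, x2p = rng.standard_normal(n), rng.standard_normal(n)  # independent copies

y = model(x1, x2)

def first_order(y, y_frozen):
    # Pick-freeze estimate: S_i = Cov(Y, Y^i) / V(Y),
    # where Y^i shares X_i with Y and redraws the remaining input.
    return np.cov(y, y_frozen)[0, 1] / np.var(y, ddof=1)

S1 = first_order(y, model(x1, x2p))  # X1 frozen, X2 redrawn
S2 = first_order(y, model(x1p, x2))  # X2 frozen, X1 redrawn

print(S1, S2)  # analytically 1/5 and 4/5 for this model
```

For this additive model V(Y) = 5, so the printed estimates should land near 0.2 and 0.8, and their sum near 1 since there is no interaction term.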
In the next section, we propose a generalization of the Hoeffding decomposition under suitable conditions on the joint distribution function of the inputs. This decomposition consists of summands of increasing dimension, as in the Hoeffding one. But this time, the components are hierarchically orthogonal instead of being mutually orthogonal. Hierarchical orthogonality will be mathematically defined below. Thus, the global variance of the output can be decomposed as a sum of covariance terms depending on the summands of the HOFD. This leads to the construction of generalized sensitivity indices, summing to one, to perform well-tailored SA in case of dependence.

Generalized decomposition for dependent inputs
We no longer assume that P_X is a product measure. Nevertheless, we assume:

(C.1) ν = ν_1 ⊗ ... ⊗ ν_p, where each ν_i is a σ-finite measure on (R, B(R)).

Our main assumption is:

(C.2) there exists 0 < M ≤ 1 such that, for all u ∈ S, p_X ≥ M · p_{X_u} p_{X_{u^c}}, ν-a.e.,

where u^c denotes the complement of u in P_p, and p_{X_u} and p_{X_{u^c}} are respectively the marginal densities of X_u and X_{u^c}. The section is organized as follows: a preliminary lemma gives the main result needed to show that H^0 is a complete space. This ensures the existence and the uniqueness of the projection of η onto H^0. The generalized decomposition of η is finally obtained by adding a residual term orthogonal to every summand, as suggested in [10]. The first part of the reasoning is mostly inspired by Stone's work [28], except that our assumptions are more general. Indeed, we have a relaxed condition on the input distribution function. Moreover, the support X of X is general.
To begin with, let us state some definitions. In the usual ANOVA context, a model is said to be hierarchical if, for every term involving some inputs, all lower-order terms involving a subset of these inputs also appear in the model. Correspondingly, a collection T of subsets of P_p is said to be hierarchical if u ∈ T and v ⊂ u imply v ∈ T. The next lemma is a generalization of Lemma 3.1 of [28]. As already mentioned, it will be the key to showing the hierarchical decomposition.
The proof of Lemma 1 is postponed to the Appendix. Our main theorem follows.

Theorem 1. Suppose that (C.1) and (C.2) hold, and let η ∈ L²_R(R^p, B(R^p), P_X). Then there exist functions (η_u)_{u∈S}, with η_u ∈ H^0_u, such that the following equality holds:

Y = η(X) = Σ_{u∈S} η_u(X_u).   (5)

Moreover, this decomposition is unique.
The proof is given in the Appendix. Notice that, in the case where the input variables X_1, ..., X_p are independent, δ = 1 and Inequality (4) of Lemma 1 is an equality. Indeed, in this case, this equality is directly obtained by orthogonality of the summands.
The variational counterpart of Theorem 1 is a minimization problem under conditional constraints.
Proposition 1. Suppose that (C.1) and (C.2) hold. Let (P) be the minimization problem under constraints: The proof of Proposition 1 is postponed to the Appendix. Notice that a similar result for the Lebesgue measure is given in [10], whose purpose was to provide diagnostics for high-dimensional functions. We will not pursue this idea here; this will be done in forthcoming work. Instead, we construct stochastic sensitivity indices based on the new decomposition (5) and focus on a specific estimation method for IPDV models.

Generalized sensitivity indices
As stated in Theorem 1, under (C.1) and (C.2), the output Y of the model can be uniquely decomposed as a sum of hierarchically orthogonal terms. Thus, the global variance has a simplified decomposition into a sum of covariance terms, and we can define generalized sensitivity indices.

Definition 2. The sensitivity index S_u of order |u| measuring the contribution of X_u to the model is given by

S_u = [ V(η_u(X_u)) + Σ_{v≠u} Cov(η_u(X_u), η_v(X_v)) ] / V(Y).   (6)

More specifically, the first-order sensitivity index S_i is given by

S_i = [ V(η_i(X_i)) + Σ_{v≠{i}} Cov(η_i(X_i), η_v(X_v)) ] / V(Y).

An immediate consequence is given in Proposition 2 (see the proof in the Appendix):

Proposition 2. Σ_{u∈S\{∅}} S_u = 1.

Thus, the sensitivity indices sum to one. Furthermore, the covariance terms included in these new indices make it possible to take into account the input dependence. Thus, we are now able to measure the influence of a variable on the model, especially when a part of its variability is embedded in that of other dependent terms. We can distinguish the full contribution of a variable from its contribution hidden in another correlated input.
Note that for independent inputs the summands η_u are mutually orthogonal, so Cov(η_u, η_v) = 0 for u ≠ v, and we recover the well-known Sobol indices. Hence, these new sensitivity indices can be seen as a generalization of the Sobol indices.
However, the HOFD and the subsequent indices are only obtained under Conditions (C.1) and (C.2). In the following, we give illustrations of distribution functions satisfying these main assumptions.

Examples of distribution function
This section is devoted to examples of distribution functions satisfying (C.1) and (C.2). The first hypothesis only requires that the reference measure be a product of measures, whereas the second is trickier to obtain.
In the first part, we give a sufficient condition for (C.2) valid for any number p of input variables. The second part deals with the case p = 2, for which we give equivalent formulations of (C.2) in terms of copulas.

Boundedness of the inputs density function
The difficulty of Condition (C.2) is that the inequality has to be true for any splitting of the set (X_1, ..., X_p) into two disjoint blocks. We give a sufficient condition, (C.3), for (C.2) to hold in Proposition 3 (the proof is postponed to the Appendix): under (C.3), Condition (C.2) holds.
Let us now give an example where (C.3) is satisfied.
Example 1: Let ν be the multidimensional Gaussian distribution N_p(m, Σ) with

In the next section, we will see that (C.2) has a copula version when p = 2. We will give some examples of distributions satisfying one of these conditions.

Examples of distribution of two inputs
Here, we consider the simpler case of inputs X = (X_1, X_2). Also, until Section 4, we will assume that ν is absolutely continuous with respect to the Lebesgue measure. The dependence structure of X_1 and X_2 can be modeled by copulas.
Copulas [19] give a relationship between a joint distribution and its marginals. Sklar's theorem [25] ensures that any distribution function F(x_1, x_2) with marginal distributions F_1(x_1) and F_2(x_2) has the copula representation F(x_1, x_2) = C(F_1(x_1), F_2(x_2)), where the measurable function C is unique whenever F_1 and F_2 are absolutely continuous.
The next corollary gives, in the absolutely continuous case, the relationship between a joint density and its marginals.

Corollary 1. In terms of copulas, the joint density of X is given by p_X(x_1, x_2) = c(F_1(x_1), F_2(x_2)) f_1(x_1) f_2(x_2), where c denotes the copula density and f_1, f_2 the marginal densities of X_1 and X_2.

Now, Condition (C.2) may be rephrased in terms of copulas:

Proposition 4. For a two-dimensional model, the three following conditions are equivalent: The proof of Proposition 4 is postponed to the Appendix. Hence, the generalized Hoeffding decomposition holds for a wide class of examples. The first example is the Morgenstern copulas [18].

Example 2: The Morgenstern copula is given by C(u, v) = uv(1 + θ(1 − u)(1 − v)), with density c(u, v) = 1 + θ(1 − 2u)(1 − 2v). For θ ∈ ]−1, 1[, (C.6) holds, since c is bounded below by 1 − |θ| > 0.

Let us now consider the class of Archimedean copulas, where the generator ϕ is a nonnegative, twice differentiable function defined on [0, 1] with ϕ(1) = 0 and ϕ′(u) < 0. A sufficient condition for (C.5) is given in Proposition 5: under this condition, (C.5) holds.
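The lower bound for the Morgenstern copula can be checked numerically (a quick sanity check of ours, not from the paper): evaluate the density c(u, v) = 1 + θ(1 − 2u)(1 − 2v) on a grid of [0, 1]² and compare the minimum with 1 − |θ|.

```python
import numpy as np

# Density of the Morgenstern (FGM) copula C(u,v) = uv(1 + theta(1-u)(1-v)):
# c(u, v) = 1 + theta * (1 - 2u) * (1 - 2v).
# It is bounded below by 1 - |theta| on [0,1]^2, hence positive for theta in (-1, 1).
theta = 0.5

u = np.linspace(0.0, 1.0, 201)  # grid including the corners, where the min is attained
U, V = np.meshgrid(u, u)
c = 1.0 + theta * (1.0 - 2.0 * U) * (1.0 - 2.0 * V)

print(c.min())  # equals 1 - |theta| = 0.5 here
```

The minimum is attained at opposite corners of the unit square, e.g. (u, v) = (0, 1), where c = 1 − θ.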
The proof is straightforward. We now present three illustrative Archimedean copulas satisfying (C.5).

Example 3:
The Frank copula is characterized by the generator ϕ(t) = −ln((e^{−θt} − 1)/(e^{−θ} − 1)).

Example 4: Let α < 0, θ > 0 and β with β < −αe^{−θ}. Set

Example 5: Let C < 0 and set

Leaving the class of copulas, we now work directly with the joint density function. Proposition 6 gives a general form of distribution for our framework.

Proposition 6. If p_X has the form p_X = α f_{X_1} f_{X_2} + (1 − α) g_X, α ∈ ]0, 1], where f_{X_1}, f_{X_2} are univariate density functions and g_X is any density function (with respect to ν) with marginals f_{X_1} and f_{X_2}, then p_X satisfies (C.5).
The proof is straightforward.
Example 6: As an illustration of Proposition 6, take ν = ν_L, f_X = f_{X_1} f_{X_2} a normal density with a diagonal covariance matrix Σ, and g_X a normal density with covariance matrix Ω, with Ω_ii = Σ_ii, i = 1, 2. Notice that, because a copula of a Gaussian mixture distribution is a mixture of Gaussian copulas (see [20]), this example can be directly recovered by the copula approach.
Example 7: Let us generalize Example 6. If P_X is a Gaussian mixture

Thus, for many distributions, the generalized decomposition holds, and generalized sensitivity indices may thus be defined.
For the remaining part of the paper, we will assume that the set of inputs is an IPDV. If p is odd, we will assume that one input variable is independent of all the others. The next section is devoted to the estimation of the HOFD components. The simplest case of a single pair of dependent variables is discussed first. Then, the more general IPDV case is studied. In this last part, first and second order indices are defined to measure the contribution of each pair of dependent variables, and of each of its components, to the model. Indices of order greater than one involving variables from different pairs will not be studied here.

Estimation
Using the property of hierarchical orthogonality (H^0_u ⊥ H^0_v, ∀ v ⊂ u), we will see that the summands of the decomposition are solutions of a functional linear system. For u ∈ S, the projection operator onto H^0_u is denoted by P_{H^0_u}.
In this section, we present the computation of the HOFD terms, based on the resolution of a functional linear system. The result relies on the projection operators previously set up. Further, we describe how the linear system is estimated in practice.

Models of p = 2 input variables
This part is devoted to the simple case of bidimensional models. Assuming that Conditions (C.1) and (C.2) both hold, we proceed as follows:

Procedure 1

HOFD of the output:
3. Computation of the right-hand side vector of (18): In this frame, we have:

Proposition 7. Let η be any function of L²_R(R^p, B(R^p), P_X). Then, under (C.1) and (C.2), the linear system (18) admits a unique solution.

As the constant term corresponds to the expected value of η, and the residual one can be deduced from the others, the dimension of the system (20) can even be reduced to:

Reduction of the system
5. Practical resolution: The numerical resolution of (21) is achieved by an iterative Gauss-Seidel algorithm [13], which consists first in decomposing A_2 as the sum of a lower triangular matrix (L_2) and a strictly upper triangular one (U_2). The technique then uses an iterative scheme to compute ∆. At step k + 1, we have: Using the expression of A_2, we get:

6. Convergence of the algorithm: We now check that the Gauss-Seidel algorithm converges to the true solution. Looking back at (18), we see that we only have to consider P_{H^0_1} (respectively P_{H^0_2}) restricted to H^0_2 (respectively to H^0_1). Under this restriction, let us define the associated operator norm as: As explained in [7], the Gauss-Seidel algorithm converges to the true solution ∆ if A_2 is strictly diagonally dominant, which is implied by:
By Equality (19) and Jensen's inequality [12] for conditional expectations, P_{H^0_i}, i = 1, 2, admits an upper bound. Take U ∈ H^0_1: The same holds for U ∈ H^0_2, and we also have ‖P_{H^0_2}‖ ≤ 1. Moreover, Jensen's inequality is strict if U is not X_i-measurable. As U is a function of X_j (that is, j = 2 if i = 1 and conversely), the convergence condition holds if X_1 is not a measurable function of X_2. Hence, the Gauss-Seidel algorithm converges if X_1 is not a function of X_2.
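The convergence argument above mirrors the classical finite-dimensional fact. As an illustration (a finite-dimensional analogue of our own, not the functional system of the paper), Gauss-Seidel on a strictly diagonally dominant 2 × 2 matrix:

```python
import numpy as np

# Gauss-Seidel splits A = L + U (L lower triangular incl. diagonal, U strictly
# upper triangular) and iterates x_{k+1} = L^{-1} (b - U x_k). It converges
# when A is strictly diagonally dominant, echoing the operator-norm condition.
A = np.array([[4.0, 1.0], [2.0, 5.0]])  # strictly diagonally dominant (illustrative)
b = np.array([1.0, 2.0])

L = np.tril(A)       # lower triangular part, including the diagonal
U = A - L            # strictly upper triangular part

x = np.zeros(2)
for _ in range(50):
    x = np.linalg.solve(L, b - U @ x)

print(x, np.linalg.solve(A, b))  # the iterates converge to the exact solution
```

The iteration matrix here has spectral radius 0.1, so fifty iterations reach machine precision; the same geometric contraction drives the functional scheme when the diagonal-dominance condition of the text holds.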
7. Estimation procedure: Suppose that we observe a sample of n observations.

• Estimation of the components of the HOFD: the iterative scheme (23) requires estimating conditional expectations. As developed in Da Veiga et al. [6], we propose to estimate them by local polynomial regression at each observation point. Then, we use the leave-one-out technique to set the learning sample and the test sample. Moreover, as the local polynomial method reduces to a weighted least squares problem (see Fan and Gijbels [8]), the Sherman-Morrison formula [24] is applied to reduce the computational time.
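A bare-bones version of such a conditional-expectation estimator can be sketched as follows (our own choices of kernel, bandwidth and toy data; the paper's actual procedure adds leave-one-out validation and the Sherman-Morrison update):

```python
import numpy as np

def local_linear(x0, x, y, h):
    # Local linear estimate of m(x0) = E[Y | X = x0]: a degree-1 polynomial is
    # fitted around x0 by weighted least squares with Gaussian kernel weights.
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    Xw = X * w[:, None]                       # X^T W without forming W
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return beta[0]                            # intercept = fitted value at x0

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 2000)
y = x ** 2 + 0.1 * rng.standard_normal(2000)  # hypothetical regression m(x) = x^2

m_hat = local_linear(0.5, x, y, h=0.1)
print(m_hat)  # should be close to m(0.5) = 0.25
```

The bias of the local linear fit is of order h², so with h = 0.1 and 2000 points the estimate lands close to the true value 0.25.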
A more detailed procedure is given in the Appendix. The iterative algorithm is easy to implement. We stop when ‖∆^(k+1) − ∆^(k)‖ ≤ ε, for a small positive ε. Once (η_1, η_2) have been estimated, we estimate η_0 by the empirical mean of the output. Then, an estimate of η_12 is obtained by subtraction.
• We use empirical variance and covariance estimators to estimate the sensitivity indices S_1, S_2 and S_12.

Generalized IPDV models
Assume that the number of inputs is even, so p = 2k, k ≥ 2. We denote each group of dependent variables as

By rearrangement, we may assume that:

SA for IPDV models has already been treated in [11]. Indeed, the authors proposed to estimate the usual sensitivity indices on groups of variables via Monte Carlo estimation, and thus interpreted the influence of every group of variables on the global variance. Here, we go further by trying to measure the influence of each variable on the output, but also the effects of the independent pairs. To begin with, as a slight generalization of [26], used in [11], let us apply the Sobol decomposition to the groups of dependent variables,

where, for u = {u_1, ..., u_t} and t = |u|, we set

Thus, we obtain independent components of the IPDV. Under the assumptions discussed in the previous section, we can apply the HOFD to each of these components, that is,

In this way, let us define new generalized indices for IPDV models: the first order index measures the contribution of each pair to the output of the model; the second order sensitivity index for the pair

The estimation procedure for these indices is quite similar to Procedure 1:

Procedure 2
1. Estimation of (η_i)_{i=1,...,k}: as recalled in Part 2.2 with Equations (2), Step 7 of Procedure 1 gives a method to estimate the conditional expectations, which yields estimates of the η_i.
2. For each pair, we apply Steps 2 to 7 of Procedure 1, considering η_i as the output.
If p is odd, the procedure is the same, except that the influence of the independent variable is measured by a Sobol index, as it is independent of all the others. The next part is devoted to numerical examples.

Numerical examples
In this section, we study three examples with dependent input variables. We consider IPDV models and a Gaussian mixture distribution on the input variables. We choose covariance matrices of the mixture satisfying the conditions of Example 1.
We give estimates of our new indices and compare them to the analytical ones, computed from expressions (6). We also compute the dispersion of the estimated new indices. In [6], Da Veiga et al. proposed to estimate the classical Sobol indices V(E(Y | X_u))/V(Y), u ⊆ P_p, by nonparametric tools. Indeed, local polynomial regression was used to estimate the conditional moments E(Y | X_u), u ⊆ P_p. This method, used below, will be called the Da Veiga procedure (DVP). The results given by DVP are compared with the ones given by our method. The goal is to show that the usual sensitivity indices are not appropriate in the dependence setting, even if a relevant estimation method is used.

Two-dimensional IPDV model
Let us consider the model

Here, ν and P_X are of the form given in Example 1, with m = µ = 0. Thus, the analytical decomposition of Y is

For the application, we implement Procedure 1 in Matlab. We perform L = 50 simulations of n = 1000 observations. The parameters were fixed at σ_1 = σ_2 = 1, ϕ²_1 = ϕ²_2 = 0.5, ρ_12 = 0.4 and α = 0.2.
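The inputs of these experiments follow two-component Gaussian mixtures of the Example 1 type. A sketch of how such inputs can be simulated (the covariance values below are illustrative placeholders of ours, not the paper's exact settings):

```python
import numpy as np

# Sampling from the two-component Gaussian mixture
# alpha * N(0, C1) + (1 - alpha) * N(0, C2).
rng = np.random.default_rng(3)
alpha, n = 0.2, 100_000

C1 = np.array([[1.0, 0.4], [0.4, 1.0]])   # correlated component (placeholder)
C2 = np.array([[0.5, 0.0], [0.0, 0.5]])   # independent component (placeholder)

comp = rng.random(n) < alpha              # latent component label per draw
x = np.where(comp[:, None],
             rng.multivariate_normal([0.0, 0.0], C1, n),
             rng.multivariate_normal([0.0, 0.0], C2, n))

# With zero means, the mixture covariance is alpha * C1 + (1 - alpha) * C2.
print(np.cov(x.T))
```

The printed empirical covariance should match the mixture covariance alpha·C1 + (1 − alpha)·C2 up to Monte Carlo error.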
In Table 1, we give the estimates of our indices and their standard deviations (indicated by ±) over the L simulations. For comparison, we give the analytical value of each index.
The analytical classical Sobol indices are difficult to obtain, but we give estimates of the classical Sobol indices with DVP. We notice that the estimates obtained with our method are quite good compared with the analytical values. The estimation error on the interaction term is due to the fact that the component η̂_12 is obtained as the difference between the output and the other estimated components.
The DVP indices are difficult to interpret, as their sum is higher than 1. In our method, it is relevant to separate the variance part from the covariance part in the first order indices. Indeed, in this way, we are able to get the part of variability explained by X_i alone in S_i, and its contribution hidden in the dependence with X_j. We denote by S^v_i the variance contribution and by S^c_i the covariance contribution, that is, S^v_i = V(η_i)/V(Y) and S^c_i = S_i − S^v_i. The new index estimates given in Table 1 are decomposed in Table 2. As previously, the number to the right of ± indicates the standard deviation over the L simulations. For each index, the covariate itself explains 28% (in estimation; 25% in reality) of the total variability. However, the contribution embedded in the correlation is not negligible, as it represents 14% of the total variance. Considering the shape of the model and the coefficients of the parameter distributions, it is quite natural to get the same contribution of X_1 and X_2 to the global variance. Also, as their dependence is quite strong, with a covariance term equal to 0.4, we are not surprised by the relatively high value of S^c_1 (resp. S^c_2).
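The variance/covariance split can be reproduced on a toy configuration (a sketch of our own: a linear model Y = X_1 + X_2 whose HOFD summands are η_i = X_i, with the correlation set to 0.4 as in the experiment):

```python
import numpy as np

rng = np.random.default_rng(4)
cov = np.array([[1.0, 0.4], [0.4, 1.0]])
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, 100_000).T
y = x1 + x2                      # toy model with summands eta_1 = X1, eta_2 = X2
vy = np.var(y, ddof=1)

S1_v = np.var(x1, ddof=1) / vy   # part of variability explained by X1 alone
S1_c = np.cov(x1, x2)[0, 1] / vy # contribution hidden in the dependence with X2

print(S1_v, S1_c, S1_v + S1_c)   # analytically 1/2.8, 0.4/2.8, and 0.5
```

Here V(Y) = 2.8, so S^v_1 ≈ 0.357 and S^c_1 ≈ 0.143, and the full first-order index S_1 = S^v_1 + S^c_1 = 0.5.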
We now take ρ_12 = 0, i.e. we assume that the inputs are independent. Let us compare our new estimated indices with their analytical values in Table 3. We again decompose the new indices into a variance (S^v_i) and a covariance (S^c_i) contribution. Thus, the new indices are well tailored even when little is known about the input dependence of a system. Indeed, Table 3 shows that our new indices take dependence into account if it exists, and the covariance contribution is estimated as 0 if not. The new indices recover the classical Sobol indices in case of independence.

Linear Four-dimensional model
The test model is

Actually, Condition (C.2) only needs to be satisfied on groups of correlated variables. Let us consider the two blocks X^(1) = (X_1, X_3) and X^(2) = (X_2, X_4) of correlated variables. The previous form of density can be taken for X^(1) and X^(2). P_{X^(i)} is then the Gaussian mixture

The analytical sensitivity indices are given by (26) and (27).
For L = 50 simulations and n = 1000 observations, we took ϕ_12 = 0.37 and α_1 = α_2 = 0.2. We see that X_1 has the biggest contribution, whereas the influence of X_4 is very low. This reflects the model well if we look at the coefficients of X_i, i = 1, ..., 4. Also, the interaction terms are well estimated, as they are close to 0. In each case, the dispersion over the 50 simulations is very low. As for the DVP estimates, they are once again very high compared with the true index values.

The Ishigami function
This function is well known in SA [23]. It is defined by Y = sin(X_1) + a sin²(X_2) + b X_3⁴ sin(X_1). We assume that X_3 is the independent variable, and that X_1 and X_2 are correlated. P_X is again the Gaussian mixture α · N_3(0, With L = 50 simulations of n = 1000 observations, we fixed the distribution parameters at ϕ Every boxplot shows a small dispersion. For all estimations, the DVP indices are larger than the new ones. The figure clearly shows that, for all values of a, the sum of these four indices is greater than 1. This again shows that they are not adapted to a situation of dependence. Looking at the values taken by our new sensitivity indices, we see that, for small values of a, the variable X_1 contributes the most to the model's variability. This role decreases as a increases, and X_2 then has the biggest contribution. For any value of a, the input X_3 plays a very negligible role, which seems realistic as b is a small fixed value. As for the interaction index S_12, it grows with the increasing importance of a, but its contribution remains low.

Conclusions and Perspectives
The Hoeffding decomposition and associated Sobol indices have been widely studied in SA over past years.Recently, a literature appears to treat the case of dependent input variables, in which authors propose different ways to deal with dependence.The goal of this paper is to conciliate the problem of inputs dependence with the Hoeffding decomposition.Indeed, we study a functional ANOVA imsart-ejs ver.2011/12/06 file: ps-template.texdate: March 13, 2012 decomposition in a generalized inputs distribution frame.Thus, we show that a model can be uniquely decomposed as a sum of hierarchically orthogonal functions of increasing dimension.Also, this approach generalizes the Hoeffding's one, as we recover it in case of independence.Similarly to the classical Sobol decomposition, this leads to the construction of new sensitivity indices.They consist of a variance and a covariance contribution able to take into account the possible correlation among variables.In case of independence, these indices are the classical Sobol indices.However, the indices construction is only possible under specific assumptions on the joint distribution function of the inputs.We expose few cases that satisfy these assumptions for any p-dimensional models.More specifically, for two-dimensional models, the required assumption is equivalent to assumptions on copulas.In this context, we give examples satisfying one of these assumptions .Focused on the IPDV models, summands of the decomposition are estimated thanks to projection operations.This leads to the numerical resolution of functional linear systems.The strength of this method is that it does not require to make assumptions on the form of the model or on the structure of dependence.We neither use meta-modelling and avoid in this way many sources of errors.Through three applications on test models, we observe the importance of considering the inputs correlation, and show how our method could catch it.The comparison with estimators of classical 
indices with DVP shows that the Sobol indices are not appropriate in the presence of correlations, even when using a nonparametric method. Moreover, when input independence holds, the new indices remain well suited to measure sensitivity in a model. Nevertheless, considering only IPDV models for estimation is restrictive. A perspective is to explore other estimation methods suitable for more general models. We also intend to conduct a systematic study of copulas satisfying, or not, our assumptions.
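To illustrate how the variance and covariance contributions combine, consider the toy linear model $Y = X_1 + X_2$ with Gaussian inputs and $\mathrm{corr}(X_1, X_2) = \rho$. This is a hedged sketch, not the paper's estimation procedure: the model, the closed-form summands $h_i = X_i$, and the split $S_i = S_i^v + S_i^c$ with $S_i^v = \mathrm{Var}(h_i)/\mathrm{Var}(Y)$ and $S_i^c = \mathrm{Cov}(h_i, h_j)/\mathrm{Var}(Y)$ are assumptions made for this special case only:

```python
import numpy as np

rho = 0.4
rng = np.random.default_rng(42)
cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200_000)
Y = X[:, 0] + X[:, 1]                 # toy model: the summands are h_i = X_i

var_Y = Y.var()
for i, j in [(0, 1), (1, 0)]:
    S_v = X[:, i].var() / var_Y                   # variance contribution
    S_c = np.cov(X[:, i], X[:, j])[0, 1] / var_Y  # covariance contribution
    print(f"S{i+1}: S_v = {S_v:.3f}, S_c = {S_c:.3f}, S = {S_v + S_c:.3f}")
# Analytically: S_i^v = 1/(2+2*rho) ~ 0.357, S_i^c = rho/(2+2*rho) ~ 0.143
```

Each full index $S_i = (1+\rho)/(2+2\rho) = 1/2$ here, and the two indices sum to one; with $\rho = 0$ the covariance parts vanish and the classical Sobol indices are recovered.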

A.1. Generalized decomposition for dependent inputs
The upcoming proof follows the guidelines of the proof of Lemma 3.1 in Stone [28].

Proof of Lemma 1
We proceed by induction on the cardinality of $T$.
• H(1) is obviously true, as $T$ reduces to a singleton: $\mathrm{E}(h_r(X) \dots$ This implies ...
• Set $x = h_r(X)$ and $y = \sum_{u \neq r} h_u(X)$. Then (31) is rephrased as ... Further, (30) reads $\langle x, y \rangle \geq -\sqrt{1-M}\,\lVert x \rVert \cdot \lVert y \rVert$. Thus, ... As H(n) is supposed to be true and (31) holds, it follows that ... We can deduce that H(n) is true for any collection $T$ of $\mathcal{P}_p$.
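The elided concluding step can be recovered from the displayed inequality. Assuming the induction hypothesis takes the form $\lVert \sum_u h_u \rVert^2 \geq \delta^{n-1} \sum_u \lVert h_u \rVert^2$ with $\delta = 1 - \sqrt{1-M}$ (an assumption consistent with Inequality (4) invoked in the proof of Theorem 1), a minimal sketch of the computation is:

```latex
\begin{aligned}
\lVert x + y \rVert^2
  &= \lVert x \rVert^2 + 2\langle x, y \rangle + \lVert y \rVert^2 \\
  &\geq \lVert x \rVert^2 - 2\sqrt{1-M}\,\lVert x \rVert\,\lVert y \rVert + \lVert y \rVert^2 \\
  &\geq \bigl(1 - \sqrt{1-M}\bigr)\bigl(\lVert x \rVert^2 + \lVert y \rVert^2\bigr),
\end{aligned}
```

using $2\lVert x \rVert \lVert y \rVert \leq \lVert x \rVert^2 + \lVert y \rVert^2$ in the last step; applying the induction hypothesis to $\lVert y \rVert^2 = \lVert \sum_{u \neq r} h_u \rVert^2$ then yields H(n+1).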

Proof of Theorem 1
Let us define the vector space $K^0$. In a first step, we will prove that $K^0$ is a complete space, in order to establish the existence and uniqueness of the projection of $\eta$ onto $K^0$, thanks to the projection theorem [16]. Secondly, we will show that $\eta$ is exactly equal to its decomposition in $H^0$, and finally, we will see that each term of the sum is unique.
• We show that $H_u^0$ is closed in $H_u$ (as $H_u$ is a Hilbert space).
Let $(h_{n,u})_n$ be a convergent sequence of $H_u^0$ with $h_{n,u} \to h_u$. As $(h_{n,u})_n \in H_u^0 \subset H_u$ and $H_u$ is complete, $h_u \in H_u$. Let $v \subset u$ and $h_v \in H_v^0$. By continuity of the inner product, $\langle h_u, h_v \rangle = \lim_n \langle h_{n,u}, h_v \rangle = 0$, so that $h_u \in H_u^0$. $H_u^0$ is then a closed, hence complete, space.
Let $(h_n)_n$ be a Cauchy sequence in $K^0$; we show that each component is Cauchy and that $h_n \to h \in K^0$. As $h_n \in K^0$, $h_n = \sum_{u \in S^-} h_{n,u}$ with $h_{n,u} \in H_u^0$. It follows that
$$\lVert h_n - h_m \rVert^2 = \Big\lVert \sum_u (h_{n,u} - h_{m,u}) \Big\rVert^2 \geq \delta^{\#(S^-)-1} \sum_{u \in S^-} \lVert h_{n,u} - h_{m,u} \rVert^2$$
by Inequality (4). As $(h_n)_n$ is a Cauchy sequence, the above inequality shows that each $(h_{n,u})_n$ is also Cauchy. As $h_{n,u} \to h_u \in H_u^0$, we deduce that $h_n \to \sum_{u \in S^-} h_u = h \in K^0$ as $n \to \infty$. Thus, $K^0$ is complete. By the projection theorem, we deduce that there exists a unique element of $K^0$ such that ...
• Decomposition of $\eta$: following Hooker [10], we introduce the residual term as ...

The local polynomial estimation [8] consists in approximating $m(x) = \mathrm{E}(Y \mid X = x)$ by a $q$th-order polynomial fitted by weighted least squares.
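As a hedged illustration (not the authors' code), a local linear ($q = 1$) version of this weighted least-squares estimate, $\hat m(x) = e_1^{\top}({}^t X W X)^{-1}\,{}^t X W Y$, can be sketched as follows; the Gaussian kernel, the bandwidth $h$, and the simulated data are illustrative choices:

```python
import numpy as np

def local_linear(x0, X, Y, h=0.1):
    """Local linear estimate of m(x0) = E[Y | X = x0].

    Fits a degree-1 polynomial around x0 by weighted least squares,
    with Gaussian kernel weights of bandwidth h.
    """
    # Design matrix for a first-order expansion around x0
    Xd = np.column_stack([np.ones_like(X), X - x0])
    # Kernel weights centered at x0
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    W = np.diag(w)
    S = Xd.T @ W @ Xd                 # plays the role of S_n(x) in the text
    beta = np.linalg.solve(S, Xd.T @ W @ Y)
    return beta[0]                    # the intercept is m_hat(x0)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 500)
Y = np.sin(np.pi * X) + 0.1 * rng.normal(size=500)
print(local_linear(0.5, X, Y))        # approximately sin(pi * 0.5) = 1
```

Solving the small linear system directly (rather than inverting $S$) is the usual numerically stable choice for a single evaluation point.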
An explicit solution of $m(x)$ is given by ... The leave-one-out technique on local estimation consists in estimating $m$ at every observation point $X^1, \dots, X^n$, i.e. computing Equation (36) where the $k$th line of the matrices has been removed in order to estimate $m(X^k)$. It means that we would need to invert ${}^t X_{-k}(x) D_{-k}(x) X_{-k}(x)$ $n$ times, which is very expensive. To avoid these expensive computations, Sherman and Morrison [24] proposed a formula:

Lemma 2. If $A$ is a square invertible matrix, and $u$, $v$ are vectors such that $1 + {}^t v A^{-1} u \neq 0$, then
$$(A + u\,{}^t v)^{-1} = A^{-1} - \frac{A^{-1} u\,{}^t v A^{-1}}{1 + {}^t v A^{-1} u}. \quad (37)$$

In our problem, set $S_n(x) = {}^t X(x) D(x) X(x)$. $S_n(x)$ can be rewritten as ... where ... Thus, $S_{-k}(x)$, corresponding to ${}^t X(x) D(x) X(x)$ when the $k$th line has been removed, is of the form ... The Sherman-Morrison formula gives ... As $m(X^k) = {}^t(1\ 0$
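Lemma 2 can be checked numerically. The following sketch (illustrative names and data, not the paper's code) compares the direct inverse of a rank-one downdate $A - u\,{}^t u$, the form taken by $S_{-k}(x)$ above, with the Sherman-Morrison update of $A^{-1}$, which costs $O(d^2)$ instead of $O(d^3)$ per leave-one-out point:

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Inverse of (A + u v^T) computed from A^{-1} (Lemma 2)."""
    Au = A_inv @ u
    vA = v @ A_inv
    denom = 1.0 + v @ Au
    if abs(denom) < 1e-12:
        raise np.linalg.LinAlgError("A + u v^T is singular")
    return A_inv - np.outer(Au, vA) / denom

rng = np.random.default_rng(1)
d = 4
A = rng.normal(size=(d, d)) + d * np.eye(d)  # well-conditioned test matrix
u = 0.5 * rng.normal(size=d)

# Leave-one-out style downdate: S_{-k} = S - u u^T, i.e. take v = -u
direct = np.linalg.inv(A - np.outer(u, u))
updated = sherman_morrison(np.linalg.inv(A), u, -u)
print(np.allclose(direct, updated))          # True
```

With $n$ observations, replacing $n$ full inversions of $S_{-k}(x)$ by $n$ such rank-one updates of a single $S_n(x)^{-1}$ is precisely what makes the leave-one-out bandwidth selection affordable.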

Fig 1.
Figure 1 displays the dispersion of the first-order indices for all variables and of the second-order indices for grouped variables. We compare them to their analytical values. In the same figure, we also represent the estimators of the classical Sobol indices with DVP.

Table 1
Estimation of the new and DVP indices with $\rho_{12} = 0.4$

Table 2
Estimation of $S_i^v$ and $S_i^c$ with $\rho_{12} = 0.4$

Table 3
Comparison between analytical and estimated indices with $\rho_{12} = 0$

We want to prove H(n+1). Choose a maximal set $r$ of $T$, i.e. $r$ is not a proper subset of any set $u$ in $T$. We first show that ...

... it is faster to estimate $S_n^{-1}(x)$ and $\Phi_k(x)$ at each point of the design.