Risk of estimators for Sobol sensitivity indices based on metamodels

Sobol sensitivity indices quantify the respective effects of random input variables and their combinations on the variance of a mathematical model's output. We focus on the problem of estimating Sobol indices via a metamodeling approach, in which the true mathematical model is replaced with a sample-based approximation that is used to compute the sensitivity indices. We propose a new method for controlling the quality of the indices and obtain asymptotic and non-asymptotic risk bounds for Sobol indices estimates based on a general class of metamodels. Our analysis is closely connected with the problem of nonparametric function fitting in an orthogonal system of functions under random design. It relates the metamodel quality to the error of the corresponding estimator for Sobol indices and shows that fast convergence rates are possible in the case of noiseless observations. The theoretical results are complemented with numerical experiments for approximations based on multivariate Legendre and trigonometric polynomials.


Introduction
Computational models are now ubiquitous. Over the past decades, they have become so complicated that there is an increasing need for special methods for their analysis. Such analysis has become even more important since the advent of artificial neural networks. In the context of AI safety, we face the interpretability problem [56]: how can we explain the decision made by a neural network? Such methods of analysis are especially important in medical applications, where almost every decision may have the most serious consequences [3].
Being an important tool for the investigation of computational models, sensitivity analysis tries to determine how different model input parameters influence the model output, which parameters are the most influential, and how to evaluate such effects quantitatively [41]. Sensitivity analysis allows us to understand the behavior of computational models better. In particular, it allows us to separate input parameters into important (significant), relatively important, and unimportant (nonsignificant) ones. Important parameters, i.e. parameters whose variability has a strong effect on the model output, may need to be controlled more accurately. As complex computational models often suffer from overparametrization, excluding unimportant parameters can improve model quality and reduce parametrization and computational costs [18].
At its core, sensitivity analysis deals with the variation of the model response caused by the variability of model input parameters, where the latter may be controlled by the experimenter (adaptive design of experiments, active learning setting) or given a-priori (fixed sample [35]). As an example, in [55] image classification with Convolutional Neural Networks is considered, and by occluding different portions of the input image with a grey square and monitoring the output of CNN, it is revealed which parts of the scene are essential for the classification.
Sensitivity analysis includes a wide range of metrics and techniques, including the Morris method [10], linear regression-based methods [20], variance-based methods [41], etc.; see the reviews [20,52] for details. Among these metrics, we focus on the Sobol' (sensitivity) indices, a common way to evaluate the nonlinear influence of model parameters [42,40]. Sobol' indices quantify which portions of the output variance are explained by different input parameters and combinations thereof.
The approaches for the evaluation of Sobol' indices are usually divided into Monte Carlo and metamodeling approaches. Monte Carlo approaches run the analyzed model and conduct high-dimensional numerical integration to estimate Sobol' indices using direct formulas such as those given in [50], the SPF scheme [43], the formulas of Kucherenko [27], Myshetzskay [44], Owen [34], etc. They are relatively robust [54] but require a large number of model runs for an accurate estimation of each index; see [11,15,23,50] for more information on the accuracy and convergence of these methods. Thus, Monte Carlo approaches are impractical for applications in which each model evaluation has a high computational cost.
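To illustrate the cost of the Monte Carlo route, the classical pick-and-freeze scheme for a first-order index can be sketched as follows. This is a minimal sketch, not taken from the paper: the covariance form of the estimator, the function names, and the toy additive model are our own choices.

```python
import numpy as np

def first_order_sobol_pick_freeze(model, d, i, n, rng=None):
    """Pick-and-freeze Monte Carlo estimate of the first-order index S_i.

    Draws two independent uniform samples A and B on [0,1]^d and "freezes"
    the i-th column of B to that of A. Requires 2n model evaluations per
    index, which is why Monte Carlo schemes become costly when each model
    run is expensive.
    """
    rng = np.random.default_rng(rng)
    A = rng.random((n, d))
    B = rng.random((n, d))
    Bi = B.copy()
    Bi[:, i] = A[:, i]                 # freeze the i-th input
    yA, yBi = model(A), model(Bi)
    # Covariance form of the pick-and-freeze estimator
    return np.cov(yA, yBi, ddof=1)[0, 1] / np.var(yA, ddof=1)

# Toy additive model y = x1 + 2*x2: analytically S_1 = 1/5, S_2 = 4/5
model = lambda X: X[:, 0] + 2.0 * X[:, 1]
s1 = first_order_sobol_pick_freeze(model, d=2, i=0, n=200_000, rng=0)
print(abs(s1 - 0.2) < 0.02)  # close to the analytic value 0.2
```

With 2·200 000 model runs the estimate of a single index is accurate to a few thousandths; this run count is exactly what metamodeling approaches try to avoid.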
On the other hand, metamodeling approaches allow one to reduce the required number of model runs [20]. Following this approach, we replace the original computational model with an approximating metamodel (also known as surrogate model or response surface) and use this approximation to calculate Sobol' indices. Unlike the original model, the metamodel is easy to simulate and has a known internal structure. Commonly used metamodels include Polynomial Chaos Approximation [48], Gaussian process (GP) approximation [30], local polynomials [12] and others.

Confidence of metamodeling approaches
Following metamodeling approaches, we face the question of confidence in the obtained results: can we guarantee the closeness of the true Sobol' index and its estimate, especially for a small training sample? What size of training sample is needed? Although the metamodeling approach can be more efficient, its accuracy is more difficult to analyze compared to Monte Carlo. Indeed, even though procedures like cross-validation [18,46] allow one to evaluate the quality of the metamodel, the accuracy of complex statistics (such as Sobol' indices) derived from this metamodel depends on the metamodel structure and its quality in a complicated way. In general, the approach includes two sources of uncertainty: the metamodel error and the sampling (Monte Carlo) error, where the latter is related to the calculation of Sobol' indices from the metamodel (unless they are calculated analytically).
Several methods are proposed in the literature to solve this confidence problem, starting from a simple numerical simulation to more theoretical considerations. In [13], bootstrap-based confidence intervals are constructed for Sobol' indices calculated from the Polynomial Chaos Approximation, and adaptive design is proposed for the computation of Sobol' indices with a given accuracy. [30] proposes to approximate the original model with the Gaussian process metamodel and to use the metamodel simulation to estimate Sobol' indices. In addition, the probability distribution of the estimated Sobol' indices is obtained via numerical integration of the Gaussian process realizations generated from the trained metamodel. According to another method, the sampling error is estimated with the bootstrap technique, and the metamodel error can be obtained as a solution of the multidimensional optimization problem, provided pointwise error bounds for the metamodel are given [22]. An extension of this approach to the Gaussian process metamodel is presented in [29].
Some authors consider the asymptotic approximation of Sobol' indices distribution and confidence intervals using the δ-method [33] in the classical regression setting of non-random design and the presence of random noise in the analyzed model output [5,6,53]. [37] also gives an exact distribution of Sobol' indices estimates in this setting. A similar idea, including the asymptotic approximation through the δ-method, is presented in [21]. In addition, asymptotic properties of conditional moments estimation based on local polynomials are considered in [12].
We should also mention [19], where a general theory on rates of convergence of the least squares estimate is proposed, including the asymptotic convergence of ANOVA components. In addition, general adaptive estimators of quadratic functionals are considered in [28].

Contribution
Although the estimation of Sobol' indices with metamodels is widespread in practice, it lacks theoretical justification. In this paper, we address the following questions: what is the accuracy of metamodel-based Sobol' indices, the risk of the estimate, and its rate of convergence?
Following these questions, we establish a relation between the metamodel error and Sobol' indices errors; propose a method for indices quality control; and obtain the risk of Sobol' indices evaluation in the random design setting for a general class of metamodels.
We assume that the target function is an element of some Hilbert space and that the approximation lies in its finite-dimensional subspace (the linear span of regressors). Two methods are exploited to estimate metamodel parameters: ordinary least squares and the projection (quasi-regression) method. In a nonparametric setting, we study the asymptotic and non-asymptotic behavior of Sobol' indices of all orders, including total effects. The convergence rates and the asymptotic selection of metamodel complexity are studied for different smoothness classes and for various metamodels based on Legendre, Chebyshev, and trigonometric polynomials.
In addition, we pay special attention to the case of noiseless observations, in which the model input parameters completely determine the model output, i.e. the variance V(y|x) = 0; this case often occurs in practice. Table 1 summarizes the asymptotic results when the dimension of the approximation subspace grows with increasing sample size. Our findings indicate that the absence of random noise in the output of the analyzed function, along with its high smoothness and low dimension, is one of the main factors that makes fast convergence rates of Sobol' indices estimates possible. Our results also provide an opportunity to analyze various experimental designs used in metamodel-based global sensitivity analysis.
The paper is organized as follows: in Section 2, we review the definition of Sobol' indices, describe their evaluation using metamodels with tensor structure, and introduce two methods for the estimation of metamodel parameters. In Section 3, the main results are presented, including a general relation between the accuracy of an arbitrary metamodel and the error of the estimated indices, and specific relations for the two methods. In addition, a method for quality control is proposed. In Section 4, the obtained results are considered from an asymptotic point of view. Experimental results are provided in Section 5. Proofs are given in Appendix A.

Sobol-Hoeffding decomposition
Let us assume that there is a prescribed probability measure μ on the design space X = X_1 × · · · × X_d, having the form of a tensor product

μ = μ_1 ⊗ · · · ⊗ μ_d,

where μ_i is a probability measure over X_i. The corresponding distribution represents the uncertainty and/or variability of the input variables, modelled as a random vector x = (x_1, . . . , x_d) with independent components. In this setting, the model output y = f(x) becomes a random variable.
Suppose that the function f is square-integrable w.r.t. μ, i.e. f lies in the Hilbert space L²(X, μ) of real-valued functions on X that are square-integrable w.r.t. μ. We have the following unique Sobol-Hoeffding decomposition of the model output [42,51]:

f(x) = Σ_{U ⊆ {1,...,d}} f_U(x_U),

where f_∅ = E_μ f is constant, each summand f_U depends only on the subgroup of variables x_U = (x_i, i ∈ U), and the summands are mutually orthogonal in L²(X, μ). Due to the orthogonality of the summands, we can decompose the variance of the model output:

V_μ[f] = Σ_{U ≠ ∅} V_U,   V_U := V_μ[f_U(x_U)],

where V_U is the contribution of the summand f_U to the output variance, also known as the partial variance.

Definition 1. The Sobol' index of a subgroup of variables x_U is defined as the normalized partial variance

S_U := V_U / V_μ[f].   (2.2)
Denote ~U := {1, . . . , d}\U (the subset of all input variables except the variables belonging to U). Then for U = {i} the sensitivity index (2.2) can be calculated as

S_i = V_μ[E(y | x_i)] / V_μ[y].

We also define the quantity that characterizes the "total" contribution of the variables x_U: the total-effect index, also known as the total Sobol' index.

Definition 2. The total-effect index of a subgroup x_U, U ⊆ {1, . . . , d}, of model input variables is defined as

T_U := Σ_{W : W ∩ U ≠ ∅} S_W = 1 − V_μ[E(y | x_{~U})] / V_μ[y].

Note that T_U ∈ [0, 1] and T_{1,...,d} = Σ_U S_U = 1. We can also formally define the indices of the empty subgroup as S_∅ = T_∅ := 0. The question is: how can Sobol' indices be calculated efficiently?
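The decomposition and the resulting indices can be checked numerically on a concrete toy model. This is our own sketch, assuming independent uniform inputs on [0, 1]²; the conditional expectations are approximated on a midpoint grid, so the Sobol-Hoeffding components are recovered directly from their definitions.

```python
import numpy as np

# Toy model on [0,1]^2 with independent uniform inputs
f = lambda x1, x2: x1 + x2 + x1 * x2

# Midpoint grid approximates integrals w.r.t. the uniform measure
m = 1000
t = (np.arange(m) + 0.5) / m
X1, X2 = np.meshgrid(t, t, indexing="ij")
F = f(X1, X2)

f0 = F.mean()                      # f_0 = E f
f1 = F.mean(axis=1) - f0           # f_1(x1) = E[f | x1] - f_0
f2 = F.mean(axis=0) - f0           # f_2(x2) = E[f | x2] - f_0
f12 = F - f0 - f1[:, None] - f2[None, :]   # interaction component

V = F.var()
S1, S2, S12 = f1.var() / V, f2.var() / V, f12.var() / V
T1 = S1 + S12                      # total effect of x1
print(round(S1 + S2 + S12, 6))     # partial variances sum to V: -> 1.0
```

For this model the analytic values are S_1 = S_2 ≈ 0.491 and S_12 ≈ 0.018, so T_1 ≈ 0.509; the grid estimates match them to several digits.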

Sobol' indices and metamodels
Direct calculation of Sobol' indices leads to computationally expensive multidimensional integration. To simplify this problem using the metamodeling approach, one can replace the original function f(x) with an approximation f̂(x) that is better suited for computing Sobol' indices.
In general, metamodels-based sensitivity analysis includes the following steps: (a) the selection of the design of experiments; (b) generation of responses of the analyzed model; (c) selection and construction of the metamodel based on the obtained training sample, including its accuracy assessment; (d) the estimation of Sobol' indices using the constructed metamodel. The last step is based either on a known internal structure of the metamodel or on Monte Carlo simulation of the metamodel itself that is computationally cheap.
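The steps (a)-(d) can be sketched end to end on a toy problem. This is our own minimal example, not the paper's experiment: a uniform random design on [−1, 1]², a tensor Legendre metamodel of degree 1 fitted by least squares, and indices read off the expansion coefficients thanks to the known internal structure of the metamodel.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
X = rng.uniform(-1, 1, (n, 2))                         # (a) design of experiments
y = X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 0] * X[:, 1]  # (b) model responses

p1 = lambda t: np.sqrt(3.0) * t                        # normalized Legendre psi_1
Phi = np.column_stack([np.ones(n), p1(X[:, 0]), p1(X[:, 1]),
                       p1(X[:, 0]) * p1(X[:, 1])])     # tensor-product regressors
c, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # (c) fit the metamodel

V = np.sum(c[1:] ** 2)                                 # (d) variance from coefficients
S1, S2, S12 = c[1] ** 2 / V, c[2] ** 2 / V, c[3] ** 2 / V
print(round(S1, 2), round(S2, 2))  # S1 ~ 0.79, S2 ~ 0.20
```

Since the toy model lies exactly in the span of the regressors and the observations are noiseless, the fitted coefficients, and hence the indices, are exact up to floating-point error.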

I. Panin
As such an approximation, we consider an expansion in a series of μ-orthonormal functions having the form of a tensor product. Special cases of such approximations include Polynomial Chaos Approximation, low-rank tensor approximations (LRA) [26], Fourier approximation [36], etc. Moreover, using the Karhunen-Loève decomposition, a Gaussian process approximation can be reduced to this expansion [37]. Similar generalized Chaos Expansions, built on a general tensor Hilbert basis, are considered in [39].
Let us formally introduce an approximation of this type. For g, h ∈ L²(X, μ), denote the scalar product and norm by ⟨g, h⟩_μ := ∫_X g(x)h(x) dμ(x) and ‖g‖_μ := ⟨g, g⟩_μ^{1/2}. In addition, the uniform norm is ‖g‖_{L∞} = ‖g‖_{L∞(X)} := sup_{x∈X} |g(x)|. The norm in Euclidean vector spaces is denoted by ‖·‖.
Define the regressors. Suppose there exists a function set {Ψ_α(x)} in L²(X, μ), parametrized with a multi-index α = (α_1, . . . , α_d) ∈ ℕ^d, that consists of μ-orthonormal functions having the form of a tensor product of d μ_i-orthonormal univariate function families {ψ^(i)_k}:

Ψ_α(x) = ∏_{i=1}^d ψ^(i)_{α_i}(x_i),   ⟨Ψ_α, Ψ_β⟩_μ = δ_{αβ},   (2.4)

where δ is the Kronecker symbol. An example of such a set for a uniform distribution is given by the multivariate Legendre polynomials, additionally normalized to have unit variance w.r.t. the uniform measure on [−1, 1]^d (see Section 4.1.1).

Remark 1. W.l.o.g. we will consider a set {Ψ_α(x)} that consists of an infinite number of elements, but all further statements also remain true for a finite number; such a case can occur, for example, for a discrete measure μ. We only assume that there is at least one non-constant element ψ^(i)_1 for each input dimension i = 1, . . . , d, so that {Ψ_α(x)} contains at least the following 2^d elements: Ψ_(0,0,...,0)(x) = 1, Ψ_(1,0,...,0)(x) = ψ^(1)_1(x_1), Ψ_(0,1,...,0)(x) = ψ^(2)_1(x_2), . . . , Ψ_(1,1,...,1)(x) = ∏_{i=1}^d ψ^(i)_1(x_i).

Define the metamodel with tensor structure. Let the approximation be a linear combination of N functions from the set {Ψ_α(x), α ∈ L_N} for some L_N ⊂ ℕ^d with |L_N| = N. In other words,

f̂(x) = Σ_{α ∈ L_N} ĉ_α Ψ_α(x).   (2.5)

If the regressors Ψ_α are μ-orthogonal polynomials, then (2.5) corresponds to Polynomial Chaos Approximation.
One of the important advantages of the presented approximation type is that it allows Sobol' indices to be calculated analytically from the expansion coefficients. Indeed, it can be shown (see [37]) that for the function f̂ defined as (2.5) using the orthonormal set (2.4), the variance is

V_μ[f̂] = Σ_{α ∈ L_N, α ≠ 0} ĉ_α²,   (2.6)

and the Sobol' index reads

Ŝ_U = (1 / V_μ[f̂]) Σ_{α ∈ L_U} ĉ_α²,   (2.7)

where L_U := L_U[L_N] is the subset of L_N that consists of multi-indices in which only the indices corresponding to the variables x_U are nonzero:

L_U = {α ∈ L_N : α_i ≠ 0 for i ∈ U, α_i = 0 for i ∉ U}.

Similarly, the total-effect index reads

T̂_U = (1 / V_μ[f̂]) Σ_{α ∈ L_U^T} ĉ_α²,   (2.8)

where L_U^T := L_U^T[L_N] is the subset of L_N that consists of multi-indices in which at least one index corresponding to the variables x_U is nonzero:

L_U^T = {α ∈ L_N : α_i ≠ 0 for some i ∈ U}.
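Reading Sobol' and total-effect indices off the coefficients of a tensor-basis expansion is a few lines of code. A minimal sketch (the function name and the dictionary encoding of coefficients are our own choices):

```python
def sobol_from_coeffs(coeffs, U, d):
    """Sobol' and total-effect indices of a tensor-basis expansion
    f_hat = sum_alpha c_alpha * Psi_alpha with mu-orthonormal Psi_alpha.

    coeffs: dict mapping multi-index tuples alpha to coefficients c_alpha.
    U: set of input dimensions (0-based). Returns (S_U, T_U).
    """
    var = sum(c * c for a, c in coeffs.items() if any(a))  # variance of f_hat
    # L_U: exactly the indices in U are nonzero; L_U^T: some index in U is nonzero
    S = sum(c * c for a, c in coeffs.items()
            if any(a) and all((a[k] != 0) == (k in U) for k in range(d))) / var
    T = sum(c * c for a, c in coeffs.items()
            if any(a[k] != 0 for k in U)) / var
    return S, T

# f_hat = 1 + 2*Psi_(1,0) + Psi_(0,1) + Psi_(1,1): variance = 4 + 1 + 1 = 6
coeffs = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.0, (1, 1): 1.0}
S1, T1 = sobol_from_coeffs(coeffs, {0}, d=2)
print(S1, T1)  # S_{1} = 4/6, T_{1} = (4 + 1)/6
```

Note that the constant term (0, 0) never enters either sum, matching the exclusion of α = 0 from the variance.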

Remark 2. Similar analytical expressions of Sobol' indices for Polynomial Chaos were obtained in [47]; for Fourier approximation, such an approach is exploited in the Fourier amplitude sensitivity test (FAST) [9] and in random balance designs (RBD) [49].
In the next section, we consider the strategies for constructing an approximation based on a finite training sample.

Approximation construction
Consider some fixed nested sets of d-dimensional multi-indices

L_1 ⊂ L_2 ⊂ · · · ⊂ L_N ⊂ · · · ⊂ ℕ^d,   (2.9)

where |L_N| = N for all N ∈ ℕ_+. We will refer to each L_N as the truncation set, as we assume that all expansion coefficients of the corresponding approximation not belonging to L_N are zero: ĉ_α := 0 for α ∉ L_N. By selecting N, we define, in some sense, the complexity of the approximating model.

Denote the subspace of all linear combinations of {Ψ_α, α ∈ L_N} by V_N, and let f_N be the best approximation of f in V_N w.r.t. the μ-norm:

f_N := argmin_{g ∈ V_N} ‖f − g‖²_μ = Σ_{α ∈ L_N} c_α Ψ_α,   c_α := ⟨f, Ψ_α⟩_μ.   (2.10)

Define also the residual

e_N(x) := f(x) − f_N(x);   (2.11)

then the error of the best approximation of f in the space V_N w.r.t. the μ-norm is ‖e_N‖_μ. We have the following representation of an arbitrary f ∈ L²(X, μ):

f = f_N + e_N = Σ_{α ∈ L_N} c_α Ψ_α + e_N.   (2.12)

The observation model. In general, the only information about f comes from observations. We suppose that for some design of experiments D = {x_i ∈ X}_{i=1}^n in the sample space X^n we can obtain a set of model responses and form a training sample

S = {(x_i, y_i)}_{i=1}^n,   y_i = f(x_i) + η_i,   (2.13)

where η_i are i.i.d. realizations of random noise η with Eη = 0 and Vη = σ² < ∞, independent from x. In matrix form, y = f + η with y = (y_1, . . . , y_n)^T, f = (f(x_1), . . . , f(x_n))^T and η = (η_1, . . . , η_n)^T. We will consider both the general case of noisy observations and the special noiseless case corresponding to σ² = 0. The noiseless case may also include situations when we can control the state of the random generator inside the analyzed model. Since we cannot obtain the projection f_N directly, we try to estimate it numerically. We describe two basic ways to estimate the expansion coefficients in (2.5) from the training sample S of size n.
The projection method is based on quasi-regression [1] and is used in RBD-like methods [49]. Starting from the representation (2.12) of f, we have c_α = ⟨f, Ψ_α⟩_μ. Assuming the design is i.i.d. sampled from the probability measure μ, one can estimate the integral ⟨f, Ψ_α⟩_μ numerically. Replacing the scalar product with its empirical approximation based on the training sample, we obtain

ĉ_α = (1/n) Σ_{i=1}^n y_i Ψ_α(x_i),   α ∈ L_N.   (2.14)

The ordinary least squares (OLS) method is often used for Polynomial Chaos Approximation construction [48] and in many other cases. Replace the theoretical norm in (2.10) with its empirical version:

ĉ = argmin_{c ∈ ℝ^N} Σ_{i=1}^n ( y_i − Σ_{α ∈ L_N} c_α Ψ_α(x_i) )² = argmin_{c ∈ ℝ^N} ‖y − Φc‖²,   (2.15)

where Φ ∈ ℝ^{n×N} is the information matrix with entries Φ_{iα} = Ψ_α(x_i). Assuming det(Φ^TΦ) ≠ 0, this minimization problem leads to

ĉ = (Φ^TΦ)^{−1} Φ^T y.   (2.16)

We will refer to the approximations constructed with these two methods as f̂_P and f̂_LS, respectively. The related Sobol' indices, estimated via these two approximations using (2.7), are denoted by Ŝ_P and Ŝ_LS, respectively. The next question is how the quality of the approximation limits the quality of the indices.
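The two estimators can be compared on a one-dimensional toy problem. This sketch is ours: it assumes the uniform measure on [−1, 1], normalized Legendre regressors, and the target f(x) = x², which lies exactly in the span of the regressors, so OLS recovers the coefficients exactly while the projection estimate carries Monte Carlo error of order n^{−1/2}.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)       # random design sampled from mu
y = x**2                             # noiseless observations

# Normalized Legendre regressors, orthonormal w.r.t. uniform measure on [-1,1]
Phi = np.column_stack([np.ones(n),
                       np.sqrt(3.0) * x,
                       np.sqrt(5.0) * (3.0 * x**2 - 1.0) / 2.0])

c_proj = Phi.T @ y / n                            # projection: empirical <f, Psi_alpha>
c_ols, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # ordinary least squares

# True expansion: x^2 = 1/3 + (2 / (3*sqrt(5))) * Psi_2
c_true = np.array([1/3, 0.0, 2 / (3 * np.sqrt(5))])
print(np.abs(c_ols - c_true).max() < 1e-8, np.abs(c_proj - c_true).max() < 0.02)
```

Increasing n shrinks the projection error but leaves the (already exact) OLS coefficients unchanged, which previews the different convergence rates of the two methods in the noiseless case.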

Risk of Sobol' indices estimation
We begin with a general outline of this section. First, we consider the relationship between the distance from the function to its arbitrary approximation and the distance between the corresponding Sobol' indices (using various distance metrics). Based on this relation, we propose a method for the quality control of index estimates. In other words, the idea of the method is to establish a connection between the error of the estimated Sobol' indices, which is difficult to measure, and the relatively easily measurable approximation error. In addition, we show that the presented error bounds are achievable and, in this sense, cannot be improved.
The second part of the section is devoted to the risk of Sobol' indices estimates in the random design setting. We generalize the obtained results for the approximations having tensor structure, which were constructed based on the projection and least squares methods with randomly chosen design points.
All further results are valid for both Sobol' indices and total-effect indices of all orders unless otherwise stated. To avoid duplication, we use the notation S_U for indices of both types in the statements of theorems.

Main relationships
Our first goal is to ensure that the closeness, in some sense, of the function and its arbitrary approximation f̂ ≈ f leads to the closeness of their (total) Sobol' indices Ŝ_U and S_U. Note that the opposite is not true in general. To characterize the closeness of functions, we will use the relative error of approximation w.r.t. the μ-norm, given by

E := ‖f − f̂‖_μ / (V_μ[f])^{1/2}.   (3.1)

The right-hand side of (3.2) contains both S_U and Ŝ_U. Solving this inequality w.r.t. S_U − Ŝ_U and simplifying the solution, we obtain an error estimate for the Sobol' index that depends only on the relative error E and the true index S_U.

Corollary 1. For any
As we see, the error bound for Sobol' indices is proportional to the fraction of the standard deviation not explained by the approximation, and one can expect this error to decrease, respectively, as the quality of the approximation increases.
The following corollary gives the bound for the sum of errors of Sobol' indices for all 2 d different subgroups of variables (not valid for total-effects).

Corollary 2. For any
Then the errors of Sobol' and total-effect indices are bounded accordingly. Of course, the closeness of all Sobol' indices of two functions does not necessarily lead to the closeness of the functions. For example, consider the uniform measure μ on [0, 1]² and two functions that differ only by output scaling and shift (for instance, f and f̂ = 2f + 1): although all Sobol' indices (including total ones) of these functions are the same, the functions are not equal in the sense of the μ-norm, i.e. ‖f − f̂‖_μ > 0.
Discussion. To the best of our knowledge, this is the first time an explicit relation between the errors of Sobol' (total-effect) indices and the quality of the approximation has been obtained. Our results lead directly to a simple, practical method for estimating index errors (see Section 3.1.2).
For comparison, we can consider the approach described in [22]. The method involves the construction of a metamodel f̂ and the use of Sobol' index estimates Ŝ_U^m and Ŝ_U^M, which are the solutions of some optimization problems connected with the variability of the Monte Carlo estimator Ŝ^MC, provided pointwise error bounds for the metamodel are given. Of course, only specific metamodel types can provide such a pointwise error; for example, Gaussian process approximation. In addition, to take into account the sampling error of the index, the bootstrap method is used, and the optimization procedure must be repeated for each bootstrap subsample. As a result, we have a computationally demanding scheme with no closed-form solution that relies on the accuracy of the pointwise error estimate.
In contrast to the previous approach, our method relies only on the "average" accuracy of the approximation, which is easier to estimate and more robust to outliers than pointwise accuracy. In addition, the presented bounds are not tied to a specific type of approximation (see also Remark 3). Thus, these results are valid for arbitrary approximating models such as Polynomial Chaos, Gaussian processes, local polynomials, and others. The obtained results can be especially helpful for methods such as FAST and RBD that use Fourier approximation, but not explicitly; previously, there were no reliable estimates of accuracy for these methods.
It should be noted that the obtained error bounds also do not depend on the specific design of experiments used for the approximation construction, so they remain valid if the design is associated with a measure other than μ (or even in the case of adaptive design [6,37]). Following this line of thought, one can show that metamodeling approaches can provide accurate estimates of Sobol' indices for various sampling strategies, which is especially important if we cannot control input sampling. Indeed, suppose that the experimental design is sampled from an unknown probability measure ν on X such that d(μ, ν) := sup_f { ‖f‖_μ / ‖f‖_ν } is finite (the supremum is taken over all functions square-integrable with respect to both μ and ν). Based on (3.3), a sufficient decrease of ‖f − f̂‖_ν then results in a decreasing index error. This setting is connected with so-called second-level global sensitivity analysis, which aims to quantify the impact of the uncertainty of the input probability distribution [31]. For example, in [17] the robustness of Sobol' indices to marginal distribution uncertainty is studied by calculating the derivative of a Sobol' index with respect to the probability density functions in order to determine the optimal perturbation that gives the largest change in the Sobol' index. Second-level analysis requires a very large number of model evaluations, and metamodeling approaches can help circumvent this problem. In this case, the obtained results provide the requirements on the approximation accuracy that ensure the closeness of Sobol' indices and their estimates even when the measure is changed, where S_U and Ŝ_U are understood as the Sobol' indices of f and f̂ w.r.t. the probability measure ν, and d(ν, μ) is supposed to be finite.

A new method for quality control
Corollary 1 allows us to propose a new method of quality control for Sobol' and total-effect index assessment. From a practical point of view, it may be useful to represent the inequality for the errors of indices (3.4) in an equivalent symmetric form, which follows from the symmetry of Theorem 1 (and consequently Corollary 1) w.r.t. f and f̂.
In practice, to estimate E², one can replace the theoretical approximation error ‖f − f̂‖_μ and the variances V_μ[f], V_μ[f̂] with their sample estimates; in particular, the true approximation error can be replaced with the root mean square error (RMSE) obtained with procedures like the holdout method. In the noiseless case (σ² = 0), one can use the following estimate for the true error ‖f − f̂‖_μ:

‖f − f̂‖²_μ ≈ (1/n_test) Σ_{i=1}^{n_test} ( f(x̃_i) − f̂(x̃_i) )²,

where {x̃_i}_{i=1}^{n_test} is a sample of size n_test i.i.d. generated from the measure μ that is independent from the design D and is not used for the construction of the approximation f̂. Note that some types of approximations f̂ provide analytical expressions for their variances; see for example (2.6).
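This quality-control computation takes a few lines. The sketch below is our own: the helper name is hypothetical, and E² is estimated as the test MSE divided by the sample variance of the surrogate, one natural sample version of the relative error used above.

```python
import numpy as np

def relative_error_sq(y_test, yhat_test):
    """Sample estimate of the squared relative error E^2 from an independent
    test sample (noiseless observations): the test MSE estimates the squared
    mu-norm error, and the sample variance of f_hat estimates its variance."""
    return np.mean((y_test - yhat_test) ** 2) / np.var(yhat_test)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, (5000, 2))
f = lambda x: x[:, 0] + 0.05 * x[:, 0] * x[:, 1]  # "true" model (toy)
fhat = lambda x: x[:, 0]                          # additive surrogate

E2 = relative_error_sq(f(x), fhat(x))
# The small ignored interaction yields a small E2, which can then serve as
# a threshold separating important variables from unimportant ones.
print(E2 < 0.01)  # True
```

Here the surrogate discards only a weak interaction term, so E² is a few times 10⁻⁴ and any index estimate above that level can be trusted to be genuinely nonzero.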

Remark 3.
One assumption needs to be emphasized. We suppose that the metamodel used allows Sobol' indices to be calculated analytically, or at least that the computational budget is sufficient to make the error of this calculation negligibly small. If this is not the case, one can adjust the presented bounds, for example, using the bootstrap method [22].

Remark 4.
The obtained results can also be used as a justification for a thresholding rule. For example, according to (3.5), T̂_U > E² for some U implies T_U > 0, and one can select a sample estimate of E² as the threshold that separates relatively important variables and their groups from completely unimportant ones.

Tightness of error bounds
We present an illustration of Theorem 1 and Corollary 2 and show that the corresponding bounds are achievable. Consider two functions on X ⊆ ℝ^d having simple additive forms, constructed as linear combinations of two elements of the orthonormal set (2.4), Ψ_(1,0,...,0)(x) = ψ^(1)_1(x_1) and Ψ_(0,1,...,0)(x) = ψ^(2)_1(x_2); the corresponding Sobol' and total-effect indices of f and f̂ are determined by the squared coefficients of these combinations (see Fig. 1). Notice that the index error attains the bound of Theorem 1; moreover, the last inequality turns into an equality under an additional orthogonality-type requirement on f and f̂ (see Fig. 1). Thus, we can achieve the bound (3.2) for any S_1, Ŝ_1 ∈ [0, 1] except in two cases corresponding to ⟨f, f̂⟩_μ = 0: S_1 = 0, Ŝ_1 = 1 and S_1 = 1, Ŝ_1 = 0. It can be shown that in these two cases the index error does not reach the upper bound (3.2) but can be arbitrarily close to it. These arguments are summarized in the following theorem.

Theorem 3. For any
(3.16) Consequently, the error bound (3.3) may overestimate the actual error, but it is achievable. Thus, the problem of the accuracy of Sobol' indices estimates is reduced to the assessment of approximation quality.

Risk bounds
In this section, we study how the errors of Sobol' indices behave when the underlying approximation is constructed using a random design of experiments. First, we generalize the results of Section 3.1 to this case. Then, we consider the errors of Sobol' indices for the two presented methods of approximation construction, depending on the size of the training sample and the complexity of the approximating model.

General results
All risk bounds in this paper are obtained under the following condition of random design.

Condition 1. The design points x_1, . . . , x_n are i.i.d. samples from the probability measure μ.

Consider an arbitrary approximation f̂ of the function f constructed from the training sample S of size n. We assume that there is some fixed deterministic learning procedure that constructs the approximation from the given training sample (2.13):

f̂ = f̂( · ; S),   (3.17)

i.e. a fixed training sample leads to a uniquely defined approximation f̂.
The following theorem establishes a connection between the risk of the estimated Sobol' indices, E(S_U − Ŝ_U)² and E|S_U − Ŝ_U|, and the quadratic risk of the approximation E‖f − f̂‖²_μ, where averaging is performed over random design points and noise realizations.

In what follows, we assume that V_μ[f] > 0. If we could guarantee that the approximation does not degenerate into a constant, then the generalization of Theorem 1 to this setting would be straightforward. However, this is not the case, and we need to take into account the possibility of a constant approximation, which leads to undefined Sobol' indices (see Definition 1). For this reason, we additionally define the index estimates as Ŝ_U := 0 whenever V_μ[f̂] = 0.

Theorem 4. Let f̂ be an arbitrary approximation of f that is constructed according to (3.17).

Corollary 3. Under the assumptions of Theorem 4, it holds for the corresponding Sobol' and total-effect indices of f and f̂.
Remark 5. Note that Theorem 4 does not impose any assumptions on the structure of the approximating model, except (3.17). In particular, the approximation does not need to have the tensor product form (2.5).

Remark 7. Under the assumptions of Theorem 4, one can additionally prove an estimate for E(S_U − Ŝ_U)² that is potentially more accurate for small indices; the proof repeats that of Theorem 4.

Projection method
Theorem 4 considers an approximation of general type. Now we study the behavior of a particular method of constructing the approximation, starting with the projection (quasi-regression) method defined by (2.14).

Condition 2.
We additionally require f to be bounded on X: ‖f‖_{L∞(X)} < ∞. In this setting, it is straightforward to bound the risk of the approximation (3.20) and to obtain a corollary of Theorem 4 for the case of the projection method.

Theorem 5. Under Condition 1 of random design and the boundedness Condition 2, the following holds for the corresponding Sobol' and total-effect indices of f and f̂.
An obvious corollary of the previous theorem is a sufficient condition for the convergence of Sobol' indices in the mean square sense.

Ordinary least squares
The least squares approximation (2.15) is especially sensitive to the design quality. Although, according to (2.4), E[Φ^TΦ/n] = I_N, there is some probability of an ill-conditioned normalized information matrix Φ^TΦ/n, which may lead to an unbounded approximation with arbitrarily large coefficients ĉ.

As a result, typical risk bounds for the least squares method in the random design case are obtained for approximations whose absolute values are additionally truncated [16]. One consequence is that we cannot obtain risk bounds for Sobol' indices directly from Theorem 4.
Least squares stability. Let us introduce a numerical characteristic, often used in the random design setting, that helps control the design quality.

Definition 3. For the orthonormal set {Ψ_α, α ∈ L_N} that satisfies (2.4) and for some fixed nested sets of d-dimensional multi-indices (2.9), define

K_N := sup_{x ∈ X} Σ_{α ∈ L_N} Ψ_α(x)².

K_N is connected with the inverse Christoffel function [32], defined as

k_N(x) := Σ_{α ∈ L_N} Ψ_α(x)²,

which is the diagonal of the orthogonal projection kernel onto V_N, so that K_N = sup_{x ∈ X} k_N(x). K_N depends on the measure μ and the space V_N; although the way one constructs the sequence (2.9) influences K_N, it is independent of the choice of the orthonormal basis in V_N. It can be shown that K_N ≥ N, and in the general case K_N may be arbitrarily large [8]. In the cases considered in Section 4, K_N ∼ N or K_N ∼ N².
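For the tensor Legendre basis of Section 4.1.1 with maximum-degree truncation, K_N can be evaluated numerically. This sketch is our own (function names hypothetical); it illustrates the K_N ∼ N² behavior mentioned above, since the supremum of the inverse Christoffel function is attained at the corners of [−1, 1]^d.

```python
import numpy as np
from numpy.polynomial import legendre

def K_N(alpha_max, d, grid=2001):
    """Numerical K_N = sup_x sum_alpha Psi_alpha(x)^2 for the tensor
    Legendre basis with maximum-degree truncation (uniform measure)."""
    t = np.linspace(-1.0, 1.0, grid)
    # Normalized univariate Legendre: psi_k = sqrt(2k+1) * P_k, so E psi_k^2 = 1
    psi_sq = np.array([(2 * k + 1) * legendre.legval(t, [0] * k + [1]) ** 2
                       for k in range(alpha_max + 1)])
    k1 = psi_sq.sum(axis=0).max()   # univariate sup, attained at t = +-1
    return k1 ** d                   # tensor structure: the sup factorizes

for amax in (1, 2, 3):
    N = (amax + 1) ** 2              # d = 2: N = (alpha_max + 1)^d
    print(N, round(K_N(amax, d=2)))  # K_N = N^2 for this basis
```

Since P_k(±1)² = 1, the univariate sum at the endpoints equals Σ(2k+1) = (α_max+1)², giving K_N = (α_max+1)^{2d} = N² exactly.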
Denote the spectral norm of a matrix A ∈ ℝ^{m×p} by |||A||| := max_{v ≠ 0} ‖Av‖ / ‖v‖. Following [8], we state a lemma that allows us to bound the deviation of the normalized information matrix Φ^TΦ/n from the identity.

Lemma 1. Under Condition 1 of random design, for δ ∈ (0, 1),

P( |||Φ^TΦ/n − I_N||| > δ ) ≤ 2N exp( −c_δ n / K_N ),

where c_δ > 0 depends only on δ. Lemma 1 leads to a condition on n and N that excludes the possibility of an ill-conditioned information matrix with high probability.

Condition 3. For some fixed r > 0, the relation between N and n satisfies K_N ≤ κ_r · n / log n, where κ_r > 0 is a constant depending only on r.
Under Condition 3, based on Lemma 1, the normalized information matrix is well-conditioned with probability at least 1 − 2n^{−r}. (3.28)

Indices stability. As the least squares estimate (2.16) can be degenerate if det(Φ^TΦ) = 0, we additionally define ĉ_α := 0 in this case.

Theorem 6. Under Condition 1 of random design and the stability Condition 3, the following holds for the corresponding Sobol' and total-effect indices of f and f̂.
Note that, since Sobol' indices are bounded, we do not need to additionally truncate large values of the approximation. Moreover, the previous result does not require a bound on the maximum absolute value of the function.
Consider the noiseless case.

Corollary 5. Under the assumptions of Theorem 6 and provided noiseless observations,
We have the following sufficient conditions for indices convergence in the mean square sense.

Asymptotic behavior
In this section, we discuss specific cases of approximation construction using various types of regressors, which are orthonormal w.r.t. different probability measures. For these particular approximations, we study the asymptotic behavior of the obtained bounds; the results are summarized in Table 1.
We suppose that the analyzed function has a given smoothness, defined below.

Ordinary least squares in the noiseless case
Following Corollary 5, we start with approximations based on OLS in the case of noiseless observations (σ² = 0; see the observation model in Section 2.2.2).

Legendre polynomials
Consider a uniform input distribution on X = [−1, 1]^d. We exploit Legendre polynomials and ordinary least squares to approximate f(x) and then calculate Sobol' indices based on this expansion. The regressors are constructed as a tensor product of normalized univariate Legendre polynomials ψ_α(x) = ψ̃_α(x)/||ψ̃_α||_μ, where ψ̃_α(x) are the univariate Legendre polynomials. In all cases in this section, to construct the truncation set L_N we use the maximum degree truncation scheme, which is suitable for our asymptotic analysis: for some α_max ∈ N_+ called the maximum degree, L_N = {α : max_i α_i ≤ α_max}, so that N = (α_max + 1)^d. After the selection of the probability measure and the regressors, we can estimate K_N to ensure the stability Condition 3: for this scheme, K_N = N². The calculation of K_N for more general truncation schemes can be found in [7]. Thus, the stability Condition 3 takes the form N² ≲ n/log n. It remains to estimate the error ||e_N||_μ. One can do it using the inequality ||e_N||_μ ≤ d_N(f): the space of multivariate Legendre polynomials with fixed maximal degree coincides with the space of multivariate algebraic polynomials with the same maximal degree (Footnote 14), therefore these two spaces have the same d_N(f). According to [19], for multivariate algebraic polynomials of maximal degree α_max and a p-smooth function f, we have, up to a constant factor (Footnote 15), d_N(f) ≲ α_max^{−p} = N^{−p/d}. For comparison, see Table 1.
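As an illustration of this construction (a toy sketch, not the paper's experiment; the test function, dimension, and degree here are chosen for brevity), one can build the tensor-product Legendre regressors, fit OLS, and read Sobol' indices off the squared expansion coefficients:

```python
import numpy as np
from itertools import product
from numpy.polynomial import legendre as L

rng = np.random.default_rng(1)

def psi(x, k):
    """Univariate Legendre polynomial normalized w.r.t. uniform mu on [-1, 1]."""
    c = np.zeros(k + 1); c[k] = 1.0
    return np.sqrt(2 * k + 1) * L.legval(x, c)

# tensor-product regressors with maximum degree truncation, d = 2
d, alpha_max = 2, 3
multi_indices = list(product(range(alpha_max + 1), repeat=d))

def design_matrix(X):
    return np.column_stack([np.prod([psi(X[:, i], a[i]) for i in range(d)], axis=0)
                            for a in multi_indices])

f = lambda X: X[:, 0] + X[:, 0] * X[:, 1]   # toy model: S_{1} = 3/4, S_{12} = 1/4
n = 5000
X = rng.uniform(-1, 1, size=(n, d))
c_hat, *_ = np.linalg.lstsq(design_matrix(X), f(X), rcond=None)

# Sobol' indices from the expansion: S_U = sum of c_a^2 over a with support U,
# divided by the total variance sum_{a != 0} c_a^2
var = sum(c**2 for a, c in zip(multi_indices, c_hat) if any(a))
S = {}
for a, c in zip(multi_indices, c_hat):
    if any(a):
        U = tuple(i for i in range(d) if a[i] > 0)
        S[U] = S.get(U, 0.0) + c**2 / var
print({U: round(s, 3) for U, s in sorted(S.items())})
```

Since the toy function lies in the span of the regressors and observations are noiseless, OLS recovers the coefficients and the indices essentially exactly.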

Chebyshev polynomials
Similarly to the previous section, we continue our analysis for Chebyshev polynomials and the arcsine input distribution on X = [−1, 1]^d, which has the density ρ(x) = Π_{i=1}^d 1/(π√(1 − x_i²)).

Footnote 14. The basis of this space can be constructed as a tensor product of the univariate algebraic polynomials 1, x, x², . . . , x^{α_max}.
Footnote 15. Given two sequences of positive numbers {a_n} and {b_n}, the notation a_n ≲ b_n means that a_n/b_n is bounded.

I. Panin
The regressors are constructed as a tensor product of normalized univariate Chebyshev polynomials, with the estimate K_N ≤ 2^d N, i.e. K_N ∼ N. The stability Condition 3 then takes the form N ≲ n/log n. The error ||e_N||_μ ≤ d_N(f) is estimated similarly to Section 4.1.1. Thus, setting r = 2p/d, we obtain for Chebyshev polynomials the rate (4.6).
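The arcsine sampling and the Chebyshev normalization can be checked numerically; a sketch (the normalization ψ₀ = 1, ψ_k = √2·T_k is the standard one for the arcsine measure):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(2)

# sample from the arcsine distribution on [-1, 1]: x = cos(pi * u), u ~ U[0, 1]
n = 200000
x = np.cos(np.pi * rng.uniform(size=n))

def psi(x, k):
    """Chebyshev T_k normalized w.r.t. the arcsine measure:
    psi_0 = 1, psi_k = sqrt(2) T_k for k >= 1."""
    c = np.zeros(k + 1); c[k] = 1.0
    t = C.chebval(x, c)
    return t if k == 0 else np.sqrt(2.0) * t

# empirical Gram matrix of the first N regressors -> approximately identity
N = 4
Phi = np.column_stack([psi(x, k) for k in range(N)])
G = Phi.T @ Phi / n
print(np.abs(G - np.eye(N)).max() < 0.05)   # close to orthonormal

# sup of the inverse Christoffel function, attained at x = ±1: K_N = 2N - 1
K_N = sum((psi(np.array([1.0]), k) ** 2)[0] for k in range(N))
print(round(K_N, 6))
```

The bounded regressors (|ψ_k| ≤ √2) are what makes K_N grow only linearly in N, in contrast with the quadratic growth for Legendre polynomials.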

Trigonometric polynomials
We proceed with the analysis for a uniform input distribution defined on X = [0, 1]^d and Trigonometric polynomials. For this case, we additionally require that f(x) can be extended to a 1-periodic function in each of its arguments (which is equivalent to additional boundary conditions). The regressors are constructed as a tensor product of normalized univariate Trigonometric polynomials. Assume α_max is even. We have the estimate K_N = N.

The stability Condition 3 takes the form N ≲ n/log n. The error ||e_N||_μ ≤ d_N(f). According to [19, 38], for such p-smooth functions, d_N(f) ≲ N^{−p/d}. Setting r = 2p/d, we obtain for Trigonometric polynomials the rate (4.9).
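For the trigonometric basis the inverse Christoffel function is constant in x, so K_N = N exactly; a short check (the basis layout below is the standard Fourier one, an assumption about the paper's exact indexing):

```python
import numpy as np

def trig_basis(x, m):
    """Orthonormal trigonometric basis on [0, 1] under the uniform measure:
    1, sqrt(2) cos(2 pi k x), sqrt(2) sin(2 pi k x), k = 1..m (N = 2m + 1)."""
    cols = [np.ones_like(x)]
    for k in range(1, m + 1):
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * x))
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * x))
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 1001)
Phi = trig_basis(x, m=3)                 # N = 7 regressors
k_of_x = (Phi ** 2).sum(axis=1)          # inverse Christoffel function
print(np.allclose(k_of_x, 7.0))          # constant: 2cos^2 + 2sin^2 = 2 per k
```

Because sup_x Σ ψ_α(x)² equals its average N, this basis attains the lower bound K_N ≥ N, which explains its superior stability in the experiments of Section 5.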

Noisy case
In the case of observations with noise, we have similar results for Legendre, Chebyshev, and Trigonometric polynomials. Indeed, for the projection method of coefficient estimation, Theorem 5 gives a bound whose summands can be balanced at the right-hand side to find the asymptotically optimal N(n) that minimizes the quadratic risk (4.10), together with the corresponding quadratic risk. The presented asymptotics is not improved even if σ² = 0. This rate corresponds to Stone's bound [45]. Back to ordinary least squares: based on Theorem 6, for Legendre, Chebyshev, and Trigonometric polynomials, balancing the summands yields the asymptotically optimal parameters that minimize the risk (without the stability restriction on K_N), and the corresponding risk for Chebyshev and Trigonometric polynomials. Table 1 summarizes the asymptotic bounds for the risk of Sobol' indices obtained in the different settings.
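The balancing step can be written out explicitly. A sketch, assuming (as in the components named in Section 5) that the risk bound consists of a variance term σ²/V_μ[f] · N/n and a bias term ||e_N||²_μ/V_μ[f] with ||e_N||²_μ ≲ N^{−2p/d} for a p-smooth function:

```latex
\operatorname{Risk}(N, n) \;\lesssim\;
\frac{\sigma^2}{\mathbb{V}_\mu[f]} \cdot \frac{N}{n}
\;+\; \frac{\|e_N\|_\mu^2}{\mathbb{V}_\mu[f]},
\qquad \|e_N\|_\mu^2 \;\lesssim\; N^{-2p/d}.
```

Equating the two terms, N/n ≍ N^{−2p/d}, gives

```latex
N_{\mathrm{opt}}(n) \;\asymp\; n^{\frac{d}{2p + d}},
\qquad
\operatorname{Risk}\bigl(N_{\mathrm{opt}}(n), n\bigr) \;\asymp\; n^{-\frac{2p}{2p + d}},
```

which is Stone's rate for p-smooth regression in dimension d.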

Remark 8.
Although the obtained noisy rate (4.13) for the least squares method coincides with Stone's bound, the corresponding noiseless rates (4.6, 4.9) outperform this bound. There is no contradiction here, because Stone's assumptions [45] imply V(y|x) > 0. See [2, 24, 25] for more information on noiseless rates.

Test functions
The following two functions, the Ishigami function and the Sobol' g-function, are commonly used for benchmarking in global sensitivity analysis.
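For reference, sketches of these two benchmarks in Python (the parameter choices a = 7, b = 0.1 for the Ishigami function are the usual ones in the literature and an assumption here; the paper's exact values may differ):

```python
import numpy as np

def ishigami(X, a=7.0, b=0.1):
    """Ishigami function on [-pi, pi]^3, a common GSA benchmark."""
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.sin(x1) + a * np.sin(x2) ** 2 + b * x3 ** 4 * np.sin(x1)

def sobol_g(X, a):
    """Sobol' g-function on [0, 1]^d; a small a_i marks an important input."""
    a = np.asarray(a)
    return np.prod((np.abs(4 * X - 2) + a) / (1 + a), axis=1)

# analytic first-order index of x2 for Ishigami: S2 = (a^2/8) / V, where
# V = a^2/8 + b*pi^4/5 + b^2*pi^8/18 + 1/2
a, b = 7.0, 0.1
V = a**2 / 8 + b * np.pi**4 / 5 + b**2 * np.pi**8 / 18 + 0.5
print(round((a**2 / 8) / V, 3))   # prints 0.442
```

Having closed-form indices is what makes these functions convenient for checking the error bounds of Section 3.1.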

Error bounds
Experiments in this section illustrate the results of Section 3.1, including the justification of the theoretical error bounds and a demonstration of the method proposed in Section 3.1.2. Observations are supposed to be noiseless (σ² = 0). Approximating function. All results of Section 3.1 are valid for arbitrary types of approximations. As an example, we use an approximation of the test functions based on multivariate Legendre polynomials (4.1), the least squares method, and the hyperbolic truncation scheme [4] corresponding to L_N = {α : ||α||_q ≤ t}, where ||α||_q ≜ (Σ_{i=1}^d α_i^q)^{1/q} with fixed q ∈ (0, 1], and t ∈ N_+ is a fixed maximal total degree of polynomials. The parameters of the truncation scheme are the following: The relative error of approximation E is estimated based on (3.11) with an independent test sample of size 10^6. As the number of regressors is fixed, the quality of the model and the accuracy of the indices cease to improve starting from some sample size.
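The hyperbolic truncation set is straightforward to enumerate; a sketch (the set definition follows the displayed formula, while the particular d, t, q values are illustrative):

```python
from itertools import product

def hyperbolic_index_set(d, t, q):
    """Multi-indices alpha with ||alpha||_q = (sum alpha_i^q)^(1/q) <= t.
    q = 1 gives total-degree truncation; q < 1 penalizes interactions."""
    return [a for a in product(range(t + 1), repeat=d)
            if sum(ai ** q for ai in a) ** (1.0 / q) <= t + 1e-12]

full = hyperbolic_index_set(d=3, t=4, q=1.0)     # total degree <= 4
sparse = hyperbolic_index_set(d=3, t=4, q=0.5)   # hyperbolic: fewer interactions
print(len(full), len(sparse))                     # prints 35 16
```

Shrinking q keeps the high univariate degrees but drops most interaction terms, which is why hyperbolic schemes are popular for sparse polynomial chaos constructions.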
As one can see, most of the presented theoretical bounds overestimate the corresponding errors of Sobol' indices, particularly for large sample sizes. Another observation is that estimates of small indices do tend to converge faster. Besides, the bound (3.4) is more accurate for such indices and especially good for zero Sobol' and total-effect indices (in these cases, for the Ishigami function, this bound coincides with E²). The same holds for indices close to 1.
Sample-based error bounds are shown in Figures 3b, 4b. These bounds are based on relatively small samples and can be used in practice.
The proposed method allows estimating the error of Sobol' and total-effect indices obtained from the approximating function. It is described in Section 3.1.2 and is based on (3.10). Following (3.11), we use 15% of the sample for holdout control of the approximation error.
For comparison, we examine sample-based error bounds produced by the bootstrap method motivated by [13]. We use n_s = 100 bootstrap subsamples to obtain error bounds for the estimated Sobol' indices from the formula for the sample standard deviation

Fig 2. Different error metrics for Sobol' and total-effect indices and their theoretical error bounds (3.3, 3.6, 3.7). The approximation error E is given by (3.1). Fixed number of regressors.
where Ŝ_U^{(j)} is the Sobol' index of the subset U ⊆ {1, . . . , d} of input variables obtained from the approximation described above, constructed on the j-th bootstrap subsample. The error bounds for total-effects are calculated similarly. Note that, due to random sampling with replacement and the lack of regularization in the least squares method, small sample sizes may lead to degenerate approximations and undefined bootstrap error bounds.
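The bootstrap procedure above can be sketched as follows (toy model and sizes are illustrative; note how, for a noiseless function lying exactly in the span of the regressors, the bootstrap spread collapses to zero, which illustrates why the method can be overoptimistic):

```python
import numpy as np
from itertools import product
from numpy.polynomial import legendre as L

rng = np.random.default_rng(3)
d, alpha_max = 2, 2
idx = list(product(range(alpha_max + 1), repeat=d))

def psi(x, k):
    c = np.zeros(k + 1); c[k] = 1.0
    return np.sqrt(2 * k + 1) * L.legval(x, c)   # normalized Legendre

def S1_hat(X, y):
    """Sobol' index of x1 from a tensor Legendre OLS metamodel."""
    Phi = np.column_stack([np.prod([psi(X[:, i], a[i]) for i in range(d)], axis=0)
                           for a in idx])
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    var = sum(ci**2 for a, ci in zip(idx, c) if any(a))
    return sum(ci**2 for a, ci in zip(idx, c) if a[0] > 0 and a[1] == 0) / var

f = lambda X: X[:, 0] + 0.5 * X[:, 1]            # toy model, S_1 = 0.8
n, n_s = 400, 100
X = rng.uniform(-1, 1, (n, d)); y = f(X)
boot = []
for _ in range(n_s):                              # bootstrap subsamples
    j = rng.integers(0, n, n)                     # resample with replacement
    boot.append(S1_hat(X[j], y[j]))
boot = np.array(boot)
sigma_hat = np.sqrt(((boot - boot.mean()) ** 2).sum() / (n_s - 1))  # sample std
print(round(sigma_hat, 6))    # (near) zero: the bootstrap ignores the model bias
```

The spread measures only the sampling variability of the fit; the approximation bias is invisible to it, which matches the behavior reported below.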
The experiments demonstrate that the bootstrap bounds may be overpessimistic for small sample sizes and overoptimistic for large samples. The latter can be explained by the fact that the method neglects the bias of the approximation, and its error estimate for indices can converge to zero even for inaccurate approximations.
Compared to the bootstrap, the proposed method gives more conservative error estimates for large samples and relatively accurate estimates for small ones. Another of its properties is possible instability for very small samples, which is a consequence of the high uncertainty of the approximation error obtained with holdout control. As expected, the corresponding sample-based bound converges to the theoretical bound (3.4). Similar to the bound (3.4), the proposed method is more accurate for small (and close to 1) Sobol' and total-effect indices.

Risk bounds
In this section, we provide an illustration for Theorems 5 and 6. We numerically estimate the quadratic risk of Sobol' and total-effect indices, max_U E(S_U − Ŝ_U)² and max_U E(T_U − T̂_U)², depending on the sample size, and show how it relates to the components of the risk bounds (3.23, 3.29) for the two methods of approximation construction, that is, up to additional constant factors, n^{−r}, σ²/V_μ[f] · N/n, and ||e_N||²_μ/V_μ[f]. These components allow us to understand the behavior of the risk better than the obtained risk bounds, which can significantly overestimate the risk. Numerical risk estimation is based on n_run = 100 independent random designs (and noise realizations if σ > 0). In order to estimate the best theoretical error ||e_N||²_μ, we follow (3.11) and the least squares method using training and test samples of size 10^6 each. Following (3.27), r is computed by solving the stability condition for the given n and K_N. Approximating function. We use approximations of the test functions based on multivariate Legendre (4.1) and Trigonometric (4.7) polynomials, the least squares and the projection methods, and the maximum degree truncation scheme (4.2).
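The Monte-Carlo risk estimation over independent designs can be sketched as follows (the toy function, noise level, and sizes are illustrative assumptions, not the paper's настройки of Section 5; the estimator repeats an OLS fit over n_run fresh designs and averages the squared index error):

```python
import numpy as np
from itertools import product
from numpy.polynomial import legendre as L

rng = np.random.default_rng(4)
d, alpha_max = 2, 2
idx = list(product(range(alpha_max + 1), repeat=d))

def psi(x, k):
    c = np.zeros(k + 1); c[k] = 1.0
    return np.sqrt(2 * k + 1) * L.legval(x, c)   # normalized Legendre

def S1_hat(X, y):
    Phi = np.column_stack([np.prod([psi(X[:, i], a[i]) for i in range(d)], axis=0)
                           for a in idx])
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    var = sum(ci**2 for a, ci in zip(idx, c) if any(a))
    return sum(ci**2 for a, ci in zip(idx, c) if a[0] > 0 and a[1] == 0) / var

f = lambda X: X[:, 0] + 0.5 * X[:, 1]            # S_1 = 0.8 exactly
sigma, n, n_run = 0.1, 200, 100                  # noisy observations
sq_err = []
for _ in range(n_run):                            # independent designs and noise
    X = rng.uniform(-1, 1, (n, d))
    y = f(X) + sigma * rng.normal(size=n)
    sq_err.append((S1_hat(X, y) - 0.8) ** 2)
risk = np.mean(sq_err)                            # Monte-Carlo quadratic risk
print(risk < 1e-3)
```

Varying n, N, and σ in this loop reproduces qualitatively the regimes discussed below: a noise-dominated N/n range and a bias-dominated limiting risk.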
The effects of the number of regressors and the type of regressors are shown in Figures 5a, 5b, 6a and 6b. Observations are noiseless. For comparison, we use regressors based on Legendre and Trigonometric polynomials, which are both orthogonal w.r.t. a uniform measure. The parameters of the truncation schemes are the following: The experiments demonstrate that the best theoretical error ||e_N||²_μ does influence the "limiting risk", which corresponds to the almost constant values of the estimated risk at large sample sizes. Although this best error can be much bigger than the indices risk, a larger value of ||e_N||²_μ in most cases leads to a larger limiting risk. As expected, an increase in the number of regressors N corresponds to a decrease in the limiting risk but also to less stable estimates for small sample sizes. Note that different N may correspond to almost the same approximation errors (see Figure 6a). In addition, the truncation scheme has a strong influence on the convergence speed (see, for comparison, Figure 2b).
In Figures 5a and 6a, one can see a big difference between the two regressor types when using least squares. Given the maximum degree truncation scheme, Trigonometric polynomials with K_N = N provide better stability for small sample sizes compared to Legendre polynomials with K_N = N². This stability is especially evident when the limiting risks for the two types of regressors are close.
Figs. 5, 6. Estimated risk and components of the risk bounds (3.23, 3.29), depending on the regressors number and noise level.
The effect is not apparent in the case of the projection method. In general, regressors based on Trigonometric polynomials perform better in all presented cases. Note, however, that the Sobol' g-function and the Ishigami function can be extended, correspondingly, to 1- and 2π-periodic functions in each of their arguments, which provides a better theoretical error ||e_N||²_μ for the Trigonometric approximation of these specific functions compared to "aperiodic" ones.
According to (5.2), the minimal sample size that provides sensible estimates based on Theorem 6 increases considerably with increasing dimension and maximum degree of the truncation scheme. In particular, in the case of the Ishigami function and N = 125 regressors of Legendre type, at least n = 2,102,432 > 10^6 design points are needed to achieve r > 0.
Finally, we briefly discuss how to select an approximating metamodel for the evaluation of Sobol' indices; in particular, how to choose its structure and the number of regressors.
From a theoretical point of view, the structure of the metamodel should be adapted to the problem. In particular, according to (4.3, 4.5, 4.8, 4.10, 4.12), the number of regressors should depend on the smoothness of the analyzed function and the size of the experimental design. Although these results are theoretically justified, they are essentially asymptotic and require a priori information on the analyzed function.
From a practical perspective, it is important that, according to Theorem 1, more accurate (in the RMSE sense) metamodels are preferable for the estimation of Sobol' indices. Thus, one can customize the metamodel structure for global sensitivity analysis using standard methods of model selection and feature selection [18], which are based on cross-validation, penalization of the metamodel complexity [28], an adaptive construction of the approximation [4,14], etc.
The effect of noise in the observations is presented in Figures 5c and 6c. Independent Gaussian noise is added to the output of the test functions. The standard deviation of the noise equals 0, L/10, and 2L, with L = max_{x∈X} |f(x)|. Regressors are based on Legendre polynomials. The parameters of the maximum degree truncation scheme are the following: As one can see, in comparison with least squares, the projection method is more stable for very small sample sizes n ∼ N. Given a moderate sample size and a low or zero noise level, the quadratic risk of the estimates for the least squares method decays significantly faster than n^{−1} and, correspondingly, faster than the risk for the projection method. Large noise levels lead to similar behavior of the two methods, and the components of their risk bounds proportional to N/n are close.
We can hypothesize that the three components of the obtained risk bound (3.29), namely n^{−r}, σ²/V_μ[f] · N/n, and ||e_N||²_μ/V_μ[f], correspond to three ranges of sample sizes with different rates of risk decay for the least squares method.

Conclusion
We address the problem of the accuracy and risk of metamodel-based Sobol' indices in the random design setting. We obtained a general relation between the accuracy of an arbitrary metamodel and the error of the estimated Sobol' indices, as well as specific asymptotic and non-asymptotic relations for the two methods of parameter estimation for metamodels with tensor structure, including approximations based on multivariate Legendre, Chebyshev, and Trigonometric polynomials. Based on the obtained relation, we propose a method for Sobol' indices quality control that demonstrates good performance compared to an existing approach.
The results indicate that the risk convergence with sample growth for the least squares method in the noiseless case can be much faster than in the noisy case and than for the projection method. In particular, it is demonstrated theoretically and (for a certain range of sample sizes n) experimentally that the decay of the quadratic risk can be much faster than n^{−1} with an increasing sample size and number of regressors (see Table 1).
It is shown that the key factors enabling fast convergence of Sobol' indices estimates are the absence of noise in the output of the analyzed function, its high smoothness, and low dimension. Besides, the estimates of small and close-to-1 indices tend to converge faster, and the obtained bounds and the proposed method are more accurate for such indices.

divide the partial variances into two arbitrary disjoint groups.
2) Notice that all Sobol' indices remain the same if we replace f and f̂ with the functions g = a₁ · f + a₂ and h defined similarly from f̂. Consider the inequalities. Note that k_min ≜ arg min_{k∈R} ||p − k · q||²_μ for p, q ∈ L₂(X, μ) corresponds to the orthogonal projection of p onto q, which leads to ⟨q, p − k_min · q⟩_μ = 0.
3) (3.3) holds true, as its LHS does not exceed 1. Besides, (3.2) is also true. The functions g and h have the same Sobol' indices as f and f̂, correspondingly; thus, it is sufficient to prove the theorem for the functions g and h. To prove that, we will show that the claim holds for any A defined above.
4) Using the Sobol-Hoeffding decompositions of g and h, and based on the decomposition properties, we have the following representations of general Sobol' indices, (A.2) and (A.8).

5) We will need an auxiliary bound based on (A.5) and the Cauchy-Schwarz inequality, and therefore,
6) consider the LHS of (A.7). Denote α ≜ arccos(·) (see Fig. 1 for an illustration in the 2D case); then

A.2. Proof of Corollary 1
1) The case E = 0 is trivial. Let E ≠ 0. The proof for Sobol' indices and total-effects is the same. Denote δ ≜ |S_U − Ŝ_U| ∈ [0, 1] and consider the consequence of Theorem 1 for small indices.
2) The solution of the previous inequality w.r.t. δ is a union of intervals. Based on these inequalities, we obtain the bounds for δ; taking into account that δ ≤ 1, we get (A.14).
3) Consider the consequence of Theorem 1 for the indices close to 1. The obtained inequality is similar to (A.13), and the corresponding solution follows.
4) The final result is the combination of (A.14) and (A.15).

A.3. Proof of Corollary 2
Using the definition (A.2) of the general Sobol' index and the inequality (A.12) from the proof of Theorem 1, we obtain a bound that holds for all U ⊆ {1, . . . , d}. On the other hand,

Finally, (3.7) follows from the previous inequalities. Due to the assumption given in Remark 1, there exist Ψ_α, Ψ_β ∈ L₂(X, μ) which are elements of the function set (2.4).

A.4. Proof of
2) Select two functions in L₂(X, μ). It is easy to see that these functions have Sobol' (and total-effect) indices equal to S_U and Ŝ_U for the x_U variables, correspondingly. One can check that

A.6. Proof of Theorem 4
1) We use the notation S_U for both Sobol' indices and total-effects. Following Condition 1 and (2.13), define the measure ρ over X × R and the measure dρ(D, Y) = ⊗_{i=1}^n dρ(x_i, y_i) over the space of training samples X^n × R^n.
2) Taking into account (3.17), consider separately two domains in the training samples space X^n × R^n: χ₀ ≜ {(D, Y): V_μ[f̂] = 0} and χ₊ ≜ X^n × R^n \ χ₀.
4) The risk takes the form:

A.7. Proof of Corollary 3
The proof repeats the proof of Theorem 4 using Corollary 2 and taking into account that if V_μ[f̂] = 0, then for U ⊆ {1, . . . , d}

A.8. Proof of Theorem 5
1) By definition (2.10), the theoretical orthogonal projection f_N is a linear combination of {Ψ_α, α ∈ L_N} with some coefficients c_α ∈ R.
2) Let f̂_P be an approximation based on the training sample of size n, and let {ĉ_α} be the corresponding estimated coefficients. For ĉ_α defined as in (2.14), it holds for the expectation and for the variance. Then,
4) it remains to apply Theorem 4.

A.9. Proof of Theorem 6
Lemma 2. Let Ω₊ be the subset of X^n × R^n that includes the training samples for which the spectral norm satisfies |||Φ^T Φ/n − I_N||| ≤ 1/2. Under the conditions of Theorem 6 it holds:
Proof of Lemma 2. 1) The proof includes modified versions of Theorems 2 and 3 in [8]. Denote the expansion coefficients of the best approximation (2.10) as c ∈ R^N. We have f_N(x) = c^T Ψ(x). Provided det(Φ^T Φ) ≠ 0,

6) Thus, the claim of Lemma 2 follows.

Proof of Theorem 6. 1) We use the notation S_U for both Sobol' indices and total-effects. Define two subsets of the training samples space X^n × R^n. Let Ω₊ ≜ {(D, Y): |||Φ^T Φ/n − I_N||| ≤ 1/2} and let χ₀ be the subset of Ω₊ that leads to a constant approximation. Note that |||Φ^T Φ/n − I_N||| depends only on the design D and is independent of Y.
2) The risk takes the form: where we used (S_U − Ŝ_U^{LS})² ≤ 1.

A.10. Proof of Corollary 6
The result follows from an inequality based on Lemma 1 and the proofs of Theorem 6 and Lemma 2.