On a multivariate version of Bernstein's inequality

We prove a multivariate version of Bernstein's inequality about the probability that degenerate $U$-statistics take a value larger than some number $u$. This is an improvement of former estimates for the same problem which yields an asymptotically sharp estimate for not too large numbers $u$. This paper also contains an analogous bound about the distribution of multiple Wiener-Ito integrals. Their comparison shows that our results are sharp. The proofs are based on good estimates about high moments of multiple random integrals. They are obtained by means of a diagram formula which enables us to express the product of multiple random integrals as the sum of such expressions.


Introduction.
Let us consider a sequence of iid. random variables $\xi_1,\xi_2,\dots$ on a measurable space $(X,\mathcal X)$ with some distribution $\mu$ together with a real-valued function $f=f(x_1,\dots,x_k)$ of $k$ variables defined on the $k$-th power $(X^k,\mathcal X^k)$ of the space $(X,\mathcal X)$, and define with their help the $U$-statistics $I_{n,k}(f)$, $n=k,k+1,\dots$,

$$I_{n,k}(f)=\frac1{k!}\sum_{\substack{1\le j_s\le n,\ s=1,\dots,k\\ j_s\neq j_{s'}\text{ if }s\neq s'}} f(\xi_{j_1},\dots,\xi_{j_k}). \qquad (1.1)$$

We are interested in good estimates on probabilities of the type $P(n^{-k/2}k!|I_{n,k}(f)|>u)$ under appropriate conditions. Arcones and Giné in [1] have proved an inequality which can be written in a slightly different but equivalent form as

$$P\left(n^{-k/2}k!|I_{n,k}(f)|>u\right)\le c_1\exp\left\{-\frac{c_2u^{2/k}}{\sigma^{2/k}\left(1+c_3\left(un^{-k/2}\sigma^{-(k+1)}\right)^{2/k(k+1)}\right)}\right\}\quad\text{for all } u>0 \qquad (1.2)$$

with some universal constants $c_1$, $c_2$ and $c_3$ depending only on the order $k$ of the $U$-statistic $I_{n,k}(f)$ defined in (1.1) if the function $f$ satisfies the conditions

$$\sup|f(x_1,\dots,x_k)|\le1, \qquad (1.3)$$

$$\|f\|_2^2=\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)\le\sigma^2, \qquad (1.4)$$

and it is canonical with respect to the probability measure $\mu$, i.e.

$$\int f(x_1,\dots,x_{j-1},u,x_{j+1},\dots,x_k)\,\mu(du)=0\quad\text{for all }1\le j\le k\text{ and }x_s\in X,\ s\in\{1,\dots,k\}\setminus\{j\}.$$
A $U$-statistic defined in (1.1) with the help of a canonical function $f$ is called degenerate in the literature. A degenerate $U$-statistic is the natural multivariate version of sums of iid. random variables with expectation zero. Arcones and Giné called their estimate (1.2) a new Bernstein-type inequality. The reason for such a name is that the original Bernstein inequality (see e.g. [3], 1.3.2 Bernstein inequality) states relation (1.2) in the case $k=1$ with the constants $c_1=2$, $c_2=\frac12$ and $c_3=\frac13$ if the function $f(x)$ satisfies the conditions $\sup|f(x)|\le1$, $\int f(x)\mu(dx)=0$ and $\int f^2(x)\mu(dx)\le\sigma^2$.
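The definition in (1.1) and the canonical property can be made concrete by a small brute-force computation. The following is only an illustrative sketch: the function names `u_statistic` and `is_canonical` are hypothetical, and the canonical property is checked only for a measure $\mu$ concentrated on finitely many points, which is an assumption made for the sake of the example.

```python
from itertools import permutations, product
from math import factorial


def u_statistic(f, sample, k):
    """Brute-force I_{n,k}(f): sum of f over all k-tuples of distinct indices, divided by k!."""
    n = len(sample)
    total = sum(f(*(sample[j] for j in idx)) for idx in permutations(range(n), k))
    return total / factorial(k)


def is_canonical(f, support, probs, k, tol=1e-12):
    """Check the canonical property for a measure with finite support (an assumption).

    For every coordinate j and every choice of the remaining k-1 arguments,
    the integral of f in the j-th argument with respect to mu must vanish.
    """
    for j in range(k):
        for others in product(support, repeat=k - 1):
            integral = sum(
                p * f(*others[:j], u, *others[j:]) for u, p in zip(support, probs)
            )
            if abs(integral) > tol:
                return False
    return True


# The kernel f(x, y) = x * y is canonical for the symmetric distribution on {-1, 1}.
kernel = lambda x, y: x * y
value = u_statistic(kernel, [1, 1, 1], 2)
```

For the kernel $f(x,y)=xy$ the sum over ordered pairs of distinct indices equals $(\sum x_j)^2-\sum x_j^2$, which gives a quick hand check of the computed value.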
Let us fix a number $C>0$. Formula (1.2) states in particular that for all numbers $0\le u<Cn^{k/2}\sigma^{k+1}$ and degenerate $U$-statistics $I_{n,k}(f)$ of order $k$ with a kernel function $f$ satisfying relations (1.3) and (1.4) the inequality $P(n^{-k/2}k!|I_{n,k}(f)|>u)\le A\exp\{-B(u/\sigma)^{2/k}\}$ holds with some appropriate constants $A=A(C,k)$ and $B=B(C,k)$ depending only on the fixed number $C$ and the order $k$ of the degenerate $U$-statistics. This inequality can be interpreted in the following way: Let us take a random variable $\eta$ with standard normal distribution. Then $P(n^{-k/2}k!|I_{n,k}(f)|>u)\le AP\left(\sigma\left|\frac{\eta}{\sqrt{2B}}\right|^k>u\right)$, at least for $0\le u<Cn^{k/2}\sigma^{k+1}$. Let us also observe that under condition (1.4) the variance of $n^{-k/2}k!I_{n,k}(f)$ is bounded by $k!\sigma^2$, and if the kernel function $f$ is symmetric and there is identity instead of inequality in (1.4), then $\lim_{n\to\infty}\mathrm{Var}\left(n^{-k/2}k!I_{n,k}(f)\right)=k!\sigma^2$.
In the above discussion we have considered the probability $P(n^{-k/2}k!|I_{n,k}(f)|>u)$ only for $0\le u<Cn^{k/2}\sigma^{k+1}$, while formula (1.2) yields an estimate for this probability for all $u>0$. On the other hand, as I shall show later, this restriction of the parameter $u$ does not mean an important loss of information.
Bernstein's inequality yields an analogous estimate in the case of degenerate $U$-statistics of order 1, i.e. for sums $\sum_{j=1}^nf(\xi_j)$ with a sequence of iid. random variables $\xi_1,\dots,\xi_n$ and a function $f(x)$ whose absolute value is bounded by 1, $Ef(\xi_1)=0$ and $Ef(\xi_1)^2=\sigma^2$. (Actually, Bernstein's inequality is more general: it also yields a bound for the distribution of a sum of independent, not necessarily identically distributed random variables.) But Bernstein's inequality also contains some additional information.
It states that if $0\le u<\varepsilon n^{1/2}\sigma^2$ with a small $\varepsilon>0$, then $P\left(n^{-1/2}\sum_{j=1}^nf(\xi_j)>u\right)\le P((1-C\varepsilon)\sigma|\eta|>u)$ with an appropriate constant $C>0$. Since $n^{-1/2}\sum_{j=1}^nf(\xi_j)$ has expectation zero and variance $\sigma^2$, the above inequality can be interpreted in such a way that at not too large values $u$ the distribution function of the normalized sum $n^{-1/2}\sum_{j=1}^nf(\xi_j)$ can be bounded by the distribution of a normal random variable with expectation zero and only slightly smaller variance. The main goal of this paper is to show that a similar estimate holds for degenerate $U$-statistics of any order.
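As a numerical illustration of the case $k=1$, the classical Bernstein inequality $P(|\sum_{j=1}^nf(\xi_j)|>t)\le2\exp\{-t^2/(2(n\sigma^2+t/3))\}$ can be compared with an exact tail probability. The sketch below assumes the simplest example, $f(\xi_j)=\pm1$ with probability $\frac12$ each (so $\sigma^2=1$), and uses the normalization $t=u\sqrt n$; the function names are hypothetical.

```python
from math import comb, exp, sqrt


def rademacher_tail(n, u):
    """Exact P(|n^{-1/2} * sum of n independent +-1 signs| > u)."""
    t = u * sqrt(n)
    # With h signs equal to +1 the sum equals 2h - n.
    favourable = sum(comb(n, h) for h in range(n + 1) if abs(2 * h - n) > t)
    return favourable / 2**n


def bernstein_bound(n, u, sigma2=1.0):
    """Bernstein's bound written in the normalized variable u = t / sqrt(n)."""
    return 2 * exp(-u**2 / (2 * sigma2 * (1 + u / (3 * sqrt(n) * sigma2))))


for u in (0.5, 1.0, 2.0, 3.0):
    print(u, rademacher_tail(100, u), bernstein_bound(100, u))
```

For small $u/(\sqrt n\sigma^2)$ the correction term in the denominator is close to 1, so the exponent is close to the Gaussian value $-u^2/2\sigma^2$.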
To carry out such a program first we have to find a good multivariate analog of Gaussian random variables. It is natural to consider multiple Wiener-Itô integrals which also appear as the limit of normalized degenerate U -statistics as the sample size tends to infinity. (See e.g. [4]). We shall prove an estimate about the distribution of multiple Wiener-Itô integrals in Theorem 1 and show in Example 2 that this estimate is sharp. The main result of this paper is Theorem 3 which yields an estimate about the tail behaviour of degenerate U -statistics. Its comparison with Theorem 1 shows that Theorem 3 provides an asymptotically sharp estimate on the tail distribution of a degenerate U -statistic for not too large values.
To formulate Theorem 1 let us take a σ-finite measure $\mu$ on the space $(X,\mathcal X)$ and a white noise $\mu_W$ with counting measure $\mu$ on $(X,\mathcal X)$, i.e. a set of jointly Gaussian random variables $\mu_W(A)$, $A\in\mathcal X$, such that $E\mu_W(A)=0$, $E\mu_W(A)\mu_W(B)=\mu(A\cap B)$ for all $A\in\mathcal X$ and $B\in\mathcal X$. (We also need the identity $\mu_W(A\cup B)=\mu_W(A)+\mu_W(B)$ with probability 1 if $A\cap B=\emptyset$, but this is a consequence of the previous properties of the white noise. Indeed, they imply that $E[\mu_W(A\cup B)-(\mu_W(A)+\mu_W(B))]^2=0$ if $A\cap B=\emptyset$, hence the desired identity holds.) The $k$-fold Wiener-Itô integral of a function $f$,

$$J_{\mu,k}(f)=\frac1{k!}\int f(x_1,\dots,x_k)\,\mu_W(dx_1)\dots\mu_W(dx_k), \qquad (1.5)$$

can be defined with respect to a white noise $\mu_W$ with counting measure $\mu$ if $f$ is a measurable function on the space $(X^k,\mathcal X^k)$, and it satisfies relation (1.4) with some $\sigma^2<\infty$. (See e.g. [6] or [7].) The expression $J_{\mu,k}(f)$ in formula (1.5) will be called a Wiener-Itô integral of order $k$. Our first result is the following estimate which is an improvement of the upper bound given in Theorem 6.6 of [7].
Theorem 1. Let us consider a σ-finite measure $\mu$ on a measurable space together with a white noise $\mu_W$ with counting measure $\mu$. Let us have a real-valued function $f(x_1,\dots,x_k)$ on the space $(X^k,\mathcal X^k)$ which satisfies relation (1.4) with some $\sigma^2<\infty$. Take the random integral $J_{\mu,k}(f)$ introduced in formula (1.5). This random integral satisfies the inequality

$$P(k!|J_{\mu,k}(f)|>u)\le C\exp\left\{-\frac12\left(\frac u\sigma\right)^{2/k}\right\}\quad\text{for all }u>0 \qquad (1.6)$$

with an appropriate constant $C=C(k)>0$ depending only on the multiplicity $k$ of the integral.
The following example shows that the estimate of Theorem 1 is sharp.
Proof of the statement of Example 2: We may restrict our attention to the case k ≥ 2.
Itô's formula (see [6] or [7]) states that the random variable $k!J_{\mu,k}(f)$ can be expressed in the form $k!J_{\mu,k}(f)=\sigma H_k(\eta)$, where $H_k(x)$ is the $k$-th Hermite polynomial with leading coefficient 1, and $\eta=\int f_0(x)\mu_W(dx)$ is a standard normal random variable. Hence we get, by exploiting that the coefficient of $x^{k-1}$ in the polynomial $H_k(x)$ is zero, the inequality in (1.7) with an appropriate $\bar C>0$ if $\frac u\sigma>B$. Since $P(k!|J_{\mu,k}(f)|>0)>0$, the above inequality also holds for $0\le\frac u\sigma\le B$ if the constant $\bar C>0$ is chosen sufficiently small. This means that relation (1.7) holds.
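The property of the Hermite polynomials used in the above proof can be checked directly. The sketch below assumes the Hermite polynomials orthogonal with respect to the standard normal distribution, with leading coefficient 1, which satisfy the recursion $H_{m+1}(x)=xH_m(x)-mH_{m-1}(x)$; the function name `hermite_coeffs` is hypothetical.

```python
def hermite_coeffs(k):
    """Coefficients of H_k, lowest degree first, via H_{m+1}(x) = x*H_m(x) - m*H_{m-1}(x)."""
    prev, cur = [1], [0, 1]  # H_0(x) = 1, H_1(x) = x
    if k == 0:
        return prev
    for m in range(1, k):
        nxt = [0] + cur  # multiplication of H_m by x shifts the coefficients
        for i, c in enumerate(prev):
            nxt[i] -= m * c  # subtract m * H_{m-1}
        prev, cur = cur, nxt
    return cur


for k in range(1, 6):
    print(k, hermite_coeffs(k))
```

In each printed list the last entry (the leading coefficient) equals 1 and the entry before it (the coefficient of $x^{k-1}$) equals 0, so $H_k(x)$ behaves like $x^k$ up to a correction of order $x^{k-2}$ for large $x$.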
The main result of this paper is the following

Theorem 3. Let $\xi_1,\dots,\xi_n$ be a sequence of iid. random variables on a space $(X,\mathcal X)$ with some distribution $\mu$. Let us consider a function $f(x_1,\dots,x_k)$ canonical with respect to the measure $\mu$ on the space $(X^k,\mathcal X^k)$ which satisfies conditions (1.3) and (1.4) with some $0<\sigma^2\le1$ together with the degenerate $U$-statistic $I_{n,k}(f)$ with this kernel function. There exist some constants $A=A(k)>0$ and $B=B(k)>0$ depending only on the order $k$ of the $U$-statistic $I_{n,k}(f)$ such that inequality (1.8) holds for all $0\le u\le n^{k/2}\sigma^{k+1}$.
Remark: Actually, the universal constant B > 0 can be chosen independently of the order k of the degenerate U -statistic I n,k (f ) in inequality (1.8).
Theorem 3 states in particular that if $0<u\le\varepsilon n^{k/2}\sigma^{k+1}$ with a sufficiently small $\varepsilon>0$, then $P(k!n^{-k/2}|I_{n,k}(f)|>u)\le A\exp\left\{-\frac{1-C\varepsilon^{1/k}}2\left(\frac u\sigma\right)^{2/k}\right\}$ with some universal constants $A>0$ and $C>0$ depending only on the order $k$ of the $U$-statistic $I_{n,k}(f)$. A comparison of this result with Theorem 1 and Example 2 shows that for small $\varepsilon>0$ this estimate yields the right exponent in first order. This means that for not too large numbers $u$ inequality (1.8) yields an asymptotically optimal estimate.
To understand the previous statement better we can make the following observation: Let us have a probability measure $\mu$ on some measurable space $(X,\mathcal X)$ together with a sequence of iid. random variables $\xi_1,\xi_2,\dots$ with distribution $\mu$, a real-valued function $f_0(x)$ on $(X,\mathcal X)$ such that $\int f_0(x)\mu(dx)=0$, $\int f_0^2(x)\mu(dx)=1$ and a real number $\sigma$. Let us introduce the function $f(x_1,\dots,x_k)=\sigma f_0(x_1)\cdots f_0(x_k)$ on $(X^k,\mathcal X^k)$ and the $U$-statistics $I_{n,k}(f)$, $n=1,2,\dots$, of order $k$ defined in formula (1.1) with this function $f$. Then the $U$-statistics $I_{n,k}(f)$ are degenerate, the normalized $U$-statistics $n^{-k/2}I_{n,k}(f)$ converge in distribution to the multiple Wiener-Itô integral $J_{\mu,k}(f)$ introduced in Example 2 (with the same measure $\mu$ and function $f$ which appears in the definition of $I_{n,k}(f)$) as $n\to\infty$ (see e.g. [4]), and this Wiener-Itô integral satisfies relation (1.7). If the supremum of the function $f$ is bounded by 1, then Theorem 3 can be applied for the $U$-statistics $k!n^{-k/2}I_{n,k}(f)$, and the above considerations indicate that for not too large values $u$ the estimate (1.8) is sharp.
Our goal was to find such an estimate about the distribution function of degenerate U -statistics which is asymptotically optimal for not too large values of this function. Inequality ( 1.8) has this property. In this respect it is similar to Bernstein's inequality which yields such an estimate in the special case k = 1. On the other hand, inequality (1.2) proved in [1], does not supply such a bound. Moreover, the method of paper [1] seems not to be strong enough to yield such an estimate. I return to this question later.
Let us remark that relation (1.2) yields a bound for the tail-distribution of a degenerate $U$-statistic for all numbers $u>0$, while formula (1.8) holds under the condition $0\le u\le n^{k/2}\sigma^{k+1}$. Nevertheless, formula (1.8) implies an estimate also for $u>n^{k/2}\sigma^{k+1}$ which is not weaker than the estimate (1.2) (at least if we do not fix the universal constants in these estimates). To see this let us first observe that in the case $n^{k/2}\ge u>n^{k/2}\sigma^{k+1}$ relation (1.8) holds with $\bar\sigma=\left(un^{-k/2}\right)^{1/(k+1)}$ in place of $\sigma$, and it yields a bound whose exponent is a constant multiple of $-(u/\bar\sigma)^{2/k}=-(u^2n)^{1/(k+1)}$. On the other hand, the right-hand side of (1.2) can be bounded from below by $c_1e^{-c_2(u^2n)^{1/(k+1)}/c_3}$ in this case. Thus relation (1.8) implies relation (1.2) if $n^{k/2}\ge u>n^{k/2}\sigma^{k+1}$ with possibly worse constants $c_1=A$, $c_2$ and $\bar c_3=2c_2(1+B)^{1/k}$. If $u>n^{k/2}$, then the left-hand side of (1.2) equals zero because of the boundedness of the function $f$, and relation (1.2) clearly holds.
Theorem 3 shows some analogy with large deviation results about the average of iid. random variables. If we fix some number larger than the expected value of the average of some iid. random variables, then by the large deviation theory this average can be larger than this number only with exponentially small probability. The term in the exponent of the formula expressing this probability strongly depends on the distribution of the random variables whose average is taken. But if the above probability is considered at a level only slightly greater than the expectation of the average, then this term in the exponent can be well approximated by the value suggested by the central limit theorem. A similar result holds for the distribution of normalized degenerate U -statistics, n −k/2 k!I n,k (f ). In the case 0 ≤ u ≤ const. n k/2 σ k+1 , with σ 2 = Ef 2 (ξ 1 , . . . , ξ k ) we can get a large deviation type estimate for the probability P (n −k/2 k!I n,k (f ) > u). If 0 ≤ u ≤ εn k/2 σ k+1 with a small ε > 0, then we can say more. In this case such an estimate can be given which is suggested by the behaviour of appropriate non-linear functionals of Gaussian processes.
Let me also remark that in the case $u\gg n^{k/2}\sigma^{k+1}$ formula (1.8) (or (1.2)) yields only a rather weak estimate for the probability $P(n^{-k/2}k!I_{n,k}(f)>u)$ for a degenerate $U$-statistic of order $k$ with a kernel function $f$ satisfying relations (1.3) and (1.4). There are examples satisfying relations (1.3) and (1.4) for which the lower bounds $P(n^{-1/2}I_{n,1}(f)>u)\ge\exp\left\{-An^{1/2}u\log\frac u{\sqrt n\sigma^2}\right\}$ and $P(n^{-1}I_{n,2}(f)>u)\ge\exp\left\{-An^{1/3}u^{2/3}\log\frac u{n\sigma^3}\right\}$ hold if $B_2n^{1/2}\ge u\ge B_1n^{1/2}\sigma^2$ or $B_2n\ge u\ge B_1n\sigma^3$ respectively with a sufficiently large $B_1>0$ and some appropriate $0<B_2<1$. Similar examples of degenerate $U$-statistics could also be constructed for any order $k$. Thus there are degenerate $U$-statistics of order $k$ with a kernel function satisfying relations (1.3) and (1.4) with some $\sigma>0$ whose tail distribution has an essentially different behaviour for $u<n^{k/2}\sigma^{k+1}$ and $u\gg n^{k/2}\sigma^{k+1}$.
There is another sort of interesting generalization of Bernstein's inequality. I would refer to a recent work of C. Houdré and P. Reynaud-Bouret [5], where good estimates are given for the distribution of degenerate U -statistics of order 2, but in that paper a more general model is considered. It deals with a natural object we can call generalized U -statistic in the special case k = 2. Generalized U -statistics can be defined similarly to classical U -statistics with the difference that the underlying independent random variables ξ 1 , ξ 2 , . . . may be not identically distributed, and the terms in the sum ( 1.1) are of the form f j 1 ,...,j k (ξ j 1 , . . . , ξ j k ). If the functions f j 1 ,...,j k are canonical, then we can speak about generalized degenerate U -statistics. (The notion of canonical functions can be generalized to this case in a natural way.) The problem about the distribution of generalized degenerate U -statistics can be considered as a multivariate version of the problem about the distribution of sums of independent, but not necessarily identically distributed random variables with expectation zero. Here we do not discuss this question, although it is very interesting. The most essential part of this problem seems to be to find the right formulation of the estimate we have to prove. A good estimate on the distribution of generalized degenerate U -statistics has to depend beside the variance of the U -statistics on different quantities which still should be found.
It is natural to expect that generalized degenerate $U$-statistics $I_{n,k}(f)$ of order $k$ (without normalization) satisfy the inequality (1.9) with some universal constants $A=A(k)>0$ and $C=C(k)>0$ in a relatively large interval for the parameter $u$, where $V_n^2$ denotes the variance of $I_{n,k}(f)$. An essential problem is to find a relatively good constant $C$ and to determine the interval $0<u<D_n$ where the estimate (1.9) holds. The result of this paper states that in the case of classical degenerate $U$-statistics (1.9) holds in the interval $[0,D_n]$ with $D_n=\mathrm{const.}\,n^k\sigma^{k+1}$, where $\sigma^2=Ef(\xi_1,\dots,\xi_k)^2$. For $k=1$ this means that relation (1.9) holds in the interval $0\le u\le V_n^2$. But it is not clear what corresponds in the general case to the right end-point $D_n=\mathrm{const.}\,n^k\sigma^{k+1}$ of the interval where the estimate (1.9) should hold.
This paper consists of six sections and an Appendix. In Section 2 the method of the proofs is explained. Our results will be proved by means of a good estimate on high (but not too high) moments of the random variables we are investigating. These estimates are obtained by means of a diagram formula which enables us to express product of stochastic integrals or degenerate U -statistics as a sum of such expressions. Section 3 contains the proof of Theorem 1. We formulate a version of the diagram formula about the product of two degenerate U -statistics in Section 4. In Section 5 this result will be generalized to the product of L ≥ 2 degenerate U -statistics, and an estimate is given about the L 2 -norm of the kernel functions appearing in the U -statistics of this result. Theorem 3 will be proved in Section 6. The diagram formula about the product of two degenerate U -statistics is proved in the Appendix.

The idea of the proof.
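Theorem 1 will be proved by means of the following

Proposition A. Let the conditions of Theorem 1 be satisfied for a multiple Wiener-Itô integral $J_{\mu,k}(f)$ of order $k$. Then, with the notations of Theorem 1, the inequality

$$E\left(k!|J_{\mu,k}(f)|\right)^{2M}\le1\cdot3\cdot5\cdots(2kM-1)\,\sigma^{2M}\quad\text{for all }M=1,2,\dots \qquad (2.1)$$

holds.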
By the Stirling formula Proposition A implies relation (2.2).

The following Proposition B, which will be applied in the proof of Theorem 3, states a similar but weaker inequality for the moments of normalized degenerate $U$-statistics. The constant $C=C(k)$ in formula (2.3) can be chosen e.g. as $C=2\sqrt2$, which does not depend on the order $k$ of the $U$-statistic $I_{n,k}(f)$.
Let us remark that formula (2.1) can be reformulated as $E(k!|J_{\mu,k}(f)|)^{2M}\le E(\sigma\eta^k)^{2M}$, where $\eta$ is a standard normal random variable. Theorem 1 states that the tail distribution of $k!|J_{\mu,k}(f)|$ satisfies an estimate similar to that of $\sigma|\eta|^k$. This will simply follow from Proposition A and the Markov inequality with an appropriate choice of the parameter $M$.
Proposition B states that in the case $M_0\le M\le\varepsilon n\sigma^2$ a moment inequality holds which involves a standard normal random variable $\eta$ and a function $\beta(\varepsilon)$, $0\le\varepsilon\le1$, such that $\beta(\varepsilon)\to0$ if $\varepsilon\to0$, and $\beta(\varepsilon)\le C$ with some universal constant $C=C(k)$ depending only on the order $k$ of the $U$-statistic for all $0\le\varepsilon\le1$. This means that certain high but not too high moments of $n^{-k/2}k!I_{n,k}(f)$ behave similarly to the moments of $k!J_{\mu,k}(f)$. As a consequence, we can prove a similar, but slightly weaker estimate for the distribution of $n^{-k/2}k!I_{n,k}(f)$ than for the distribution of $k!J_{\mu,k}(f)$. Theorem 3 contains the result we can get about the distribution of $I_{n,k}(f)$ by means of these moment estimates.
The proof of Proposition A is based on a corollary of a most important result about Wiener-Itô integrals called the diagram formula. This result enables us to rewrite the product of Wiener-Itô integrals as a sum of Wiener-Itô integrals of different order. It got the name 'diagram formula' because the kernel functions of the Wiener-Itô integrals appearing in the sum representation of the product of Wiener-Itô integrals are defined with the help of certain diagrams. As the expectation of a Wiener-Itô integral of order k is zero for all k ≥ 1 the expectation of the product equals the sum of the constant terms (i.e. of the integrals of order zero) in the diagram formula. We shall see that Proposition A can be proved relatively simply by means of this corollary of the diagram formula.
We shall also see that there is a version of the diagram formula which enables us to express the product of degenerate U -statistics as a sum of degenerate U -statistics of different order. Proposition B can be proved by means of this version of the diagram formula similarly to the proof of Proposition A. The main difference between their proof is that in the case of the diagram formula for degenerate U -statistics some new diagrams also appear, and their contribution also has to be estimated. It will be shown that if not too high moments of U -statistics are calculated by means of this new version of the diagram formula, then the contribution of the new diagrams is not too large.
The proof of formula (1.2) in [1] also contains, in an implicit way, the proof of a moment inequality (2.4) with some appropriate constant $C=C(k)$ for $M\le n\sigma^2$. This estimate is sufficient to prove relation (1.2), but insufficient to prove Theorem 3.
For the latter we need a sharpened version of inequality (2.4) which contains an asymptotically optimal constant $C$ if $M\le\varepsilon n\sigma^2$ with a small coefficient $\varepsilon>0$. But the method of paper [1] is not strong enough to prove such a sharpened version of (2.4).
One reason for this weakness of the method of paper [1] is that it applies a consequence of Borell's inequality which does not give a sharp inequality. Nevertheless, this inequality could be improved. (See my paper [9].) Another problem is that the proof in [1] contains a decoupling argument of paper [2]. This argument, which is needed to apply a multivariate version of the Marcinkiewicz-Zygmund inequality, also weakens the universal constants in formula (1.2). This difficulty could also be overcome by some clever tricks. But the application of the Marcinkiewicz-Zygmund inequality does not allow us to prove relation (2.4) with an optimal constant $C$. The proof of this inequality is based on a symmetrization argument which implies in particular that the moments of $n^{-k/2}k!I_{n,k}(f)$ are bounded by the (same) moments of a random variable with a constant times greater variance. The influence of this too large variance is inherited in all subsequent estimates, and as a consequence, a method applying a symmetrization argument cannot yield the estimate (2.4) with a sharp constant $C$.

3. The proof of Theorem 1.
To formulate the corollary of the diagram formula that we need in the proof of Proposition A, I first introduce some notations.
Let us have a σ-finite measure $\mu$ together with a white noise $\mu_W$ with counting measure $\mu$ on $(X,\mathcal X)$. Let us consider $L$ real-valued functions $f_l(x_1,\dots,x_{k_l})$, $1\le l\le L$, and the Wiener-Itô integrals defined with their help; we want to express the expected value of the product of these random integrals. For this goal let us introduce the following notations. Put

$$F(x_{(l,j)},\ 1\le l\le L,\ 1\le j\le k_l)=\prod_{l=1}^Lf_l(x_{(l,1)},\dots,x_{(l,k_l)}), \qquad (3.1)$$

and define a class of diagrams $\Gamma(k_1,\dots,k_L)$ in the following way: Each diagram $\gamma\in\Gamma(k_1,\dots,k_L)$ is a (complete, undirected) graph with vertices $(l,j)$, $1\le l\le L$, $1\le j\le k_l$, and we shall call the set of vertices $(l,j)$ with a fixed index $l$ the $l$-th row of a graph $\gamma\in\Gamma(k_1,\dots,k_L)$. The graphs $\gamma\in\Gamma(k_1,\dots,k_L)$ will have edges with the following properties. Each edge connects vertices $(l,j)$ and $(l',j')$ from different rows, i.e. $l\neq l'$ for the end-points of an edge. From each vertex there starts exactly one edge. $\Gamma(k_1,\dots,k_L)$ contains all graphs $\gamma$ with such properties. If there is no such graph, then $\Gamma(k_1,\dots,k_L)$ is empty.
If an edge of the diagram $\gamma$ connects some vertex $(l,j)$ with some other vertex $(l',j')$, $l'>l$, then we call $(l',j')$ the lower end-point of this edge, and we denote the set of lower end-points of $\gamma$ by $A_\gamma$, which has $N$ elements. Let us also introduce the following function $\alpha_\gamma$ on the vertices of $\gamma$. Put $\alpha_\gamma(l,j)=(l,j)$ if $(l,j)$ is the lower end-point of an edge, and $\alpha_\gamma(l,j)=(l',j')$ if $(l,j)$ is connected with the point $(l',j')$ by an edge of $\gamma$, and $(l',j')$ is the lower end-point of this edge. Then we define a new function with the help of the function $F$ introduced in (3.1): we replace the argument $x_{(l,j)}$ by $x_{(l',j')}$ in the function $F$ if $(l,j)$ and $(l',j')$ are connected by an edge in $\gamma$, and $l'>l$. Then we enumerate the lower end-points somehow, and define the function $B_\gamma(r)$, $1\le r\le N$, such that $B_\gamma(r)$ is the $r$-th lower end-point of the diagram $\gamma$. (The function of the variables $x_1,\dots,x_N$ obtained in this way depends on the enumeration of the lower end-points of the diagram $\gamma$, but its integral $F_\gamma$ is independent of it.) We shall need the following corollary of the diagram formula.
Theorem A. With the above introduced notation the expected value of the product of the random integrals considered above can be expressed by means of the quantities $F_\gamma$, $\gamma\in\Gamma(k_1,\dots,k_L)$. (If $\Gamma(k_1,\dots,k_L)$ is empty, then the expected value of the above product of random integrals equals zero.)

The proof of Theorem A can be found in Corollary 5.4 of [7] or [6]. The result of [7] actually deals with a different version of Wiener-Itô integrals where their 'Fourier transforms' are considered, and we integrate not with respect to a white noise, but with respect to its 'Fourier transform'. The results obtained for such integrals are actually equivalent to the result formulated in Theorem A. I formulated Theorem A in the present form because generally this version of Wiener-Itô integrals is applied in the literature, and it can be compared better with the diagram formula for the product of degenerate $U$-statistics applied in this paper. Paper [6] contains the diagram formula for the version of Wiener-Itô integrals considered in this paper. The result of Theorem A, which is not contained explicitly in [6], can be deduced from the diagram formula proved in [6] in the same (simple) way as Corollary 5.4 is proved in [7]. Now we turn to the proof of Proposition A.
Proof of Proposition A. Proposition A can be simply proved with the help of Theorem A if we apply it with $L=2M$ and the functions $f_l(x_1,\dots,x_k)=f(x_1,\dots,x_k)$ for all $1\le l\le2M$. This yields the bound $E(k!J_{\mu,k}(f))^{2M}\le|\Gamma_{2M}(k)|\sigma^{2M}$, where $|\Gamma_{2M}(k)|$ denotes the number of diagrams $\gamma$ in $\Gamma(\underbrace{k,\dots,k}_{2M\text{ times}})$. Thus to complete the proof of Proposition A it is enough to show that $|\Gamma_{2M}(k)|\le1\cdot3\cdot5\cdots(2kM-1)$. But this can be seen simply with the help of the following observation. Let $\bar\Gamma_{2M}(k)$ denote the class of all graphs with vertices $(l,j)$, $1\le l\le2M$, $1\le j\le k$, such that from all vertices $(l,j)$ exactly one edge starts, all edges connect different vertices, but we also allow edges connecting vertices $(l,j)$ and $(l,j')$ with the same first coordinate $l$. Let $|\bar\Gamma_{2M}(k)|$ denote the number of graphs in $\bar\Gamma_{2M}(k)$. Then clearly $|\Gamma_{2M}(k)|\le|\bar\Gamma_{2M}(k)|$.
On the other hand, $|\bar\Gamma_{2M}(k)|=1\cdot3\cdot5\cdots(2kM-1)$. Indeed, let us list the vertices of the graphs from $\bar\Gamma_{2M}(k)$ in an arbitrary way. Then the first vertex can be paired with another vertex in $2kM-1$ ways. After this the first vertex from which no edge starts can be paired with $2kM-3$ vertices from which no edge starts. By following this procedure the next edge can be chosen in $2kM-5$ ways, and by continuing this calculation we get the desired formula.
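The counting argument above can be verified by brute force for small parameters. The following sketch enumerates pairings of the vertices $(l,j)$ recursively, in the same order as in the proof: the first unpaired vertex is always matched with one of the remaining ones. The function names are hypothetical, and the number of rows is denoted by `rows` (it plays the role of $2M$).

```python
def count_pairings(vertices, allow_same_row):
    """Count the perfect pairings of the vertices (l, j); optionally forbid edges inside a row."""
    if not vertices:
        return 1
    first, rest = vertices[0], vertices[1:]
    total = 0
    for i, other in enumerate(rest):
        if allow_same_row or other[0] != first[0]:
            total += count_pairings(rest[:i] + rest[i + 1:], allow_same_row)
    return total


def odd_product(m):
    """The product 1 * 3 * 5 * ... * m for an odd number m."""
    result = 1
    for factor in range(1, m + 1, 2):
        result *= factor
    return result


def vertex_list(rows, k):
    return [(l, j) for l in range(1, rows + 1) for j in range(1, k + 1)]


for rows, k in [(2, 2), (2, 3), (4, 1), (4, 2)]:
    v = vertex_list(rows, k)
    diagrams = count_pairings(v, allow_same_row=False)   # number of diagrams, edges between rows only
    all_pairings = count_pairings(v, allow_same_row=True)  # edges inside a row also allowed
    print(rows, k, diagrams, all_pairings, odd_product(rows * k - 1))
```

In each case the number of all pairings agrees with the product $1\cdot3\cdots(2kM-1)$, and the number of diagrams without edges inside a row does not exceed it.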
Proof of Theorem 1. By Proposition A, formula (2.2) and the Markov inequality we get an upper bound for the probability $P(k!|J_{\mu,k}(f)|>u)$ for every positive integer $M$, and an appropriate choice of the parameter $M$ leads to relation (3.7). Relation (3.7) means that relation (1.6) holds for $u\ge u_0$ with the pre-exponential coefficient $Ae^k$. By enlarging this coefficient if it is needed we can guarantee that relation (1.6) holds for all $u>0$. Theorem 1 is proved.

4. The diagram formula for the product of two degenerate $U$-statistics.
To prove Proposition B we need a result analogous to Theorem A about the expectation of products of degenerate U -statistics. To get such a result first we describe the product of two degenerate U -statistics as the sum of degenerate U -statistics of different order together with a good estimate on the L 2 -norm of the kernel functions in the sum representation. The proof of this result will be given in the Appendix. We can get with the help of an inductive procedure a generalization of this result. It yields a representation of the product of several degenerate U -statistics in the form of a sum of degenerate U -statistics which implies a formula about the expected value of products of degenerate U -statistics useful in the proof of Proposition B. This generalization will be discussed in the next section. Let us have a sequence of iid. random variables ξ 1 , ξ 2 , . . . with some distribution µ on a measurable space (X, X ) together with two functions f (x 1 , . . . , x k 1 ) and g(x 1 , . . . , x k 2 ) on (X k 1 , X k 1 ) and on (X k 2 , X k 2 ) respectively which are canonical with respect to the probability measure µ. We consider the degenerate U -statistics I n,k 1 (f ) and I n,k 2 (g) and express their normalized product k 1 !k 2 !n −(k 1 +k 2 )/2 I n,k 1 (f )I n,k 2 (g) as a sum of (normalized) degenerate U -statistics. This product can be written as a sum of U -statistics in a natural way, and then by applying the Hoeffding decomposition for each of these U -statistics as a sum of degenerate U -statistics we get the desired representation of the product of two degenerate U -statistics. The result we get in such a way will be presented in Theorem B. Before its formulation I introduce some notations.
To define the kernel functions of the U -statistics appearing in the diagram formula for the product of two U -statistics first we introduce a class of objects Γ(k 1 , k 2 ) we shall call coloured diagrams. We define graphs γ ∈ Γ(k 1 , k 2 ) that contain the vertices (1, 1), (1, 2), . . . , (1, k 1 ) which we shall call the first row and (2, 1) . . . , (2, k 2 ) which we shall call the second row of these graphs. From each vertex there starts zero or one edge, and all edges connect vertices from different rows. All edges will get a colour +1 or −1. Γ(k 1 , k 2 ) consists of all γ obtained in such a way which we shall call coloured diagrams.
Given a coloured diagram γ ∈ Γ(k 1 , k 2 ) let B u (γ) denote the set of upper endpoints (1, j) of the edges of the graph γ, B (b,1) (γ) the set of lower end-points (2, j) of the edges of γ with colour 1, and B (b,−1) (γ) the set of lower end-points (2, j) of the edges of γ with colour −1. (The letter 'b' in the index was chosen because of the word below.) Finally, let Z(γ) denote the set of edges with colour 1, W (γ) the set of edges with colour −1 of a coloured graph γ ∈ Γ(k 1 , k 2 ), and let |Z(γ)| and |W (γ)| denote their cardinality.
Given two functions $f(x_1,\dots,x_{k_1})$ and $g(x_1,\dots,x_{k_2})$ let us define the function

$$(f\circ g)(x_{(1,1)},\dots,x_{(1,k_1)},x_{(2,1)},\dots,x_{(2,k_2)})=f(x_{(1,1)},\dots,x_{(1,k_1)})\,g(x_{(2,1)},\dots,x_{(2,k_2)}). \qquad (4.1)$$

Given a function $h(x_{u_1},\dots,x_{u_r})$ with coordinates in the space $(X,\mathcal X)$ (the indices $u_1,\dots,u_r$ are all different) let us introduce its transforms $P_{u_j}h$ and $Q_{u_j}h$ by the formulas (4.2) and (4.3).

At this point I started to apply a notation which may seem to be too complicated, but I think that it is more appropriate in the further discussion. Namely, I started to apply a rather general enumeration $u_1,\dots,u_r$ of the arguments of the functions we are working with instead of their simpler enumeration with indices $1,\dots,r$. But in the further discussion there will appear an enumeration of the arguments by pairs of integers $(l,j)$ in a natural way, and I found it simpler to work with such an enumeration than to reindex our variables all the time. Let me remark in particular that this means that the definition of the $U$-statistic with a kernel function $f(x_1,\dots,x_k)$ given in formula (1.1) will appear sometimes in a more complicated, but actually equivalent form: We shall work with a kernel function $f(x_{u_1},\dots,x_{u_k})$ instead of $f(x_1,\dots,x_k)$, and the random variables $\xi_j$ will be indexed by $u_s$, i.e. to the coordinate $x_{u_s}$ we shall put the random variables $\xi_{j_{u_s}}$ with indices $1\le j_{u_s}\le n$.

Let us define for all coloured diagrams $\gamma\in\Gamma(k_1,k_2)$ the function $\alpha_\gamma(1,j)$, $1\le j\le k_1$, on the vertices of the first row of $\gamma$ as $\alpha_\gamma(1,j)=(1,j)$ if no edge starts from $(1,j)$, and $\alpha_\gamma(1,j)=(2,j')$ if an edge of $\gamma$ connects the vertices $(1,j)$ and $(2,j')$.
Given two functions $f(x_1,\dots,x_{k_1})$ and $g(x_1,\dots,x_{k_2})$ together with a coloured diagram $\gamma\in\Gamma(k_1,k_2)$ let us introduce, with the help of the above defined function $\alpha_\gamma(\cdot)$ and $(f\circ g)$ introduced in (4.1), the function $(f\circ g)_\gamma$ obtained as

$$(f\circ g)(x_{\alpha_\gamma(1,1)},\dots,x_{\alpha_\gamma(1,k_1)},x_{(2,1)},\dots,x_{(2,k_2)}). \qquad (4.4)$$

(In words, we take the function $(f\circ g)$, and if there is an edge of $\gamma$ starting from a vertex $(1,j)$, and it connects this vertex with the vertex $(2,j')$, then the argument $x_{(1,j)}$ is replaced by the argument $x_{(2,j')}$ in this function.) Let us also introduce the function defined in (4.5). (In words, we take the function $(f\circ g)_\gamma$, and for those indices $(2,j')$ of the graph $\gamma$ from which an edge with colour 1 starts we apply the operator $P_{(2,j')}$ introduced in formula (4.2), and for those indices $(2,j')$ from which an edge with colour $-1$ starts we apply the operator $Q_{(2,j')}$ defined in formula (4.3).) Let us also remark that the operators $P_{(2,j')}$ and $Q_{(2,j')}$ are exchangeable for different indices $j'$, hence it is not important in which order we apply the operators $P_{(2,j')}$ and $Q_{(2,j')}$ in formula (4.5).
In the definition of the function $(f\circ g)_\gamma$ those arguments $x_{(2,j')}$ of the function $\overline{(f\circ g)}_\gamma$ which are indexed by a pair $(2,j')$ from which an edge of colour 1 of the coloured diagram $\gamma$ starts disappear, while the arguments indexed by a pair $(2,j')$ from which an edge of colour $-1$ of the coloured diagram $\gamma$ starts are preserved. Hence the number of arguments of the function $(f\circ g)_\gamma$ equals $k_1+k_2-2|B_{(b,1)}(\gamma)|-|B_{(b,-1)}(\gamma)|$, where $|B_{(b,1)}(\gamma)|$ and $|B_{(b,-1)}(\gamma)|$ denote the number of lower end-points of the edges of the coloured diagram $\gamma$ with colour 1 and $-1$ respectively. In an equivalent form we can say that the number of arguments of $(f\circ g)_\gamma$ equals $k_1+k_2-(2|Z(\gamma)|+|W(\gamma)|)$. Now we are in a position to formulate the diagram formula for the product of two degenerate $U$-statistics.
Theorem B. Let us have a sequence of iid. random variables $\xi_1,\xi_2,\dots$ with some distribution $\mu$ on some measurable space $(X,\mathcal X)$ together with two bounded functions $f(x_1,\dots,x_{k_1})$ and $g(x_1,\dots,x_{k_2})$ on the spaces $(X^{k_1},\mathcal X^{k_1})$ and $(X^{k_2},\mathcal X^{k_2})$ which are canonical with respect to the probability measure $\mu$. Let us introduce the class of coloured diagrams $\Gamma(k_1,k_2)$ defined above together with the functions $(f\circ g)_\gamma$ defined in formulas (4
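If $|W(\gamma)|=0$, then the inequality

$$\|(f\circ g)_\gamma\|_2\le\|f\|_2\,\|g\|_2 \tag{4.7}$$

holds. In the general case we can say that if the functions $f$ and $g$ satisfy formula (1.3), then also the inequality

$$\|(f\circ g)_\gamma\|_2\le 2^{|W(\gamma)|}\min\bigl(\|f\|_2,\|g\|_2\bigr) \tag{4.8}$$

holds. Relations (4.7) and (4.8) remain valid even if we drop the condition that the functions $f$ and $g$ are canonical.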
Relations (4.7) and (4.8) mean in particular that we have a better estimate for $\|(f\circ g)_\gamma\|_2$ in the case when the coloured diagram $\gamma$ contains no edge with colour $-1$, i.e. $|W(\gamma)|=0$, than in the case when it contains at least one edge with colour $-1$.
Let us understand how we define those terms at the right-hand side of (4.6) for which $k(\gamma)=0$. In this case $(f\circ g)_\gamma$ is a constant, and to make formula (4.6) meaningful we have to define the term $I_{n,k(\gamma)}((f\circ g)_\gamma)$ also in this case. The following convention will be used. A constant $c$ will be called a degenerate $U$-statistic of order zero, and we define $I_{n,0}(c)=c$.
Theorem B can be considered as a version of the result of paper [8], where a similar diagram formula was proved about multiple random integrals with respect to normalized empirical measures. Degenerate $U$-statistics can also be presented as such integrals with special, canonical kernel functions. Hence there is a close relation between the results of this paper and [8]. But there are also some essential differences. On the one hand, the diagram formula for multiple random integrals with respect to normalized empirical measures is simpler than the analogous result about the product of degenerate $U$-statistics, because the kernel functions in these integrals need not be special, canonical functions. On the other hand, the diagram formula for degenerate $U$-statistics yields a simpler formula about the expected value of the product of degenerate $U$-statistics, because the expected value of a degenerate $U$-statistic equals zero, whereas the analogous statement about multiple random integrals with respect to normalized empirical measures does not hold. Another difference between this paper and [8] is that here I worked out a new notation which, I hope, is more transparent.

The diagram formula for the product of several degenerate $U$-statistics.
We can also express the product of more than two degenerate $U$-statistics in the form of sums of degenerate $U$-statistics by applying Theorem B recursively. We shall present this result in Theorem B′ and prove it together with an estimate about the $L_2$-norm of the kernel functions of the degenerate $U$-statistics appearing in Theorem B′. This estimate will be given in Theorem C. Since the expected value of all degenerate $U$-statistics of order $k\ge1$ equals zero, the representation of the product of $U$-statistics in the form of a sum of degenerate $U$-statistics implies that the expected value of this product equals the sum of the constant terms in this representation. In such a way we get a version of Theorem A for the expected value of a product of degenerate $U$-statistics which together with Theorem C will be sufficient to prove Proposition B. But the formula we get in this way is more complicated than the analogous diagram formula for products of Wiener-Itô integrals. To overcome this difficulty we have to work out a good "book-keeping method".
Let us have a sequence of iid. random variables $\xi_1,\xi_2,\dots$ taking values on a measurable space $(X,\mathcal X)$ with some distribution $\mu$, and consider $L$ functions $f_l(x_1,\dots,x_{k_l})$ on the measure spaces $(X^{k_l},\mathcal X^{k_l})$, $1\le l\le L$, canonical with respect to the measure $\mu$. We want to represent the product of $L\ge2$ normalized degenerate $U$-statistics $n^{-k_l/2}k_l!\,I_{n,k_l}(f_l)$ in the form of a sum of degenerate $U$-statistics similarly to Theorem B. To this end I define a class of coloured diagrams $\Gamma(k_1,\dots,k_L)$ together with some canonical functions $F_\gamma=F_\gamma(f_1,\dots,f_L)$ depending on the diagrams $\gamma\in\Gamma(k_1,\dots,k_L)$ and the functions $f_l(x_1,\dots,x_{k_l})$, $1\le l\le L$.
The coloured diagrams will be graphs with vertices $(l,j)$ and $(l,j,C)$, $1\le l\le L$, $1\le j\le k_l$, and edges between some of these vertices which will get either colour 1 or colour $-1$. The set of vertices $\{(l,j),(l,j,C),\ 1\le j\le k_l\}$ will be called the $l$-th row of the diagrams. (The vertices $(l,j,C)$ are introduced, because it turned out to be useful to take a copy $(l,j,C)$ of some vertices $(l,j)$. The letter $C$ was just chosen to indicate that it is a copy.) From each vertex either zero or one edge starts, and edges may connect only vertices in different rows. We shall call all vertices of the form $(l,j)$ permissible, and beside this some of the vertices $(l,j,C)$ will also be called permissible. Those vertices will be called permissible from which some edge may start.
We shall say that an edge connecting two vertices $(l_1,j_1)$ and $(l_2,j_2)$, or a permissible vertex $(l_1,j_1,C)$ and a vertex $(l_2,j_2)$, with $l_2>l_1$ is of level $l_2$, and $(l_2,j_2)$ will be called the lower end-point of such an edge. (The coloured diagrams we shall define contain only edges with lower end-points of the form $(l,j)$.) We shall call the restriction $\gamma(l)$ of the diagram $\gamma$ to level $l$ that part of the diagram $\gamma$ which contains all of its vertices together with those edges (together with their colours) whose levels are less than or equal to $l$, and which tells which of the vertices $(l',j,C)$ are permissible for $1\le l'\le l$. We shall define the diagrams $\gamma\in\Gamma(k_1,\dots,k_L)$ inductively by defining their restrictions $\gamma(l)$ to level $l$ for all $l=1,2,\dots,L$. Those diagrams $\gamma$ will belong to $\Gamma(k_1,\dots,k_L)$ whose restrictions $\gamma(l)$ can be defined through the following procedure for all $l=1,2,\dots,L$.
The restriction $\gamma(1)$ of a diagram $\gamma$ to level 1 contains no edges, and no vertex of the form $(1,j,C)$, $1\le j\le k_1$, is permissible. If we have defined the restrictions $\gamma(l-1)$ for some $2\le l\le L$, then those diagrams will be called restrictions $\gamma(l)$ at level $l$ which can be obtained from a restriction $\gamma(l-1)$ in the following way: Take the vertices $(l,j)$, $1\le j\le k_l$, from the $l$-th row; from each of them either no edge starts or one edge starts which gets either colour 1 or colour $-1$. The other end-point of such an edge must be a vertex $(l',j')$ or a permissible vertex $(l',j',C)$ with some $1\le l'<l$ which is not an end-point of an edge in $\gamma(l-1)$, and naturally such a vertex can be connected with only one of the vertices $(l,j)$, $1\le j\le k_l$. We define $\gamma(l)$ first by adjoining the coloured edges constructed in the above way to the (coloured) edges of $\gamma(l-1)$, and the set of permissible vertices in $\gamma(l)$ will contain, beside the permissible vertices of $\gamma(l-1)$ and the vertices $(l,j)$, $1\le j\le k_l$, those vertices $(l,j,C)$ for which $(l,j)$ is the lower end-point of an edge with colour $-1$ in $\gamma(l)$. $\Gamma(k_1,\dots,k_L)$ will consist of all coloured diagrams $\gamma=\gamma(L)$ obtained in such a way.
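The inductive construction can be made concrete in the simplest case $L=2$. The following Python sketch enumerates the two-row coloured diagrams under the assumption that for $L=2$ the construction reduces to the following: each vertex $(2,j)$ is either free or joined by an edge of colour $1$ or $-1$ to a vertex $(1,j')$, and each vertex of the first row carries at most one edge. The encoding of an edge as a triple `(j_upper, j_lower, colour)` and the function names are hypothetical choices made for this illustration.

```python
from itertools import product

def coloured_diagrams(k1, k2):
    """Yield the two-row coloured diagrams as tuples of edges
    (j_upper, j_lower, colour) with colour in {1, -1}."""
    # Each lower vertex (2, j) is free (None) or joined to an upper vertex.
    options = [None] + [(ju, c) for ju in range(1, k1 + 1) for c in (1, -1)]
    for choice in product(options, repeat=k2):
        used = [opt[0] for opt in choice if opt is not None]
        if len(used) != len(set(used)):
            continue  # an upper vertex may be the end-point of at most one edge
        yield tuple((opt[0], jl, opt[1])
                    for jl, opt in enumerate(choice, start=1)
                    if opt is not None)

def number_of_arguments(k1, k2, diagram):
    """k1 + k2 - (2 * #edges of colour 1 + #edges of colour -1)."""
    ones = sum(1 for (_, _, c) in diagram if c == 1)
    minus_ones = sum(1 for (_, _, c) in diagram if c == -1)
    return k1 + k2 - (2 * ones + minus_ones)

diagrams = list(coloured_diagrams(2, 2))
constant_terms = [d for d in diagrams if number_of_arguments(2, 2, d) == 0]
```

With $r$ edges there are $\binom{k_1}{r}\binom{k_2}{r}\,r!\,2^r$ such diagrams, which gives a simple check of the enumeration; the diagrams with zero arguments are those in which every vertex is the end-point of an edge of colour 1.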
Given a coloured diagram $\gamma\in\Gamma(k_1,\dots,k_L)$ we shall define recursively some (canonical) functions $F_{l,\gamma}$ with the help of the functions $f_1,\dots,f_l$ together with some constants $J_n(l,\gamma)$ for all $1\le l\le L$ in the way suggested by Theorem B. Then we put $F_\gamma=F_{L,\gamma}$ and give the desired representation of the product of the degenerate $U$-statistics with the help of $U$-statistics with kernel functions $F_\gamma$ and constants $J_n(l,\gamma)$, $\gamma\in\Gamma(k_1,\dots,k_L)$, $1\le l\le L$.
Let us fix some coloured diagram $\gamma\in\Gamma(k_1,\dots,k_L)$ and introduce the following notations: Let $B_{(b,-1)}(l,\gamma)$ denote the set of lower end-points of the form $(l,j)$ of edges with colour $-1$ and $B_{(b,1)}(l,\gamma)$ the set of lower end-points of the form $(l,j)$ of edges with colour 1. Let $U(l,\gamma)$ denote the set of those permissible vertices $(l',j)$ and $(l',j,C)$ with $l'\le l$ from which no edge starts in the restriction $\gamma(l)$ of the diagram $\gamma$ to level $l$, i.e. either no edge starts from this vertex, or if some edge starts from it, then its other end-point is a vertex $(l'',j'')$ with $l''>l$. Beside this, given some integer $1\le l_1<l$ let $U(l,l_1,\gamma)$ denote the restriction of $U(l,\gamma)$ to its first $l_1$ rows, i.e. $U(l,l_1,\gamma)$ consists of those vertices $(l',j)$ and $(l',j,C)$ which are contained in $U(l,\gamma)$ and for which $l'\le l_1$. We shall define the functions $F_{l,\gamma}$ with arguments of the form $x_{(l',j)}$ and $x_{(l',j,C)}$ with $(l',j)\in U(l,\gamma)$ and $(l',j,C)\in U(l,\gamma)$ together with some constants $J_n(l,\gamma)$. To this end put first

$$F_{1,\gamma}(x_{(1,1)},\dots,x_{(1,k_1)}) = f_1(x_{(1,1)},\dots,x_{(1,k_1)}). \tag{5.1}$$

To define the function $F_{l,\gamma}$ for $l\ge2$ first we introduce a function $\alpha_{l,\gamma}(\cdot)$ on the set of vertices in $U(l-1,\gamma)$ in the following way. If a vertex $(l',j')$ or $(l',j',C)$ in $U(l-1,\gamma)$ is connected to no vertex $(l,j)$, $1\le j\le k_l$, then $\alpha_{l,\gamma}(l',j')=(l',j')$ and $\alpha_{l,\gamma}(l',j',C)=(l',j',C)$; if $(l',j')$ is connected to a vertex $(l,j)$, then $\alpha_{l,\gamma}(l',j')=(l,j)$, and if $(l',j',C)$ is connected to a vertex $(l,j)$, then $\alpha_{l,\gamma}(l',j',C)=(l,j)$. We define, similarly to formula (4.4), the functions

$$\bar F_{l,\gamma}\bigl(x_{(l',j')},\,x_{(l',j',C)},\ (l',j')\text{ and }(l',j',C)\in U(l,l-1,\gamma),\ x_{(l,j)},\ 1\le j\le k_l\bigr) = F_{l-1,\gamma}\bigl(x_{\alpha_{l,\gamma}(l',j')},\,x_{\alpha_{l,\gamma}(l',j',C)},\ (l',j')\text{ and }(l',j',C)\in U(l-1,\gamma)\bigr)\,f_l(x_{(l,1)},\dots,x_{(l,k_l)}), \tag{5.2}$$

i.e. we take the function $F_{l-1,\gamma}\circ f_l$ and replace each argument of this function which is indexed by a vertex of $\gamma$ connected by an edge with a vertex in the $l$-th row of $\gamma$ by the argument indexed by the lower end-point of this edge. Then we define, with the help of the operators $P_{u_j}$ and $Q_{u_j}$ introduced in (4.2) and (4.3), the functions of formula (5.3) similarly to formula (4.5), i.e. we apply to the function $\bar F_{l,\gamma}$ the operators $P_{(l,j)}$ for those indices $(l,j)$ which are the lower end-points of an edge with colour 1 and the operators $Q_{(l,j)}$ for those indices $(l,j)$ which are the lower end-points of an edge with colour $-1$.
Now we can formulate the following generalization of Theorem B.
Theorem B′. Let us have a sequence of iid. random variables $\xi_1,\xi_2,\dots$ with some distribution $\mu$ on a measurable space $(X,\mathcal X)$ together with $L\ge2$ bounded functions $f_l(x_1,\dots,x_{k_l})$ on the spaces $(X^{k_l},\mathcal X^{k_l})$, $1\le l\le L$, canonical with respect to the probability measure $\mu$. Let us introduce the class of coloured diagrams $\Gamma(k_1,\dots,k_L)$ defined above together with the functions $F_\gamma=F_{L,\gamma}(f_1,\dots,f_L)$ defined in formulas (5.1)-(5.4) and the constants $J_n(l,\gamma)$, $1\le l\le L$, given in formula (5.5).
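Put $k(\gamma(l))=\sum_{p=1}^{l}k_p-\sum_{p=2}^{l}\bigl(2|B_{(b,1)}(p,\gamma)|+|B_{(b,-1)}(p,\gamma)|\bigr)$, where $|B_{(b,1)}(p,\gamma)|$ denotes the number of lower end-points in the $p$-th row of $\gamma$ with colour 1 and $|B_{(b,-1)}(p,\gamma)|$ is the number of lower end-points in the $p$-th row of $\gamma$ with colour $-1$, $1\le l\le L$, and define $k(\gamma)=k(\gamma(L))$. Then $k(\gamma(l))$ is the number of variables of the function $F_{l,\gamma}$, $1\le l\le L$.
The functions $F_\gamma$ are canonical with respect to the measure $\mu$ with $k(\gamma)$ variables, and the product of the degenerate $U$-statistics $I_{n,k_l}(f_l)$, $1\le l\le L$, $n\ge\max_{1\le l\le L}k_l$, satisfies identity (5.6).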
Theorem B′ can be deduced relatively simply from Theorem B by induction with respect to the number $L$ of the functions. Theorem B contains the results of Theorem B′ in the case $L=2$. A simple induction argument together with the formulas describing the functions $F_{l,\gamma}$ by means of the functions $F_{l-1,\gamma}$ and $f_l$ and Theorem B imply that all functions $F_\gamma$ in Theorem B′ are canonical. Finally, an inductive procedure with respect to the number $L$ of the functions $f_l$ shows that relation (5.6) holds. Indeed, by exploiting that formula (5.6) holds for the product of the first $L-1$ degenerate $U$-statistics, then multiplying this identity by the last $U$-statistic and applying Theorem B to each term at the right-hand side, we get that relation (5.6) also holds for the product of $L$ degenerate $U$-statistics.
A simple inductive procedure with respect to $l$ shows that for all $2\le l\le L$ the diagram $\gamma(l)$ contains $k(\gamma(l))=$

In the proof of Proposition B we shall also need an estimate formulated in Theorem C. It is a simple consequence of inequalities (4.7) and (4.8) in Theorem B.
Theorem C. Let us have $L$ functions $f_l(x_1,\dots,x_{k_l})$ on the spaces $(X^{k_l},\mathcal X^{k_l})$, $1\le l\le L$, which satisfy formulas (1.3) and (1.4) (if we replace the index $k$ by the index $k_l$ in these formulas), but these functions need not be canonical. Let us take a coloured diagram $\gamma\in\Gamma(k_1,\dots,k_L)$ and consider the function $F_\gamma=F_{L,\gamma}(f_1,\dots,f_L)$ defined by formulas (5.1)-(5.5). The $L_2$-norm of the function $F_\gamma$ (with respect to the appropriate power of the measure $\mu$ on the space where $F_\gamma$ is defined) satisfies the inequality $\|F_\gamma\|_2\le2^{|W(\gamma)|}\sigma^{(L-U(\gamma))}$, where $|W(\gamma)|$ denotes the number of edges of colour $-1$, and $U(\gamma)$ the number of rows which contain a lower end-point of an edge of colour $-1$ in the coloured diagram $\gamma$.
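The quantities in the bound of Theorem C depend only on the colours of the edges and on the rows of their lower end-points. The following Python sketch evaluates the right-hand side $2^{|W(\gamma)|}\sigma^{L-U(\gamma)}$ for a diagram given in a hypothetical encoding: a list of pairs `(row, colour)`, one pair for each edge, where `row` is the row of the lower end-point. The function name and the sample diagram are assumptions made for this illustration.

```python
def theorem_c_bound(lower_endpoints, L, sigma):
    """Evaluate 2**|W| * sigma**(L - U) for a coloured diagram.

    lower_endpoints: list of (row, colour) pairs, one per edge, where row is
    the row of the lower end-point and colour is 1 or -1.
    """
    # |W|: number of edges of colour -1.
    W = sum(1 for _, colour in lower_endpoints if colour == -1)
    # U: number of rows containing a lower end-point of an edge of colour -1.
    U = len({row for row, colour in lower_endpoints if colour == -1})
    return 2 ** W * sigma ** (L - U)

# Hypothetical diagram with L = 4 rows: edges of colour -1 end in rows 2 and 3.
bound = theorem_c_bound([(2, -1), (3, -1), (3, 1), (4, 1)], L=4, sigma=0.5)
```

For a diagram without edges of colour $-1$ the bound reduces to $\sigma^L$, in agreement with the better estimate available in the case $|W(\gamma)|=0$.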
Proof of Theorem C. We shall prove the inequality

$$\|F_{l,\gamma}\|_2\le2^{|W(l,\gamma)|}\sigma^{(l-U(l,\gamma))}\quad\text{for all }1\le l\le L, \tag{5.8}$$

where $|W(l,\gamma)|$ denotes the number of edges with colour $-1$, and $U(l,\gamma)$ is the number of rows containing a lower end-point of an edge with colour $-1$ in the coloured diagram $\gamma(l)$. Formula (5.8) will be proved by means of induction with respect to $l$. It implies Theorem C with the choice $l=L$.
Relation (5.8) clearly holds for $l=1$. To prove this relation by induction with respect to $l$ for all $1\le l\le L$ let us first observe that $\sup2^{-|W(l,\gamma)|}|F_{l,\gamma}|\le1$ for all $1\le l\le L$. This relation can be simply checked by induction with respect to $l$.

First we prove Proposition B.
Proof of Proposition B. We shall prove relation (2.3) by means of Theorem C and identity (5.7) with the choice $L=2M$ and $f_l(x_1,\dots,x_{k_l})=f(x_1,\dots,x_k)$ for all $1\le l\le2M$. We shall partition the class of coloured diagrams $\gamma\in\Gamma(k,M)=\Gamma(\underbrace{k,\dots,k}_{2M\text{ times}})$ into classes $\Gamma(k,M,p)$.

We can bound the number of coloured diagrams in $\Gamma(k,M,p)$ by calculating first the number of choices of the $2p$ permissible vertices from the $2kM$ vertices of the form $(l,j,C)$, which we adjoin to the $2kM$ permissible vertices $(l,j)$, and then by calculating the number of those graphs whose vertices are the above permissible vertices and in which exactly one edge starts from each vertex. (Here we also allow edges connecting vertices from the same row. Observe that by defining the set of permissible vertices $(l,j,C)$ in a coloured diagram $\gamma$ we also determine the colouring of its edges.) Thus we get by using the argument at the beginning of Proposition A that $|\Gamma(k,M,p)|$ can be bounded from above by

$$\binom{2kM}{2p}\,1\cdot3\cdot5\cdots(2kM+2p-1)=\binom{2kM}{2p}\frac{(2kM+2p)!}{2^{kM+p}(kM+p)!}.$$

We can write by the Stirling formula, similarly to relation (2.2), that

$$\frac{(2kM+2p)!}{2^{kM+p}(kM+p)!}\le A\Bigl(\frac2e\Bigr)^{kM+p}(kM+p)^{kM+p}$$

with an appropriate constant $A$. Since $p\le kM$ we can write

$$(kM+p)^{kM+p}\le(kM)^{kM}\Bigl(1+\frac p{kM}\Bigr)^{kM}(2kM)^p\le(kM)^{kM+p}e^p2^p.$$

The above inequalities imply the bound on $|\Gamma(k,M,p)|$ that we have claimed.
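The two elementary facts used in this counting argument can be checked numerically. The following Python sketch verifies, for small parameters, the identity $1\cdot3\cdots(2m-1)=\frac{(2m)!}{2^m m!}$ and the inequality $(m+p)^{m+p}\le m^{m+p}e^p2^p$ for $0\le p\le m$, where $m$ plays the role of $kM$. The function names are hypothetical, and the inequality is compared on the logarithmic scale with a small tolerance, which is a choice made for this illustration only.

```python
from math import comb, factorial, log

def odd_product(m):
    """Return 1 * 3 * 5 * ... * (2m - 1), the number of ways to pair 2m points."""
    result = 1
    for i in range(1, 2 * m, 2):
        result *= i
    return result

def diagram_count_bound(k, M, p):
    """Upper bound binom(2kM, 2p) * 1 * 3 * ... * (2kM + 2p - 1) from the text."""
    return comb(2 * k * M, 2 * p) * odd_product(k * M + p)

def power_inequality_holds(m, p):
    """Check (m + p)^(m + p) <= m^(m + p) * e^p * 2^p on the log scale."""
    lhs = (m + p) * log(m + p)
    rhs = (m + p) * log(m) + p * (1.0 + log(2.0))
    return lhs <= rhs + 1e-9  # tolerance for floating point rounding

identity_ok = all(odd_product(m) == factorial(2 * m) // (2 ** m * factorial(m))
                  for m in range(1, 12))
inequality_ok = all(power_inequality_holds(m, p)
                    for m in range(1, 30) for p in range(0, m + 1))
```

The check covers only finitely many values and is not a substitute for the argument in the text; the inequality itself follows from $(1+p/m)^m\le e^p$ and $m+p\le2m$.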
This estimate together with relation (5.7) and the fact that the constants $J_n(l,\gamma)$ defined in (5.5) are bounded by 1 imply that for $kM\le\eta n\sigma^2$

Hence by formula (6.1)

Proof of Theorem 3. We can write by the Markov inequality and Proposition B with $\eta=\frac{kM}{n\sigma^2}$ that

Observe that $\sqrt{kM}\le\bigl(\frac u\sigma\bigr)^{1/k}$, $\frac{\sqrt{kM}}{\sqrt n\,\sigma}\le\bigl(un^{-k/2}\sigma^{-(k+1)}\bigr)^{1/k}\le1$, and

if $\frac u\sigma\le n^{k/2}\sigma^k$. If the inequality $D\le\frac u\sigma$ also holds with a sufficiently large $D=D(B,k)>0$, then $M\ge M_0$, and the conditions of inequality (6.2) hold. This inequality together with inequality (6.3) yield that relation (1.8) holds in this case with a pre-exponential constant $Ae^k$. By increasing the pre-exponential constant $Ae^k$ in this inequality we get that relation (1.8) holds for all $0\le\frac u\sigma\le n^{k/2}\sigma^k$. Thus Theorem 3 is proved. Let us observe that the above calculations show that the constant $B$ in formula (1.8) can be chosen independently of the order $k$ of the $U$-statistics $I_{n,k}(f)$.
To prove inequality (4.8) let us introduce, similarly to formula (4.3), the operators

$$\tilde Q_{u_j}h(x_{u_1},\dots,x_{u_r}) = h(x_{u_1},\dots,x_{u_r}) + \int h(x_{u_1},\dots,x_{u_r})\,\mu(dx_{u_j}),\quad 1\le j\le r, \tag{A6}$$

in the space of functions $h(x_{u_1},\dots,x_{u_r})$ with coordinates in the space $(X,\mathcal X)$. (The indices $u_1,\dots,u_r$ are all different.) Observe that both the operators $\tilde Q_{u_j}$ and the operators $P_{u_j}$ defined in (4.2) are positive, i.e. these operators map a non-negative function to a non-negative function. Beside this, $Q_{u_j}\le\tilde Q_{u_j}$, and the norms of the operators $\frac{\tilde Q_{u_j}}2$ and $P_{u_j}$ are bounded by 1 in the $L_1(\mu)$, the $L_2(\mu)$ and the supremum norm. Let us define the function

$$\widetilde{(f\circ g)}_\gamma\bigl(x_{(1,j)},\,x_{(2,j')},\ j\in\{1,\dots,k_1\}\setminus B_u(\gamma),\ 1\le j'\le k_2\bigr) = \prod_{(2,j')\in B_{(b,1)}(\gamma)}P_{(2,j')}\prod_{(2,j')\in B_{(b,-1)}(\gamma)}\tilde Q_{(2,j')}\,\overline{(f\circ g)}_\gamma \tag{A7}$$

with the notation of Section 4 in the main part. We have defined the function $\widetilde{(f\circ g)}_\gamma$ with the help of $\overline{(f\circ g)}_\gamma$ similarly to the definition of $(f\circ g)_\gamma$ in (4.5), only we have replaced the operators $Q_{(2,j')}$ by $\tilde Q_{(2,j')}$ in it. We may assume that $\|g\|_2\le\|f\|_2$. We can write, because of the properties of the operators $P_{u_j}$ and $\tilde Q_{u_j}$ listed above and the condition $\sup|f(x_1,\dots,x_k)|\le1$, that

$$|(f\circ g)_\gamma|\le\widetilde{(|f|\circ|g|)}_\gamma\le\widetilde{(1\circ|g|)}_\gamma, \tag{A8}$$

where '$\le$' means that the function at the right-hand side is greater than or equal to the function at the left-hand side in all points, and 1 denotes the function which equals identically 1. Because of relation (A8) it is enough to show that

$$\bigl\|\widetilde{(1\circ|g|)}_\gamma\bigr\|_2 = \Bigl\|\prod_{(2,j)\in B_{(b,1)}(\gamma)}P_{(2,j)}\prod_{(2,j)\in B_{(b,-1)}(\gamma)}\tilde Q_{(2,j)}\,|g|(x_{(2,1)},\dots,x_{(2,k_2)})\Bigr\|_2\le2^{|W(\gamma)|}\|g\|_2 \tag{A9}$$

to prove relation (4.8). But this inequality trivially holds, since the norm of all operators $P_{(2,j)}$ in formula (A9) is bounded by 1, the norm of all operators $\tilde Q_{(2,j)}$ is bounded by 2 in the $L_2(\mu)$ norm, and $|B_{(b,-1)}(\gamma)|=|W(\gamma)|$.
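The properties of the operators used in this argument can be checked on a finite space with exact rational arithmetic. The following Python sketch is an illustration under stated assumptions: the three-point space, the weights `mu`, the test function `h` and all function names are hypothetical, and the operators act in the first coordinate of a function of two variables, with $Q$ taken as $h$ minus its integral in that coordinate, in analogy with (A6).

```python
from fractions import Fraction as Fr
from itertools import product

mu = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]   # hypothetical probability weights on X = {0, 1, 2}
X = range(len(mu))

def P(h):
    """Integrate out the first coordinate: x2 -> sum over x1 of h(x1, x2) * mu(x1)."""
    return {x2: sum(h[(x1, x2)] * mu[x1] for x1 in X) for x2 in X}

def Q(h):
    """h minus its integral in the first coordinate."""
    ph = P(h)
    return {(x1, x2): h[(x1, x2)] - ph[x2] for x1, x2 in product(X, X)}

def Q_tilde(h):
    """h plus its integral in the first coordinate, as in (A6)."""
    ph = P(h)
    return {(x1, x2): h[(x1, x2)] + ph[x2] for x1, x2 in product(X, X)}

def sq_norm_two(h):
    """Squared L2 norm of a function of two variables with respect to mu x mu."""
    return sum(v * v * mu[x1] * mu[x2] for (x1, x2), v in h.items())

def sq_norm_one(h):
    """Squared L2 norm of a function of one variable with respect to mu."""
    return sum(v * v * mu[x] for x, v in h.items())

# Hypothetical test function taking both positive and negative values.
h = {(x1, x2): Fr((x1 + 1) * (x2 - 1) + x1 * x1) for x1, x2 in product(X, X)}
abs_h = {key: abs(v) for key, v in h.items()}

p_is_contraction = sq_norm_one(P(h)) <= sq_norm_two(h)
q_tilde_norm_at_most_two = sq_norm_two(Q_tilde(h)) <= 4 * sq_norm_two(h)
q_dominated = all(abs(Q(h)[key]) <= Q_tilde(abs_h)[key] for key in h)
q_output_canonical = all(v == 0 for v in P(Q(h)).values())
```

The comparison of squared norms avoids square roots, so all checks are exact; they illustrate the properties on one example and do not replace the general argument.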