Monte Carlo integration of non-differentiable functions on [0,1]^ι, ι = 1, …, d, using a single determinantal point pattern defined on [0,1]^d

This paper concerns the use of a particular class of determinantal point processes (DPP), a class of repulsive spatial point processes, for Monte Carlo integration. Let d ≥ 1 and I ⊆ {1, …, d} with ι = |I|. Using a single set of N quadrature points {u_1, …, u_N} defined, once and for all, in dimension d from the realization of a specific DPP, we investigate "minimal" assumptions on the integrand in order to obtain unbiased Monte Carlo estimates of μ(f_I) = ∫_{[0,1]^ι} f_I(u) du for any known ι-dimensional integrable function f_I on [0,1]^ι. In particular, we show that the resulting estimator has variance of order N^{−1−(2s∧1)/d} when the integrand belongs to some Sobolev space with regularity s > 0. When s > 1/2 (which includes a large class of non-differentiable functions), the variance is asymptotically explicit and the estimator is shown to satisfy a Central Limit Theorem. MSC2020 subject classifications: 60G55; 62K99.


Introduction
The paper investigates Monte Carlo evaluation of the integral μ(f) = ∫_{[0,1]^ι} f(u) du for a known ι-dimensional integrable function f on [0,1]^ι, using a single set of N quadrature points {u_1, …, u_N} defined once and for all in dimension d ≥ ι. The same set of nodes, defined in [0,1]^d, may therefore be used to estimate a finite number of different integrals; consequently, this set does not exploit the form of f (locations of its possible singularities, etc.).
Such an approach may be of importance in any application where a function must repeatedly be integrated over different subsets of its variables (e.g. in sensitivity analysis using experimental designs) or where marginals must be calculated (e.g. for evaluating the marginal likelihood in some parametric statistical models). The first example is emblematic of our motivation.
In the context of computer experiments (see for example [30, Chapter 5]), complex phenomena are simulated using a mathematical model that replaces the process which generates the data. Usually, the model depends on a large number of parameters (inputs). An objective of the experiments is to quantify the influence of the variability of the inputs on the variable of interest. An experiment consists in running simulations, where each simulation represents a possible combination of the inputs. It is impossible in practice to consider all possible configurations, the number of simulations being limited. Therefore, the design of experiments, i.e. the choice of combinations of inputs, is of great importance. Under a lack of information on how inputs are linked to outputs, a strategy is to spread the chosen inputs so as to cover the input space as well as possible. This technique is called space-filling design; it amounts to generating N points which regularly cover a given space. Latin hypercubes [20,23] and low-discrepancy sequences (see e.g. [12,33]) are standard methods to generate such designs. The goal of computer experiments is not only to examine the influence of all the inputs on an output of interest, but also the influence of a subset of these inputs, or of a particular combination of subsets of these inputs. Since computer experiments may be very expensive in terms of computational load and/or storage capacity, the regularity of the coverage of the designs should be preserved when the initial configuration is projected onto lower-dimensional spaces. This would allow one to use the initial configuration to study the influence of subsets of inputs with the same efficiency. Furthermore, integrating a function over different subsets of its variables may be of importance in this context, for example in sensitivity analysis [30, Chapter 7]. When integrating explicitly is impossible, numerical methods are used.
We turn our attention here to Monte Carlo integration.
Monte Carlo integration has a long history and it is not the aim of this paper to give a detailed bibliography. We refer the interested reader to the electronic book by Owen [26] for an extensive treatment and bibliography. Let us however cite a few methods, keeping in mind what we mean by "minimal" assumptions in the situation ι = d. Crude Monte Carlo methods and importance sampling methods (see e.g. [28]) consist in using i.i.d. nodes {u_1, …, u_N} drawn from a so-called proposal density (the uniform density in the usual situation in which the target distribution is also uniform). Under an L²([0,1]^d)-type assumption on the integrand, the resulting estimator, denoted by μ̂_N(f), has a variance proportional to N^{−1} and satisfies a Central Limit Theorem. When d is large, Markov chain Monte Carlo (MCMC) methods, where the set of quadrature points is the realization of a particular Markov chain, are usually preferred. When f ∈ L¹([0,1]^d), the variance of μ̂_N(f) is still of order N^{−1} and a CLT holds (see e.g. [9]). To improve the rate of convergence, the price to pay is to require some regularity assumptions on f. Many methods exist in the literature: grid-based stratified methods [11], possibly combined with antithetic sampling (see Owen [26, Chapter 10]), Quasi-Monte Carlo and its randomized versions, scrambled nets [8,24,25,26,4], etc. For example, a version of scrambled nets with antithetic sampling can lead to an estimator with variance O(N^{−3−2/d} log(N)^{d−1}) if, to simplify Owen's assumption [25], f is d times continuously differentiable on [0,1]^d. Additional assumptions on the scrambled net are required to obtain a CLT. Grid-based stratified methods, which are perhaps the simplest alternative to ordinary Monte Carlo methods, require that f be continuously differentiable on [0,1]^d and yield an estimator satisfying a CLT with variance asymptotically proportional to N^{−1−2/d}.
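As a point of comparison for what follows, the N^{−1} variance decay of crude Monte Carlo is easy to reproduce numerically. The sketch below (plain Python, with an arbitrary test integrand u ↦ u²) estimates ∫₀¹ u² du = 1/3 with i.i.d. uniform nodes and checks that quadrupling N divides the empirical variance by roughly four.

```python
import random

def crude_mc(f, n, rng):
    # Crude Monte Carlo estimate of the integral of f over [0, 1]
    # using n i.i.d. uniform nodes.
    return sum(f(rng.random()) for _ in range(n)) / n

rng = random.Random(0)
f = lambda u: u * u          # integral over [0, 1] equals 1/3
reps = 2000
var = {}
for n in (50, 200):
    est = [crude_mc(f, n, rng) for _ in range(reps)]
    m = sum(est) / reps
    var[n] = sum((e - m) ** 2 for e in est) / (reps - 1)

# The variance is proportional to 1/N: quadrupling N should divide it by ~4.
ratio = var[50] / var[200]
```

With 2000 replications the observed ratio fluctuates around 4, which is the behavior the faster-decaying estimators below are measured against.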
Let us also mention that Owen [24] showed that a version of scrambled nets has a variance o(N^{−1}) under the sole assumption that f ∈ L²([0,1]^d), but the rate is not explicit unless strong regularity assumptions are made on f.
In a recent work, Bardenet and Hardy [3] proposed another alternative by defining the nodes as the realization of a repulsive point pattern, in particular a determinantal point pattern. The class of determinantal point processes (DPPs for short) has received growing attention in the last decades (see e.g. Soshnikov [34], Shirai and Takahashi [32], Hough et al. [15], Lavancier, Møller and Rubak [17], Decreusefond et al. [7]), thanks to its very appealing properties, in particular in terms of tractability and exact simulation. Bardenet and Hardy [3] defined an Orthogonal Polynomial Ensemble, which is a particular inhomogeneous DPP whose kernel is defined through orthonormal polynomials. Under the assumption that f is continuously differentiable and compactly supported in B ⊂ [0,1]^d, the authors obtained an estimator with variance equivalent to an explicit constant times N^{−1−1/d}.
In this paper, we investigate a different DPP. To be more explicit, we consider the most natural kernel, called the Dirichlet kernel in this paper, which is based on the Fourier decomposition over a rectangular subset of N indices of Z^d. It is a projection DPP, that is, a point process which produces almost surely N points. It has the advantage of leading to a homogeneous DPP pattern, an interesting characteristic since we want the pattern to be usable to estimate any integral without taking advantage of the integrand f. A second advantage is that the marginals are fully characterized and explicit, which means that they can efficiently be used to estimate μ(f_I) for any I ⊆ {1, …, d}. Last but not least, our main result, Theorem 3.1, shows that the resulting estimator μ̂_N(f_I) has variance of order N^{−1−(2s∧1)/d} when f_I belongs to H^s([0,1]^ι), a Sobolev space with regularity s ≥ 0; see (3.4) for more details. We recall that for periodic functions L²([0,1]^ι) = H^0([0,1]^ι), and that if f_I is periodic and continuously differentiable then f_I ∈ H^1([0,1]^ι). In particular, our result states that when s > 1/2 (thus potentially for non-differentiable functions), the variance is asymptotically equivalent to an explicit constant times N^{−1−1/d}. In this case, we also obtain a central limit theorem for the estimator (assuming in addition that the integrand is bounded). In summary, the proposed estimator exhibits a variance that decreases faster than ordinary Monte Carlo as soon as s > 0. The decay is slower than for methods such as grid-based methods, scrambled nets, etc., but it requires much weaker regularity assumptions and can be applied to any ι-dimensional function, ι = 1, …, d.
The paper is organized as follows. Section 1 contains background on spatial point processes, generalities on the projection of spatial point processes, and DPPs. We also outline the interest of repulsive point processes, and in particular DPPs, for Monte Carlo integration. Section 2 introduces the Dirichlet DPP and exposes some of its properties. Our main result, Theorem 3.1, is presented in Section 3. It details convergence results for Monte Carlo integration based on the realization of a Dirichlet DPP. A multivariate version of this CLT is also proposed. Section 4 discusses computational aspects of the simulation of Dirichlet DPPs and contains a simulation study which illustrates our results. In Section 5, we perform a deeper numerical comparison of the sampling used here with several existing methods, among which Bardenet and Hardy's, stratified sampling, maximin LHS, etc. Finally, all proofs are postponed to Appendix B.

Spatial point processes
A spatial point process X defined on a Borel set B ⊆ R^d is a locally finite random measure on B (see for example [22] and references therein for measure-theoretic details) whose realization is of the form {x^{(1)}, …, x^{(k)}} ⊂ B, where k is the realization of a random variable and the x^{(i)}'s represent the events. We assume that X is simple, meaning that two events cannot occur at the same location. Thus, X is viewed as a locally finite random set.
In most cases, the distribution of a point process X can be described by its intensity functions ρ^{(k)}_X, k ≥ 1. By the Campbell theorem, see e.g. [22], ρ^{(k)}_X is characterized by the following integral representation: for any non-negative measurable function h : B^k → R_+,

E [ Σ^{≠}_{x^{(1)},…,x^{(k)} ∈ X} h(x^{(1)}, …, x^{(k)}) ] = ∫_{B^k} h(x^{(1)}, …, x^{(k)}) ρ^{(k)}_X(x^{(1)}, …, x^{(k)}) dx^{(1)} ⋯ dx^{(k)},   (1.1)

where the symbol ≠ over the summation means that x^{(1)}, …, x^{(k)} are pairwise distinct points. Intuitively, for any pairwise distinct points x^{(1)}, …, x^{(k)} ∈ B, ρ^{(k)}_X(x^{(1)}, …, x^{(k)}) dx^{(1)} ⋯ dx^{(k)} is the probability that X has a point in each of the k infinitesimally small sets around x^{(1)}, …, x^{(k)} with volumes dx^{(1)}, …, dx^{(k)}, respectively. When k = 1, this yields the intensity function, simply denoted by ρ_X = ρ^{(1)}_X. The second-order intensity ρ^{(2)}_X is used to define the pair correlation function

g_X(x, y) = ρ^{(2)}_X(x, y) / (ρ_X(x) ρ_X(y)),

which is set to 0 by convention when ρ_X(x) ρ_X(y) = 0. The pair correlation function (pcf for short) describes the local interaction between points of X located at x and y: g_X(x, y) > 1 characterizes positive correlation between the points; g_X(x, y) = 1 means there is no interaction (typically a Poisson point process); g_X(x, y) < 1 characterizes negative correlation. A point process X is often referred to as repulsive if g_X(x, y) < 1 for any x, y ∈ B (see e.g. [16, Section 6.5]). Finally, a point process X with constant intensity function on B is said to be homogeneous.

Projection of a spatial point process
In this work, we sometimes consider projections of spatial point processes. By projection, we mean that we keep a given number of coordinates of the original spatial point process. Such a framework requires that the original point process X be defined on a compact set B ⊂ R^d: otherwise, the points of the projected point process may not form a locally finite configuration, as also noticed in the two-dimensional case in [1, p. 17].
This section introduces some notation in this context. Let I ⊆ {1, …, d} with cardinality |I| = ι. Let B_1, …, B_d be compact sets of R and B = B_1 × ⋯ × B_d; denote by B_I its orthogonal projection onto R^ι. In particular, B_I = Π_{i∈I} B_i, with B_{{1,…,d}} = B. We denote by P_I the orthogonal projection of R^d onto R^ι. To ease the reading, we let B_ℓ = B_0 for ℓ = 1, …, d and, to fix ideas, B_0 = [0,1].
For any x ∈ B, we let x_I = P_I x; for a point process X defined on B, the projected point process X_I = P_I X is then defined on B_I. Intensity functions and Laplace functionals of P_I X can be derived from the corresponding quantities for X; see [19] for more details and Section 2.2 for the particular point process considered in this paper.

Determinantal point processes
In this section, the class of continuous DPPs is introduced. Again, we restrict our attention to DPPs defined on a compact set B ⊂ R^d. A point process X on B is said to be a DPP on B with kernel K : B × B → C if, for any k ≥ 1, its kth-order intensity function is given by

ρ^{(k)}_X(x^{(1)}, …, x^{(k)}) = det [ K(x^{(i)}, x^{(j)}) ]_{1 ≤ i, j ≤ k},   (1.3)

and we simply write X ∼ DPP_B(K). Note that K needs to be non-negative definite to ensure ρ^{(k)}_X ≥ 0. Our results rely on the spectral decomposition of K, see (1.5); therefore, we assume that K is a continuous covariance function. The intensity of X is given by ρ_X(x) = K(x, x) and its pcf by

g_X(x, y) = 1 − |K(x, y)|² / (K(x, x) K(y, y)).   (1.4)

The popularity of DPPs relies mainly upon (1.3)-(1.4): all moments of X are explicit and, since K is Hermitian, g_X(x, y) ≤ 1 for any x, y ∈ B. The kernel K defines an integral operator 𝒦 (see e.g. [6]) given, for any f ∈ L²(B), by

𝒦f(x) = ∫_B K(x, y) f(y) dy.

By Mercer's Theorem [27, Sec. 98], K admits the following spectral decomposition: for any x, y ∈ B,

K(x, y) = Σ_{j∈N} λ_j φ_j(x) φ_j(y)*   (1.5)

(with * denoting complex conjugation), where {φ_j}_{j∈N} are the eigenfunctions of 𝒦, which form an orthonormal basis of L²(B), and {λ_j}_{j∈N} are the eigenvalues of 𝒦, satisfying λ_j ≥ 0 for any j ∈ N. We slightly abuse notation in the sequel and refer to the λ_j's as the eigenvalues of K.
The existence of a DPP on B with kernel K is ensured if its eigenvalues satisfy λ_j ≤ 1 for any j ∈ N, see e.g. [15, Theorem 4.5.5]. Eigenvalues and eigenfunctions are indexed by N in (1.5), but other countable sets could be considered. In particular, the d-dimensional Fourier basis is indexed by Z^d.
When the kernel depends only on the difference x − y, we will refer to K as a stationary kernel and use the (abusive) notation K(x, y) ≡ K(x − y).
A kernel K such that λ j ∈ {0, 1} for j ∈ N is called a "projection kernel" and the corresponding DPP a "projection DPP". The number of points in B of such a point process is almost surely constant and equal to the number of non-zero eigenvalues of K (see e.g. [17]).

Why are DPPs interesting for Monte Carlo integration?
The repulsive nature of DPPs can be exploited to generate quadrature points that explore the input space nicely. To see this, let B ⊂ R^d be a bounded set, f ∈ L²(B) and Y a homogeneous point process on B with intensity ρ_Y and pair correlation function g_Y (a similar result would hold in the inhomogeneous case). Campbell's Theorem (1.1) ensures that the estimator

μ̂(f) = ρ_Y^{−1} Σ_{x ∈ Y} f(x)

is an unbiased estimator of ∫_B f(x) dx, with variance

Var[μ̂(f)] = ρ_Y^{−1} ∫_B f(x)² dx + ∫_{B²} f(x) f(y) (g_Y(x, y) − 1) dx dy.   (1.7)

If f is non-negative (or non-positive), Equation (1.7) suggests that using a point process satisfying g_Y < 1 makes the variance smaller than the first term, which turns out to be the variance in the Poisson case. It is worth noting that the use of a DPP for this task does not require any sign assumption on f. Indeed, by (1.4), g_Y(x, y) − 1 = −|K(x, y)|²/ρ_Y², so that

∫_{B²} f(x) f(y) (g_Y(x, y) − 1) dx dy = −ρ_Y^{−2} ∫_{B²} f(x) f(y) |K(x, y)|² dx dy ≤ 0,

and so the second term is always non-positive. The use of a general DPP, i.e. with a random number of points, does not seem to be of great interest: thinking of ρ_Y = N as a number of points, we claim that the rate of convergence remains the same as in the independent case. This claim is based on the previous empirical study of [19], which tends to show that the empirical variances of Monte Carlo integral estimates based on quadrature points from a DPP with, for instance, a Gaussian kernel decrease at rate N^{−1}.
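The first term of (1.7) can be checked empirically in the Poisson case (g_Y ≡ 1), where the variance of the estimator reduces to ρ_Y^{−1} ∫_B f(x)² dx. A minimal sketch, with the arbitrary choice f(u) = sin(2πu) on B = [0,1], so the theoretical variance is (1/2)/ρ_Y:

```python
import math, random

def rpois(lam, rng):
    # Poisson sample via cumulative exponential inter-arrival times.
    t, n = 0.0, 0
    while True:
        t += -math.log(1.0 - rng.random())
        if t >= lam:
            return n
        n += 1

rng = random.Random(42)
rho, reps = 200.0, 4000
f = lambda u: math.sin(2.0 * math.pi * u)   # ∫ f = 0 and ∫ f² = 1/2 on [0, 1]

est = []
for _ in range(reps):
    n = rpois(rho, rng)
    pts = [rng.random() for _ in range(n)]  # homogeneous Poisson process on [0, 1]
    est.append(sum(f(x) for x in pts) / rho)

mean = sum(est) / reps
emp_var = sum((e - mean) ** 2 for e in est) / (reps - 1)
# Theoretical variance in the Poisson case: rho^{-1} * ∫ f² = 0.5 / 200 = 0.0025.
```

A repulsive process with g_Y < 1 would push the empirical variance below this Poisson benchmark, which is exactly the effect the second term of (1.7) quantifies.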
It therefore seems natural to focus on the subclass of projection DPPs, i.e. the class for which the number of points is almost surely constant. Bardenet and Hardy [3] proposed such an approach using an ad hoc Orthogonal Polynomial Ensemble with N points. This class is a particular inhomogeneous projection DPP on B. As already outlined in the introduction, under the assumption that f ∈ C¹ and that f is compactly supported on some bounded B̃ ⊂ B, it is proved that a central limit theorem holds for the integral estimator, with variance decreasing as N^{−1−1/d}. We propose a similar approach, based on a realization of an (N, d)-Dirichlet DPP X. The (N, d)-Dirichlet DPP, detailed in the next section, is a projection DPP based on the Fourier basis. Unlike the point process proposed by [3], this DPP has the advantage of being homogeneous, and its projections X_I for I ⊆ {1, …, d} are fully characterized (see Section 2.2). Finally, an advantage of our approach is that we do not require f to be continuously differentiable: we only assume that f belongs to some Sobolev space with a low regularity parameter, see Section 3 and in particular Theorem 3.1 for more details.

Let us consider the Fourier basis of L²([0,1]^d), φ_j(x) = e^{2iπ j·x} for j ∈ Z^d, and define the kernel

K(x, y) = Σ_{j ∈ E_N} e^{2iπ j·(x − y)},   (2.1)

where E_N = E_1 × ⋯ × E_d is a rectangular subset of Z^d with |E_i| = n_i and cardinality N = Π_{i=1}^d n_i; E_N identifies the eigenvalues of K that are all equal to 1. Due to the translation invariance of the Fourier basis, the kernel (2.1) is a homogeneous kernel. This construction implies that for any x ∈ [0,1]^d, ρ_X(x) = K(x, x) = N. We point out that defining E_i as any block of n_i successive frequencies (e.g. frequencies centered around 0) would lead to the same DPP [15, Remark 4, p. 48]. In particular, if n_i is odd, we could consider E_i = {−⌊n_i/2⌋, …, ⌊n_i/2⌋}, which leads to the standard Dirichlet kernel, see e.g. [37]. This justifies the name Dirichlet DPP for this point process.
Such a DPP, which produces almost surely N = Π_{i=1}^d n_i points in B, will be referred to as an (N, d)-Dirichlet DPP. One could wonder why we impose E_N to be a rectangular subset of Z^d instead of, for instance, the graded lexicographic order used by Bardenet and Hardy [3]. As seen in the next section, the rectangular nature of E_N allows us to characterize the distribution of X_I for any I ⊆ {1, …, d}.
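For concreteness, the Dirichlet kernel can be evaluated by brute force. The sketch below (Python) assumes the choice E_i = {0, …, n_i − 1} of consecutive frequencies; it illustrates the two properties used throughout: the kernel depends on x and y only through x − y (stationarity), and the intensity K(x, x) equals N, so the DPP is homogeneous.

```python
import cmath
import itertools

def dirichlet_kernel(x, y, n_sides):
    # K(x, y) = sum over j in E_N of exp(2*i*pi * j.(x - y)), where
    # E_N = E_1 x ... x E_d and (one admissible choice) E_i = {0, ..., n_i - 1}.
    d = len(n_sides)
    val = 0j
    for j in itertools.product(*(range(n) for n in n_sides)):
        phase = 2.0 * cmath.pi * sum(j[k] * (x[k] - y[k]) for k in range(d))
        val += cmath.exp(1j * phase)
    return val

n_sides = (3, 4)                      # n_1 = 3, n_2 = 4, so N = 12 points a.s.
x, y, t = (0.21, 0.77), (0.50, 0.10), 0.13

intensity = dirichlet_kernel(x, x, n_sides).real      # homogeneity: K(x, x) = N
k1 = dirichlet_kernel(x, y, n_sides)
k2 = dirichlet_kernel(tuple(a + t for a in x), tuple(b + t for b in y), n_sides)
# Stationarity: the kernel depends only on x - y, so k1 == k2 up to rounding.
```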
Let us add that, when d = 1, the kernel K corresponds to the Fourier approximation [17] of the one-dimensional sine kernel sin(πNt)/(πt), t ∈ (0, 1), which takes its origins in the joint distribution of the eigenvalues (the Weyl measure) of a unitary matrix. Asymptotic results involving one-dimensional linear statistics of the sine kernel appear in several papers, see e.g. [35]. The present paper therefore provides an extension to the d-dimensional case and a more thorough treatment of the statistical application to Monte Carlo integration.
The (N, d)-Dirichlet DPP is a homogeneous point process producing exactly N points. Since we do not want to take advantage of the function to be integrated (and we could potentially be interested in estimating several integrals), it is very natural to use a homogeneous model. We come back to this question in Section 6.

Projections of an (N, d)-Dirichlet kernel
An (N, d)-Dirichlet kernel (2.1) can be written as the product of d one-dimensional kernels. More precisely, for any I ⊂ {1, …, d}, denoting N_I = Π_{i∈I} n_i and N_{I^c} = N/N_I, the (N, d)-Dirichlet kernel can always be factorized as

K(x, y) = K_I(x_I, y_I) K_{I^c}(x_{I^c}, y_{I^c}),   (2.3)

where K_I is the (N_I, ι)-Dirichlet kernel built from E_I = Π_{i∈I} E_i. Projected point processes X_I arising from kernels satisfying (2.3) have been studied and characterized in [19]. In particular, for an (N, d)-Dirichlet DPP we have the following result.
In particular, X_I has (obviously) N points, and its k-th order intensity functions, for pairwise distinct x^{(1)}, …, x^{(k)} ∈ B_I, as well as its pcf, for distinct x, y ∈ B_I, are given in (2.4) in terms of α-determinants. The notation det_α stands for an α-determinant; see e.g. [32] for details on such quantities and for general properties of α-DPPs. Proposition 2.1 therefore provides a full characterization of the distribution of X_I. This result is not directly used in our paper (except in Appendix B, where we propose an alternative proof of our main result). However, its main consequence for the paper is that the pcf of X_I is bounded by 1 (see (2.4)). Therefore, for any I, X_I remains in the class of repulsive point patterns.
We point out that the DPP proposed by [3] does not satisfy the general assumptions of [19]. Therefore for this DPP, the distribution of X I is not explicit and it is unclear whether X I remains repulsive or not.

Objective
In this section, we study the use of specific DPPs for Monte Carlo integration. To this end, we use the notation introduced in Section 1.2. Our objective is to estimate any ι-dimensional integral, for 1 ≤ ι ≤ d, using a Monte Carlo approach based on the same quadrature points. More precisely, let d ≥ 1, I ⊆ {1, …, d} with cardinality ι = |I|, and let f_I : B_I → R be a measurable function on B_I; more assumptions on f_I will be given later. We intend to estimate μ(f_I) = ∫_{B_I} f_I(u) du by

μ̂_N(f_I) = (1/N) Σ_{j=1}^N f_I((u_j)_I),   (3.1)

where {u_1, …, u_N} is a realization of an (N, d)-Dirichlet DPP on B and where we recall the notation (u_j)_I = P_I u_j for any u_j ∈ B. In the following, we study asymptotic properties of μ̂_N(f_I).
In this paper, we have chosen to focus on integrals on [0,1] ι for simplicity. It can straightforwardly be extended to rectangles.
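The estimator μ̂_N(f_I) itself is a plain average of f_I over the projected coordinates of the nodes. The following sketch implements it in Python; for illustration only, the quadrature points are taken i.i.d. uniform on [0,1]^d instead of a Dirichlet DPP sample (unbiasedness only requires uniform one-point marginals, but the variance gains of this section do not hold for i.i.d. nodes).

```python
import math, random

def mc_estimate(points, f_I, I):
    # mu_hat_N(f_I): average of f_I over the I-coordinates of the d-dimensional
    # nodes, i.e. N^{-1} * sum_j f_I((u_j)_I).
    return sum(f_I(tuple(u[i] for i in I)) for u in points) / len(points)

rng = random.Random(1)
d, N = 3, 4000
# Stand-in nodes: i.i.d. uniforms on [0, 1]^d (a Dirichlet DPP sample would be
# used in practice).
points = [tuple(rng.random() for _ in range(d)) for _ in range(N)]

# One-dimensional integrand applied to the first coordinate; its integral is 1.
f_I = lambda v: 1.0 + math.cos(2.0 * math.pi * v[0])
est = mc_estimate(points, f_I, (0,))
```

The same list `points` can be reused with any other choice of `I` and `f_I`, which is precisely the "single set of nodes for several integrals" idea.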
We introduce two additional fundamental pieces of notation, induced by the choice of the Fourier basis and used in our results. Let f_I ∈ L²(B_I). The notation f̂_I(j), for j ∈ Z^ι, stands for the jth Fourier coefficient, i.e.

f̂_I(j) = ∫_{B_I} f_I(x) e^{−2iπ j·x} dx.   (3.3)
Finally, we define the space H^s(B_I), the isotropic (with respect to the sup norm) Sobolev space with index s ≥ 0 of square integrable periodic functions, by

H^s(B_I) = { f_I ∈ L²_per(B_I) : Σ_{j∈Z^ι} (1 + ‖j‖_∞)^{2s} |f̂_I(j)|² < ∞ },   (3.4)

where L²_per(B_I) is the set of square integrable periodic functions. By f_I periodic, we mean that for any i ∈ I, f_I takes the same value at x_i = 0 and x_i = 1, the other coordinates being fixed. Note that if f_I does not satisfy this condition, we can for example consider the function g_I(x_1, …, x_ι) = f_I(|x_1|, …, |x_ι|), which satisfies this periodicity condition on [−1, 1]^ι and whose integral over [−1, 1]^ι equals 2^ι μ(f_I).
In the introduction, we mentioned that several Monte Carlo integration methods (stratified-based methods, scrambled nets, etc.) require some (strong) regularity assumptions on the integrands. In connection with the Sobolev space H^s(B_I), we recall (see e.g. Robinson [29]) that, for any ε > 0, periodic functions that are Hölder continuous with exponent s belong to H^{s−ε}(B_I), see (3.5); our assumptions therefore apply potentially to non-differentiable functions. In particular, we show that a CLT holds as soon as s > 1/2.
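A concrete example of a non-differentiable function covered by these assumptions is f(u) = |u − 1/2| on [0,1]: it is periodic (f(0) = f(1)), has a kink at 1/2, and its Fourier coefficients satisfy |f̂(j)| = 1/(π²j²) for odd j (and 0 for even j ≠ 0), so f ∈ H^s for every s < 3/2, well beyond the s > 1/2 threshold. A short numerical check (Python, coefficients approximated by a Riemann sum):

```python
import math

def fourier_coef_abs(f, j, grid=20000):
    # |f_hat(j)|, with f_hat(j) = ∫_0^1 f(u) exp(-2*i*pi*j*u) du,
    # approximated by a left-endpoint Riemann sum on a uniform grid.
    re = sum(f(k / grid) * math.cos(2.0 * math.pi * j * k / grid)
             for k in range(grid)) / grid
    im = sum(f(k / grid) * math.sin(2.0 * math.pi * j * k / grid)
             for k in range(grid)) / grid
    return math.hypot(re, im)

f = lambda u: abs(u - 0.5)    # periodic, non-differentiable at u = 1/2
c1 = fourier_coef_abs(f, 1)   # exact value: 1 / pi^2
c3 = fourier_coef_abs(f, 3)   # exact value: 1 / (9 * pi^2)
c2 = fourier_coef_abs(f, 2)   # exact value: 0
```

The O(‖j‖^{−2}) decay visible here is what places such kinked functions inside the Sobolev scale despite the lack of differentiability.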
Finally, let us justify why we focus on periodic functions on B_I. For a non-periodic function which is at least twice continuously differentiable, it is a known fact that |f̂_I(j)| = O(‖j‖_∞^{−1}). Hence, the summability condition in (3.4) would be fulfilled for such a smooth function only for s < 1/2, a situation where no CLT is available (even if f_I is infinitely differentiable).

Case ι = d
We first consider the case I = {1, …, d}, for which the variance of μ̂_N(f) admits the simple closed form

Var[μ̂_N(f)] = N^{−2} Σ_{k∈E_N} Σ_{l∈Z^d \ E_N} |f̂(k − l)|².

This simple form of the variance of μ̂_N(f) invites us to study its asymptotic behavior as N → ∞. Given that N = Π_i n_i, we require a specific asymptotic regime. We assume that (N_ν)_{ν≥1} and (n_{i,ν})_{ν≥1}, i = 1, …, d, are integer sequences indexed by some ν ≥ 1, that each sequence n_{i,ν} tends to ∞ as ν → ∞, and that there exist κ_i > 0, i = 1, …, d, such that

n_{i,ν} / N_ν^{1/d} → κ_i as ν → ∞.   (3.8)

For the sake of conciseness, we skip the dependence on ν. Similarly, when we write N → ∞, we implicitly assume (3.8).
We can now state the asymptotic behavior of the variance of μ̂_N(f) and, for some values of s, a central limit theorem. Regarding this asymptotic normality, when d = 1, as mentioned in Section 2, X corresponds, up to normalization, to the joint distribution of the eigenvalues of a unitary matrix. Linear statistics of such a DPP have been studied in depth in the literature, see e.g. [35] and the references therein. When d > 1, Theorem 3.1 (iii) is therefore original, and relies upon Soshnikov [36, Theorem 1].
Theorem 3.1. Consider the asymptotic framework (3.8) and assume that f ∈ H s (B). Then, we have the following statements.
Let us rephrase Theorem 3.1 (i): if f ∈ H^s(B) for some s > 0, then necessarily Var[μ̂_N(f)] = o(N^{−1}). That is, the variance of the proposed estimator decreases to 0 faster than that of the standard Monte Carlo estimator as soon as f ∈ H^s(B) for some s > 0, which is a very weak assumption. Theorem 3.1 (ii) shows that the variance σ²(f) can be approximated by σ²_N(f) given by (3.11), which has the advantage of avoiding the constants κ_i defined in (3.8). We point out that (3.11) does not provide an estimate of σ²(f). We come back to this question in Section 6.
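The announced N^{−1−1/d} decay can be probed numerically from the Fourier-side expression of the variance: for a projection DPP with Fourier kernel, a standard computation from (1.1) and (1.3) gives Var[μ̂_N(f)] = N^{−2} Σ_{k∈E_N} Σ_{l∉E_N} |f̂(k−l)|². The sketch below (Python, d = 1, with the hypothetical coefficient profile f̂(j) = 1/(1+j²), hence s > 1/2) truncates the outer sum and checks that doubling N divides the variance by about 2^{1+1/d} = 4.

```python
def fhat_sq(j):
    # |f_hat(j)|^2 for a hypothetical coefficient profile f_hat(j) = 1/(1 + j^2),
    # which corresponds to f in H^s for every s < 3/2 (so s > 1/2 applies).
    return 1.0 / (1.0 + j * j) ** 2

def dpp_variance(N, tail=500):
    # Var[mu_hat_N(f)] = N^{-2} * sum_{k in E_N} sum_{l not in E_N} |f_hat(k-l)|^2
    # in dimension d = 1 with E_N = {0, ..., N-1}; the sum over l is truncated
    # to a window of half-width `tail` on each side of E_N.
    total = 0.0
    for k in range(N):
        for l in range(-tail, 0):
            total += fhat_sq(k - l)
        for l in range(N, N + tail):
            total += fhat_sq(k - l)
    return total / N ** 2

# Rate check: Var ~ C * N^{-1-1/d} = C * N^{-2} for d = 1, so doubling N
# should divide the variance by about 4.
ratio = dpp_variance(20) / dpp_variance(40)
```

Only frequencies straddling the boundary of E_N contribute, which is the mechanism behind the improvement over the N^{−1} Monte Carlo rate.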

Discussion on σ 2 (f )
In this section, we discuss how the value of σ²(f) varies with d and how this variance compares with the standard Monte Carlo method and the stratified Monte Carlo method (which are important alternatives for which a central limit theorem is available). For the sake of simplicity, we set in this section n_i = N^{1/d} for i = 1, …, d, so that κ_i = 1, and we write ‖j‖_p^p = Σ_{l=1}^d |j_l|^p. With such notation, and recalling that κ_i = 1 here, we get

σ²(f) = Σ_{j∈Z^d} ‖j‖_1 |f̂(j)|².

The standard Monte Carlo approach (resp. the stratified Monte Carlo method, see [26]) assumes that f ∈ L²([0,1]^d) (resp. that f is continuously differentiable on [0,1]^d). The corresponding estimator has variance decreasing as N^{−1} (resp. N^{−1−2/d}), with asymptotic constants given by

Σ_{j≠0} |f̂(j)|²   resp.   (π²/3) Σ_{j∈Z^d} ‖j‖_2² |f̂(j)|².

The latter expression ensues from Parseval's identity and the standard expression of Fourier coefficients of continuously differentiable functions.
To compare these three constants in terms of d, we consider two particular forms for f:
(i) f is a product of one-dimensional functions: there exists f_0 ∈ L²([0,1]) such that f(x) = Π_{i=1}^d f_0(x_i);   (3.12)
(ii) f is a sum of one-dimensional functions: there exists f_0 ∈ L²([0,1]) such that f(x) = Σ_{i=1}^d f_0(x_i).   (3.13)
We are now able to present the following result.

Proposition 3.2.
(i) Let f be defined as in (3.12) and satisfy the appropriate assumptions of the considered method; then the corresponding asymptotic constant grows exponentially with d.
(ii) Let f be defined as in (3.13) and satisfy the appropriate assumptions of the considered method; then the corresponding asymptotic constant grows linearly with d.
Proposition 3.2 shows that when f is given as the product (resp. sum) of one-dimensional functions, σ²(f) varies exponentially (resp. linearly) with the dimension d. This result points out that the constants for the standard and stratified Monte Carlo methods vary in the same way.
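The exponential-versus-linear behavior of Proposition 3.2 is transparent for the crude Monte Carlo constant Var(f) = ∫f² − (∫f)², which is explicit for product and sum structures. A small sketch (Python), with the hypothetical factor f_0(u) = 1 + cos(2πu)/2, for which ∫f_0 = 1 and ∫f_0² = 9/8:

```python
# Crude Monte Carlo asymptotic constant Var(f) = ∫ f^2 - (∫ f)^2, computed in
# closed form for a hypothetical one-dimensional factor f_0(u) = 1 + cos(2*pi*u)/2.
m1, m2 = 1.0, 1.0 + 1.0 / 8.0   # ∫ f_0 and ∫ f_0^2

def var_product(d):
    # f(x) = prod_i f_0(x_i): Var = (∫f_0^2)^d - (∫f_0)^{2d}, exponential in d.
    return m2 ** d - m1 ** (2 * d)

def var_sum(d):
    # f(x) = sum_i f_0(x_i): the cross covariances vanish for independent
    # coordinates, so Var = d * (∫f_0^2 - (∫f_0)^2), linear in d.
    return d * (m2 - m1 ** 2)

growth_prod = var_product(10) / var_product(5)   # > 2: exponential growth
growth_sum = var_sum(10) / var_sum(5)            # exactly 2: linear growth
```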

Case I ⊂ d
We now consider the situation where we estimate μ(f_I) (on B_I) based on {u_1, …, u_N}, a realization of an (N, d)-Dirichlet DPP on B. In this section, we naturally assume that d > 1. The interest of the construction of the Dirichlet DPP is revealed by Corollary 3.1 which, briefly, states that Theorem 3.1 can be applied to functions of the form f↑, where f↑(x) = f_I(x_I) for x ∈ B, and where f̂_I(j) is given by (3.3).
(iii) If s > 1/2 and f_I is bounded, then, as N → ∞,

√(N^{1+1/d}) ( μ̂_N(f_I) − μ(f_I) ) → N(0, σ²(f_I)) in distribution.

The asymptotic constant σ²(f_I) can still be approximated by (3.11). The proof of this result is a straightforward consequence of Theorem 3.1. Another approach, using the fact that X_I is distributed as an α-DPP, is proposed in Appendix B.
To rephrase Corollary 3.1, we can estimate μ(f_I) for any ι-dimensional function f_I with the rate of convergence √(N^{1+1/d}) (if f_I ∈ H^s(B_I) with s > 1/2), which corresponds to the rate of convergence in the dimension d where the points were generated. This is the price to pay for being able to estimate any function in any dimension ι ≤ d. Of course, if one knows beforehand that only, say, one-dimensional integrals are to be estimated, one could use the decomposition N = N × 1 × ⋯ × 1 for the set E_N of eigenvalues, see (2.2). Using such a DPP (in dimension d > 1), we claim that any one-dimensional integral could be estimated with a rate of convergence √(N²) = N, which would be a gain with respect to √(N^{1+1/d}). However, the resulting DPP would be catastrophic for estimating μ(f_I) for any I such that ι > 1.

Multivariate central limit theorem
We can combine Theorem 3.1 and Corollary 3.1 to obtain a multivariate version of the central limit theorem.
Let μ̂_{N,p} = (μ̂_N(f_{I_1}), …, μ̂_N(f_{I_p}))^⊤ and μ_p = (μ(f_{I_1}), …, μ(f_{I_p}))^⊤. Then, under the asymptotic framework (3.8), μ̂_{N,p} is an unbiased estimator of μ_p and, as N → ∞,

√(N^{1+1/d}) ( μ̂_{N,p} − μ_p ) → N(0, Σ_p) in distribution,

where Σ_p is the p × p Hermitian matrix whose entries are the asymptotic covariances of the components.

Simulation study
We now propose a simulation study to illustrate the results. We first consider the setting of Theorem 3.1 (ii)-(iii) and Corollary 3.1 (ii)-(iii), i.e. the case s > 1/2, and then the situation s < 1/2.

Case s > 1/2, illustration of Theorem 3.1 (ii)-(iii)
We consider test functions with different regularity properties: a bump function f_bump (4.1), a sum of cosines f_mixcos, a product of cosines f_mixcosprod, and a family of functions f_γ. It is worth mentioning that f_bump is infinitely continuously differentiable, so f_bump ∈ H^s for any s > 0. The function f_mixcos is non-differentiable (with 4^d singularity points) and satisfies f̂_mixcos(j) = O(‖j‖_∞^{−2}), so f_mixcos ∈ H^s(B) for any s < 3/2. f_mixcosprod is also a non-differentiable function, with a similar decay of its Fourier coefficients, whereby it can be deduced that f_mixcosprod ∈ H^s(B) for any s < 3/2. Finally, f_γ is also a non-differentiable square integrable periodic function, which satisfies f̂_γ(j) = O(‖j‖_∞^{−1−γ}); hence, f_γ ∈ H^s(B) for any s < 1/2 + γ. In the following, we consider the cases γ = 0.25 and γ = 0.75. These five test functions are depicted for d = 2 in Figure 1.
We use the simulation algorithm provided by Hough et al. [14]. Basically, it relies upon a Gram-Schmidt orthogonalization of N vectors of dimension N, with a cost of order N³, and a costly rejection sampling step for the simulation of each point location. Sampling DPPs with large N or when d > 4 is very time consuming with the standard R package spatstat [2]. Therefore, we have implemented an R package based on C++ functions (available at https://github.com/AdriMaz/rcdpp/). We performed our experiments in very reasonable computing time on a basic laptop (2.3 GHz Intel Core i5 processor, 8 GB of 2133 MHz DDR4 RAM). For example, sampling a DPP with N = 1000 in dimension 6 can be performed within approximately one minute. For each d, each function and each replication of the point pattern, we evaluate the estimator μ̂_N(f_I) given by (3.1) with I = {1, …, d}. To visualize the rate of convergence of the variance, we perform a linear regression of the logarithm of the empirical variances on log(N). According to Theorem 3.1 (ii), the expected slope is −1−1/d. For each test function, d and N, we also test the normality of the estimates using the Shapiro-Wilk test [31], after adjusting the p-values with the Holm procedure [13] for each function and each d. Results are reported in Figure 2. The fourth column of the summary tables reports the usual Student-t confidence interval for the slopes. The values of N for which the normality assumption has been rejected, i.e. for which the adjusted p-value is smaller than 0.05, are represented by crosses instead of regular dots. We made the arbitrary choice to keep them when fitting the regression lines. We have also arbitrarily translated the curves to focus the interpretation on the slopes; thus, in Figures 2-7 (as well as the figures presented in Appendix C), the y-axis has no meaning.
Most of the results are in clear agreement with our theoretical results. For small values of d, the asymptotic normality is not rejected even for small sample sizes N, and the slope of the logarithm of the empirical variances is very close to −1−1/d.
For d = 10, the theoretical results are hardly recovered. This is related to the values of N considered: the values of the n_i's are very small (several of them are actually equal to 1).
The structure of the integrands affects the quality of the empirical results: for the functions f_bump and f_mixcosprod, normality requires higher values of N. The expected slope is also not always included in the confidence intervals. However, this might be related to usual observations when estimating integrals of functions defined as in (3.12), as illustrated in Section 5.
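The slope-extraction step used in these experiments is an ordinary least-squares regression of log(empirical variance) on log(N). A minimal sketch (Python), validated here on synthetic variances that decay exactly as N^{−1−1/d} with d = 2 (hypothetical values, not actual DPP output):

```python
import math

def slope_loglog(ns, variances):
    # OLS slope of log(variance) against log(N); by Theorem 3.1 (ii) the
    # theoretical value for the Dirichlet DPP estimator is -1 - 1/d.
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in variances]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic (hypothetical) variances decaying exactly as 3 * N^{-1-1/2} (d = 2).
ns = [64, 144, 256, 400]
vs = [3.0 * n ** (-1.5) for n in ns]
slope = slope_loglog(ns, vs)   # recovers -1.5 up to rounding
```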

Case s > 1/2, illustration of Corollary 3.1 (ii)-(iii)
We perform similar experiments. We set the dimension to d = 6. For each configuration {x^(1), ..., x^(N)} of the (N, 6)-Dirichlet DPP, we evaluate the estimator μ_N(f_I) given by (3.1) with I ⊂ d and |I| = ι = 1, ..., 6. In other words, we use 6-dimensional configurations of points to estimate integrals of ι-dimensional integrands. Let us point out that a single realization of the (N, 6)-Dirichlet DPP is used for each value of ι; however, the directions kept when projecting are chosen randomly. Results are illustrated in Figure 3, as in the previous case. The conclusions are quite similar: the points remain nicely aligned along the regression lines and the confidence intervals are in agreement with the theoretical slopes, all equal to −1 − 1/6 ≈ −1.17 in this situation. Normality, however, seems hard to reach when a design is projected onto a low-dimensional space: most of the values of N are rejected by the Shapiro-Wilk tests for ι = 1, 2. We also conduct a similar experiment starting with realizations of the Dirichlet DPP in dimension d = 10, see Figure 4. Similar comments can be made in this situation.
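A minimal sketch of this projection-based estimation follows. Iid uniform points stand in for the (N, 6)-Dirichlet DPP realization (sampling the DPP itself is beyond this snippet), and the integrand is a hypothetical test function whose marginal integral over [0,1]^ι equals 1 for every ι; the randomly chosen coordinates mimic the random choice of projection directions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 6, 1000
x = rng.random((N, d))  # stand-in for one (N,6)-Dirichlet DPP realization

def marginal_estimate(points, f, iota, rng):
    # Randomly choose the iota coordinates kept by the projection,
    # then average f over the projected design (estimator (3.1)).
    I = rng.choice(points.shape[1], size=iota, replace=False)
    return f(points[:, np.sort(I)]).mean()

# Hypothetical integrand: each marginal integral over [0,1]^iota equals 1,
# since the integral of 1 + 0.5*cos(2*pi*u) over [0,1] is 1.
f = lambda u: np.prod(1.0 + 0.5 * np.cos(2.0 * np.pi * u), axis=1)

estimates = [marginal_estimate(x, f, iota, rng) for iota in range(1, d + 1)]
```

The same 6-dimensional point set is thus reused to estimate all six marginal integrals, which is exactly the usage pattern studied in this section.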

Case s < 1/2
We now intend to illustrate Theorem 3.1 (i) for d = 1 and s < 1/2. We consider the one-dimensional function defined on [0,1] by (4.5). We repeat the experiments presented in Section 4.1, but only for the case d = 1: getting an accurate evaluation of (4.5) becomes really expensive in higher dimensions. We consider three values for γ: 0.625, 0.75 and 0.875. Empirical results are depicted in Figure 5. As expected, normality is rejected for all values of N (for readability we keep the dot representation for the points). Surprisingly, the results remain very satisfactory. When γ = 0.75, 0.875, the points remain well-aligned and the estimated slopes are in good agreement with the theoretical −1 − 2s/d. The case γ = 0.625 is less convincing but still very satisfactory, since h_0.625 is a very irregular function.

A comparative numerical study.
We perform similar experiments with several designs: crude Monte-Carlo with uniform proposal, the DPP proposed by Bardenet and Hardy [3] based on Legendre polynomials and available in the DPPy toolbox [10], stratified sampling [26], the maximin Latin Hypercube Design of the lhs R package, and the Sobol and Halton sequences (randtoolbox R package). The Dirichlet DPP and these different designs are respectively denoted by DirDPP, crude MC, BHDPP, stratified MC, maximinLHS, Sobol and Halton in the following.
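Most of the competing designs can be generated with standard tools. The sketch below uses Python's scipy.stats.qmc rather than the R packages named above, with scipy's plain LatinHypercube standing in for maximinLHS, and a simple product integrand (whose integral over [0,1]^d is 2^(-d)) as an illustrative test function; it is a stand-in for the actual experimental setup, not a reproduction of it.

```python
import numpy as np
from scipy.stats import qmc

d, m = 2, 10                      # N = 2**m = 1024 points
f = lambda u: np.prod(u, axis=1)  # integral over [0,1]^d is 2**(-d)
truth = 2.0 ** (-d)

rng = np.random.default_rng(2)
designs = {
    "crude MC": rng.random((2 ** m, d)),
    # scrambled Sobol points, in a power-of-two sample size as recommended
    "Sobol": qmc.Sobol(d, scramble=True, seed=3).random_base2(m),
    "Halton": qmc.Halton(d, scramble=True, seed=4).random(2 ** m),
    # plain LHS as a stand-in for the maximinLHS design
    "LHS": qmc.LatinHypercube(d, seed=5).random(2 ** m),
}
errors = {name: abs(f(pts).mean() - truth) for name, pts in designs.items()}
```

Each design yields an estimate of the same integral from N = 1024 nodes; comparing the absolute errors (or, over replications, the MSE) across designs is the basic operation behind the comparisons reported below.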
We first compare the DirDPP and BHDPP designs, since they seem very close in terms of construction. The experiments presented in Section 4 (in the case s > 1/2) for the DirDPP are conducted for the BHDPP design (only for d = 1, ..., 6 to save time). Thus, Figures 6-8 should be compared to Figures 2, 3 and 5. Regarding Figures 2 and 6, as expected, the results are quite similar for the f_bump function in terms of slope and confidence interval. However, we observe that for d > 4, the fluctuations around the regression lines as well as the confidence intervals are larger for the BHDPP design than for the DirDPP design. Moreover, Gaussianity seems to be reached less often. Surprisingly, the BHDPP design gives satisfactory results even for non-differentiable integrands (including function (4.5), according to Figure 8), as long as d < 4. But the results deteriorate for d ≥ 4: the estimated slopes are even larger than −1, the rate corresponding to crude MC designs. Bardenet and Hardy [3] did not consider the problem of estimating ι-dimensional integrals. Their point process was not designed to address such a problem, and this is indeed revealed by Figure 7. When projections of a 6-dimensional BHDPP design are used to estimate ι-dimensional integrals with ι = 1, ..., 5, the estimated slopes are at best close to −1 and at worst larger than −1. In all cases, the estimated slopes for the BHDPP design are far from −1 − 1/6, which on the contrary is reached by the DirDPP design.
Next, we compare the DirDPP design with more standard alternatives. For each design (crude MC, strat MC, maximinLHS, Sobol, Halton), we have reproduced exactly the same experiments which led to Figures 2-4, except that, due to the nature of the maximinLHS, Sobol and Halton designs, we have regressed the logarithms of the empirical mean squared errors (MSE) on log(N). Of course, plotting the MSE instead of the variance for the DirDPP design in Figures 2-4 would have led to the same conclusions, since the estimates are unbiased (which is also the case for the crude MC, strat MC and BHDPP designs). Moreover, similar figures summarizing empirical bias and variance have also been produced and are available on demand. Figure 9 is a tentative summary of Figures 2-4 and of Figures 10-24, presented in Appendix C for the sake of completeness. It reports confidence intervals for the slope estimates of linear regressions of the logarithms of the MSE on log(N) for all designs and test functions. All these figures are quite complex to comment on. Our goal is not to point out which method is best for a particular function, a particular sample size N, etc. Instead, we wish to investigate whether a method is "stable" (in terms of rate of convergence, over a set of test functions, etc.) for estimating d-dimensional integrals or ι-dimensional integrals based on d-dimensional designs.
Crude MC designs behave as expected, with a slope of −1 in all situations. Stratified MC designs seem quite competitive with DirDPP designs when estimating d-dimensional integrals, even for non-differentiable integrands. Nevertheless, we observe that the rate of convergence is not exactly the same over the set of test functions (e.g. −3 for the bump function and −2.5 for f_0.25 when d = 1). The properties of stratified MC designs deteriorate significantly when we estimate ι-dimensional integrals (see Figures 16 and 21): the estimates tend to converge, but at a lower rate than the estimates based on the DirDPP design.
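For reference, a stratified MC design of the kind compared here places one uniform point in each of the k^d congruent sub-cubes of [0,1]^d. A minimal sketch, with an illustrative integrand whose exact integral is 1/4:

```python
import numpy as np

def stratified_design(k, d, rng):
    # Indices of the k**d sub-cubes, each jittered uniformly within its cell.
    grid = np.stack(np.meshgrid(*[np.arange(k)] * d, indexing="ij"), axis=-1)
    return (grid.reshape(-1, d) + rng.random((k ** d, d))) / k

rng = np.random.default_rng(0)
pts = stratified_design(8, 2, rng)       # N = 64 points in [0,1]^2
est = np.prod(pts, axis=1).mean()        # estimates integral of u1*u2, i.e. 1/4
```

One point per cell guarantees an even coverage of the full d-dimensional cube, but nothing constrains the lower-dimensional projections of the design, which is consistent with the degradation observed when stratified MC is used for ι-dimensional integrals.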
MaximinLHS designs are quite impressive for d = 1, ..., 6, 10 for functions like f_mixcos, f_0.75 and f_0.25, but exhibit some huge failures for the functions f_bump and f_mixcosprod. We have no explanation for the difficulties of this design in estimating integrals of functions of the form (3.12). Similar behaviours are observed when we estimate ι-dimensional integrals (see Figures 17 and 22). This apparent lack of convergence of the MSE is essentially explained by the fact that the empirical bias does not converge with N.
Sobol and Halton designs exhibit similar behaviours. Looking at Figures 13 and 14, we observe that these designs lead to very small MSE (often much smaller than those obtained with the other methods). However, what is striking in Figure 9 is the huge variability of the slope estimates. Indeed, in Figures 13 and 14, the (logarithms of the) MSE have a high variance, in particular for the functions f_mixcos, f_0.75 and f_0.25 (i.e. for functions of the form (3.13)). For these functions, the results are difficult to explain, as the dimension does not seem to affect the rate of convergence. The empirical results are even more variable when we are interested in estimating ι-dimensional integrals (see Figures 23-24). There is no clear rate of convergence over the set of test functions.
Overall, from this (clearly not exhaustive) empirical study, we do not claim that the DirDPP design outperforms the other designs investigated. For a specific sample size N, a specific function or a specific dimension, one may find a much better method. However, the DirDPP design is the only one (together with the baseline crude MC design) which exhibits a clear and expected rate of convergence for any function, any dimension d and any "sub-dimension" ι.

Conclusion
In this paper, we build a specific class of repulsive point processes and use their realizations as quadrature points to estimate integrals. The resulting Monte-Carlo estimator is unbiased, with a variance scaling as O(N^{−1−(2s∧1)/d}) when the integrand belongs to H^s([0,1]^d). Our methodology and results have the interest of applying to non-differentiable functions, whereas differentiability is often assumed by other methods such as grid-based methods, RQMC or scrambled nets [26]. We also show that the initial configuration of points can be used to estimate ι-dimensional integrands (ι = 1, ..., d) with the same efficiency as in dimension d. To complete this work on the theoretical side, it would be interesting to investigate the rate of convergence to normality in some metric (e.g. Wasserstein or Kolmogorov-Smirnov). We have also left aside an important practical question: how to estimate σ²(f), or its approximation given by (3.11), and the matrix Σ_p given by (3.17)? Obviously, we do not want to assume the Fourier expansion of f to be known, nor to use external simulations to estimate such quantities, which would make the current work useless. Estimating σ²(f) or Σ_p using only the realization of the Dirichlet DPP appears to be highly challenging. Due to the intrinsic nature of this projection DPP, strategies like bootstrap, subsampling or leave-one-out type procedures did not lead to a consistent estimate of the asymptotic variance. This problem is definitely an interesting perspective.
We consider in this paper integrals on [0,1]^d (or rectangles). Addressing the same question for more complex domains or for improper integrals is of great interest. Similarly, considering integrands defined on manifolds such as the torus or spheres is definitely an interesting perspective; we could, for example, take advantage of the work by [21], which presents several spherical-valued DPP models. Crude Monte-Carlo or stratified Monte-Carlo designs can be improved by using antithetic or local antithetic methods, see Owen [25,26]. Such improvements rely on the fact that U ∼ U(0,1) and 1 − U have the same distribution. This idea could also be applied to our design, which is again an advantage of the homogeneity of the Dirichlet DPP. We have not studied these extensions in depth and leave them for future work. To give some flavour, however, we can easily prove that the corresponding antithetic estimator, for any u ∈ [0,1]^d, satisfies Theorem 3.1 and Corollary 3.1 with the same rate of convergence and the same assumptions on f_I. However, the asymptotic constant is given by (3.15) with |f_I(j)|^2 replaced by Re(f_I(j))^2, which necessarily leads to a variance reduction.
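The paper's antithetic construction for the Dirichlet DPP is only alluded to here, but the underlying mechanism can be illustrated on a crude MC design; the integrand and its exact integral (e − 1)^2 are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 500, 2
x = rng.random((N, d))               # crude MC design on [0,1]^2
f = lambda u: np.exp(u.sum(axis=1))  # exact integral over [0,1]^2: (e-1)**2

plain = f(x)                         # standard estimator terms
anti = 0.5 * (f(x) + f(1.0 - x))     # exploits that U and 1-U share a law
truth = (np.e - 1.0) ** 2

# For this monotone f, the antithetic terms are much less variable,
# so averaging them yields a lower-variance unbiased estimator.
```

Both plain.mean() and anti.mean() are unbiased; the variance reduction comes from the negative correlation between f(x) and f(1 − x) for monotone integrands, which is the same symmetry the homogeneous Dirichlet DPP would allow one to exploit.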
In this paper we did not exploit the form of the integrand f . "Is it possible to improve the rate of convergence by designing an appropriate DPP which exploits the form of f ?" seems to be a difficult but again interesting question. We believe this is not feasible with the Dirichlet DPP proposed in this paper but the question remains open.
Another natural extension would be to consider a general target product measure m on [0,1]^d instead of the uniform distribution. The naive approach would obviously be to estimate ∫_{[0,1]^ι} f_I m_I dx with the Dirichlet DPP. But, as Bardenet and Hardy [3] did, the measure m could be used to build a dedicated DPP. According to the proofs in our work, it seems reasonable to think that Theorem 3.1 and Corollary 3.1 could be extended to any product measure m such that an orthonormal basis {ϕ_k(·)}_k of L²([0,1]^d, m) satisfies the separability property ϕ_k(x)ϕ_l(x) = ϕ_{k+l}(x).
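For the Fourier basis used in this paper, φ_k(x) = exp(2πi k·x), the separability property holds immediately because exponents add; a one-line numerical check:

```python
import numpy as np

# phi_k(x) = exp(2*pi*i * k.x): phi_k * phi_l = phi_{k+l} since exponents add.
phi = lambda k, x: np.exp(2j * np.pi * np.dot(k, x))

x = np.array([0.3, 0.7])
k, l = np.array([2, -1]), np.array([1, 3])
assert np.allclose(phi(k, x) * phi(l, x), phi(k + l, x))
```

The extension question is whether other orthonormal bases of L²([0,1]^d, m), for a non-uniform product measure m, can satisfy the same closure property.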
One of the limitations of the methodology lies in the simulation of a continuous DPP. Current algorithms have a computational cost of O(N^3) and are based on chain rules, where each point location is sampled by a rejection method whose acceptance ratio can be very costly to evaluate. As the number of generated points increases, passing the rejection step becomes harder, which makes the sampler even more costly. A personal communication from [18] shows that in dimension one this rejection step can be significantly improved with a proposal adapted to the Fourier kernel. This lets us hope that the proposal could be extended to the d-dimensional Dirichlet DPP. Finally, it is worth reminding the reader that we use homogeneous point patterns which can be used to estimate any integral. Therefore, the 2500 replications of the Dirichlet DPP, for N = 50, 100, ..., 1000 and d = 1, ..., 6, can be used to evaluate any integral and are freely available upon request.

A.1. Proof of Proposition 3.1
Proof. Since X is generated by a projection kernel, in (1.8), λ_j = 0 or 1 and ρ^Y = N. Moreover, the trick is that the Fourier basis satisfies φ_j(u) φ̄_k(u) = φ_{j−k}(u) for any j, k ∈ Z^d and u ∈ B. These facts reduce (1.8) to an expression from which (3.6) is deduced using Parseval's identity. Let us now focus on the second term of the right-hand side of (3.6). When d = 1, its behaviour is quite well known; the result is then easily deduced using the fact that E_N is a rectangular subset of Z^d.