Adaptive density estimation on bounded domains under mixing conditions

In this article, we propose a new adaptive estimator for multivariate density functions defined on a bounded domain, in the framework of multivariate mixing processes. Several procedures have been proposed in the literature to tackle the boundary bias issue encountered with classical kernel estimators. Most of them are designed to work in dimension d = 1 or on the unit d-dimensional hypercube. We extend such results to more general bounded domains, such as simple polygons or regular domains that satisfy a rolling condition. We introduce a specific family of kernel-type estimators devoid of boundary bias. We then propose a data-driven Goldenshluger and Lepski type procedure to jointly select a kernel and a bandwidth. We prove the optimality of our procedure in the adaptive framework, stating an oracle-type inequality. We illustrate the good behavior of our new class of estimators on simulated data. Finally, we apply our procedure to a real dataset.


Introduction
In this paper we study the classical problem of estimating a density function f : D → R, where D ⊂ R^d, from the observation of n identically distributed, but not necessarily independent, random variables X_1, ..., X_n. In several modern applications, D is a known bounded domain whose shape can be quite complex (e.g. a geographical region). Moreover, these data often present a dependence structure that has to be taken into account. Our main objective is to propose a new kernel-type estimation procedure that addresses these two points and to establish results ensuring the optimality of this procedure from a theoretical point of view.
It is well known that classical kernel-type estimators present a severe bias when the density function f does not vanish near the boundary of D. Several procedures have been proposed in the literature to tackle this issue in the univariate setting. Schuster (1985), Silverman (1986) and Cline and Hart (1991) studied the reflection of the data near the boundary. Marron and Ruppert (1994) proposed a preliminary transformation of the data. Müller (1991), Lejeune and Sarda (1992), Jones (1993), Chen (1999) and Botev, Grotowski, and Kroese (2010) proposed to construct kernels that take into account the shape of the support of the density. These procedures can easily be adapted to the hypercube D = [0, 1]^d using tensorization methods. Nevertheless, their generalization to more complex domains has not been studied and seems more tedious. Only a few methods are designed to work with more general domains D ⊂ R^d. In this context, Müller and Stadtmüller (1999) proposed a generic method to construct kernels (of arbitrary orders) whose shape depends on both the estimation point x ∈ D and the bandwidth h > 0. These kernels are solutions, for each x and h, of nontrivial continuous least squares minimization problems. Marshall and Hazelton (2010) proposed an alternative approach, dedicated to the case d = 2, that allows one to construct more tractable procedures.
Few papers study in depth the theoretical properties of these multivariate procedures. In the context of independent and identically distributed observations and twice differentiable density functions, we point out Bouezmarni and Rombouts (2010), who study the behavior of Beta kernels with a cross-validation selection procedure, and Marshall and Hazelton (2010), who study the pointwise behavior of their estimators for a fixed bandwidth. The results stated in Müller and Stadtmüller (1999) could be used to prove pointwise minimax results over arbitrary isotropic Hölder classes. To the best of our knowledge, only Bertin, El Kolei, and Klutchnikoff (2018) proved adaptive results for integrated risks over D = [0, 1]^d (in the sense that a single estimation procedure achieves the minimax rate of convergence over a large scale of regularity classes). They introduced a new family of kernel density estimators that do not suffer from the boundary bias problem and proposed a data-driven procedure based on the Goldenshluger and Lepski approach (see Goldenshluger and Lepski, 2014) that jointly selects a kernel and a bandwidth.
In the present paper, we aim at considering more general bounded domains of R^d, such as the disk, simple polygons (which may be used to define geographical regions) or more regular domains satisfying the rolling condition (see Arias-Castro and Rodríguez-Casal, 2017). We assume that the observations are extracted from a stationary β-mixing process. In this framework, we introduce a new family of kernel density estimators, and we propose a data-driven selection procedure inspired by Goldenshluger and Lepski (2014) and Bertin et al. (2018) to jointly select a kernel and a bandwidth. Our main contribution is the statement of an oracle-type inequality in L²-norm that allows us to prove the adaptivity (in the minimax sense) of our procedure over a large scale of Hölder classes, without any bound on the smoothness parameter. We also conduct simulation studies on the disk, both in the independent case and in a dependent case where the observations come from a diffusion process with reflection. This type of process is of particular interest since it can be used to model population dynamics in bounded geographical areas (see Cholaquidis, Fraiman, Mordecki, and Papalardo, 2016). Within this context, we consider a real data set corresponding to the trajectories of elephants in Hwange National Park in Zimbabwe.
The paper is organized as follows. We first introduce the statistical framework in Section 2. Section 3 is devoted to the description of our new estimation procedure, and its theoretical properties are stated in Section 4. Simulation studies are presented in Section 5 and the application to real data in Section 6. The proofs are postponed to Section 7.

Statistical framework
In what follows, we consider a strictly stationary, β-mixing process X = (X_i : i ∈ Z) that takes values in a bounded domain D ⊂ R^d. We assume that the common marginal distribution of the random variables X_i is absolutely continuous with respect to the Lebesgue measure restricted to D, and we denote by f : D → R the density of this distribution. We aim at finding an accurate estimation procedure for f based on the observations X_1, ..., X_n, where n ∈ N.
In the rest of this section, we present the main assumptions we make on the law of the process X and on the geometry of the domain D. We also present the adaptive minimax framework used to measure the statistical performances of the estimators.

Assumptions on the law of the process
The assumptions on the process X are divided into two parts: the assumptions on the marginal density f on the one hand and the assumptions on the dependence structure of the process on the other hand.
To state the assumptions on the marginal density, we recall the definition of a Hölder ball on the domain D. Let γ and L be two positive numbers. A function f : D → R belongs to the Hölder class H_D(γ, L) if f is ℓ := ⌈γ⌉ − 1 times continuously differentiable on D and if, for every multi-index ν with |ν| = ℓ, we have |D^ν f(x) − D^ν f(y)| ≤ L |x − y|^{γ−ℓ} for all x, y ∈ D.

The absolute regularity (or β-mixing) condition for a process was introduced by Volkonskii and Rozanov (1959) and attributed there to Kolmogorov.
For the convenience of the reader, we recall the definition of the β-mixing coefficients of the strictly stationary process X. For each integer k ≥ 1, we define:

β(k) = (1/2) sup Σ_{i∈I} Σ_{j∈J} |P(U_i ∩ V_j) − P(U_i) P(V_j)|,

where the supremum is taken over all pairs of finite partitions {U_i : i ∈ I} and {V_j : j ∈ J} of the probability space Ω that are measurable with respect to σ(X_s : s ≤ 0) and σ(X_s : s ≥ k), respectively.
Assumption 2 Let c be a positive number and let 0 < ρ < 1. We assume that the process X is strictly stationary and β-mixing at a geometric rate. More precisely, for any k ≥ 1 we have β(k) ≤ cρ^k.
Assumption 3 Set f_∞ > 0. For any k ≥ 1, the distribution of the random pair (X_1, X_{k+1}) admits a density f_k : D² → R (with respect to the Lebesgue measure restricted to D²) bounded by f_∞. Without loss of generality, we assume that the bounds f_∞ that appear in Assumptions 1 and 3 are the same when the two assumptions hold simultaneously.

Geometric assumptions on the domain
In this section, we first state technical assumptions on the domain D and then give some examples of domains that satisfy these conditions.
Remark that, since our goal is to consider estimation on a bounded domain, the existence of R > 0 is not a restrictive condition. Assuming that D is connected is not restrictive either, since the same estimation procedure could be applied on each connected component. Finally, D is assumed to be open to ensure that the ambient dimension d is the correct one.
Assumption 5 There exist 0 < r < 1 and a finite family of transformations such that:

While Assumption 5 seems quite restrictive, it is satisfied by several domains, as illustrated in the following examples. The first one was considered in Bertin et al. (2018) for independent data.

Example 1 (Hypercubes)
We consider the case where D = (0, 1)^d and we define, for u ∈ D and x ∈ D: In this case we have r = 1/2 and κ = 2^d.

Example 2 (Disks)

To state the second example, we denote by D_r = {x ∈ R² : |x|_2 ≤ r} the Euclidean ball with radius r > 0. Note that this example is of particular interest since it is used throughout our simulation study in Section 5.
However, generic classes of open subsets of R² can be proven to satisfy Assumption 5, as in the two following examples (see Appendices A and B for the proofs).
Example 3 (Rolling condition) Set r_0 > 0. The domain D is called r_0-regular if, for any 0 < r ≤ r_0, the ball D_r rolls freely in both D and D^c. That is, for each a ∈ ∂D, there exist x_a^r and y_a^r in R^d such that: Such a regularity condition on D is well known and widely used in statistics (see Arias-Castro and Rodríguez-Casal, 2017, and references therein). The following figure illustrates this condition. Finally, since in practical situations the boundary of a domain can be approximated by a simple polygonal path (as for geographical areas), the following example (see Appendix B for more details) is of prime interest.

Example 4 (Simple polygons)
The interior of any simple polygon satisfies Assumption 5.

Framework
Under Assumptions 1 and 4, the marginal density f belongs to the space of square integrable functions that map D into R. This set, denoted by L²(D), is endowed with its natural Hilbert norm: To measure the performance of an estimator, we consider its risk defined by: Let F be a subset of L²(D). The maximal risk of an estimator over F is defined by: whereas the minimax risk over F (see Tsybakov, 2009) is: where the infimum is taken over all estimators. An estimator whose maximal risk is asymptotically bounded, up to a multiplicative factor, by φ_n(F) is called minimax over F. Such an estimator is well adapted to estimation over F, but it can perform poorly over another functional space.
The problem of adaptive estimation consists in finding a single estimation procedure that is simultaneously minimax over a scale of functional classes. More precisely, given a family {F_λ : λ ∈ Λ} of subsets of L²(D), the goal is to construct f* such that R_n(f*, F_λ) is asymptotically bounded, up to a multiplicative constant, by φ_n(F_λ) for any λ ∈ Λ. One of the main tools to prove that an estimation procedure is adaptive over a scale of functional classes is an oracle-type inequality, which guarantees that the procedure performs almost as well as the best estimator in a rich family of estimators. Ideally, we would like an inequality of the following form: where {f̂_η : η ∈ H} is a family of estimators well adapted to our problem: for any λ ∈ Λ, there exists η(λ) such that f̂_{η(λ)} is minimax over F_λ. However, in many situations, (1) is relaxed and we prove a weaker inequality of the type: where Υ_1 and Υ_2 are two positive constants and R*_n(f, η) is an appropriate quantity, to be determined, that can be viewed as a tight upper bound on R_n(f̂_η, f). Inequalities of the form (2) are called oracle-type inequalities.

Statistical procedure
We propose to construct a specific family of kernel-type estimators that tackles the boundary bias problem encountered with classical kernel estimators (see Bertin et al., 2018, and references therein for more details). The construction of this family is linked to geometric assumptions on the domain D. Before presenting the ideas behind the construction of the family of estimators, we introduce some notation used throughout the paper.

A first family of kernel estimators
For any bandwidth h > 0, the normalized multivariate kernel K_h is defined by: Equipped with this notation, we note that, under Assumption 5 and for h small enough, for any x ∈ D, K_h ∘ A_x(u − x) = 0 as soon as u ∉ D. This property implies that there is no loss of mass near the boundary (contrary to the classical convolution kernel). This leads us to consider the following estimators: The family of estimators {f̂_{K,h}}_{K,h}, indexed by all bandwidths h > 0 and univariate kernels K, is well adapted to our problem (see the theoretical properties stated in Section 4).
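To make the no-loss-of-mass idea concrete, here is a minimal one-dimensional sketch on D = (0, 1) with the uniform kernel K = 1_[0,1]: the transformation A_x orients the kernel support toward the interior (to the right on the left half of the domain, to the left on the right half), so that no kernel mass leaks outside [0, 1] as long as h ≤ 1/2. The function name and the specific orientation rule are our illustrative choices, not the paper's exact construction.

```python
import numpy as np

def boundary_kernel_estimate(x, data, h):
    """One-dimensional boundary-corrected kernel estimate at x in [0, 1].

    Sketch of the idea of Section 3 with K = 1_[0,1]: the orientation
    eps plays the role of A_x and keeps the kernel support inside [0, 1]
    whenever h <= 1/2 (here r = 1/2 and kappa = 2).
    """
    eps = 1.0 if x <= 0.5 else -1.0            # orientation: toward the interior
    u = eps * (data - x)                       # transformed increments A_x(X_i - x)
    return np.mean((u >= 0.0) & (u <= h)) / h  # (1/nh) sum_i K(A_x(X_i - x)/h)

# On uniform data the estimate is unbiased even at the boundary point x = 0,
# where a classical symmetric kernel would return roughly 1/2.
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=20000)
print(boundary_kernel_estimate(0.0, sample, h=0.1))  # close to the true density 1
```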

A second family of estimators
Even if the previous family of estimators is well adapted to our framework, it is a very large family, since we impose no restriction on the kernel or on the bandwidth. This makes the data-driven selection of an element of this family difficult from both theoretical and practical points of view. In this section we construct a one-parameter subfamily which consists of predefined, well-chosen pairs of kernels and bandwidths. To do so, we define, for any m ∈ N ∪ {0}, the kernel K_m, where [•] denotes the integer part. Using the notation introduced in (3), we define, for any ℓ ∈ N, the estimator: To obtain a finite collection of estimators we impose an additional restriction on h(ℓ) by considering only ℓ ∈ L_n, where the bandwidths h̄_n and h_n are defined, for given c_1 > 0 and c_2 > 0, by

Remark 1 Note that the kernel K_m is of order m and satisfies (5) with A = 1 and B = 2; see Bertin et al. (2018) for more details. More generally, any sequence of kernels K_m of order m satisfying (5) can be used instead of the family defined by (4).
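One standard way to build a kernel of order m supported on [0, 1], in the spirit of the construction referenced from Bertin et al. (2018), is to take the reproducing kernel at 0 of the polynomials of degree at most m in L²([0, 1]), expanded on shifted Legendre polynomials. This is a sketch of that classical construction; the exact kernels K_m of (4) may differ.

```python
import numpy as np
from numpy.polynomial.legendre import legval, leggauss

def order_m_kernel(m):
    """K_m(t) = sum_{j=0}^m (2j+1)(-1)^j P_j(2t-1) on [0, 1].

    P_j is the Legendre polynomial; sqrt(2j+1) P_j(2t-1) is an orthonormal
    basis of L^2([0,1]) polynomials, so K_m reproduces p(0) for every
    polynomial p of degree <= m: int_0^1 t^k K_m(t) dt equals 1 if k = 0
    and 0 for 1 <= k <= m, i.e. K_m is a kernel of order m.
    """
    coefs = np.array([(2 * j + 1) * (-1) ** j for j in range(m + 1)], dtype=float)
    return lambda t: legval(2.0 * np.asarray(t) - 1.0, coefs)

def moment(kernel, k, npts=64):
    """int_0^1 t^k K(t) dt by Gauss-Legendre quadrature (exact for polynomials)."""
    nodes, weights = leggauss(npts)
    t = 0.5 * (nodes + 1.0)  # map [-1, 1] -> [0, 1]
    return 0.5 * np.sum(weights * t**k * kernel(t))
```

For instance, `moment(order_m_kernel(2), 0)` is 1 while the first and second moments vanish, which is exactly the order-2 property.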

Selection rule
Let τ > 0. For any ℓ, ℓ′ ∈ L_n we define the following majorants: where ∧ denotes the minimum between ℓ and ℓ′ and Γ(K, h) is defined by (7). Now, we define: where x_+ = max(x, 0) denotes the positive part of x. The final estimator f̂ is then defined by

This selection rule is an adaptation of the so-called Goldenshluger-Lepski (GL) method, which consists in selecting, in a data-driven way, an estimator that realizes the trade-off (6) between B and M, estimators of the bias term and the stochastic term respectively. Finding tight majorants is the key point of this procedure. Let us briefly comment on the form of the majorant M(ℓ). The ideal majorant would be However, the term Γ(K_{m_n(ℓ)}, h(ℓ)) depends on the unknown density f and is bounded, see (12), by the deterministic constant √κ ‖K‖_2^d, which can be rough in some situations (in Example 1, κ = 2^d). Recall that this is due to the specific form of our boundary kernels. To circumvent this drawback, we replace this quantity by a simple estimator of Γ(K_{m_n(ℓ)}, h(ℓ)). For technical reasons that appear in the proof of Lemma 7, we add the small corrective term τ‖K‖_2^d. Finally, the extra √2 factor allows us to take into account the dependence structure of the observations, using classical Berbee coupling techniques (see Berbee, 1979; Comte, Prieur, and Samson, 2017) in the proof of Theorem 3. Note also that the final procedure depends on the parameters τ, c_1 and c_2, which can be chosen, from a theoretical point of view, as small as desired.
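The trade-off realized by the selection rule can be sketched schematically as follows. Here the estimators are evaluated on a grid, the L² norm is approximated by a grid mean, and the majorants M(ℓ) are taken as given inputs; the paper's majorants involve estimated variance terms not reproduced here, so this is only a skeleton of the GL step, with names of our own choosing.

```python
import numpy as np

def gl_select(estimates, majorants):
    """Goldenshluger-Lepski selection among estimators indexed by l = 0..L-1.

    estimates[l] is the estimator f_hat_l evaluated on a common grid,
    majorants[l] is the majorant M(l).  Computes, for each l,
    B(l) = max_l' [ ||f_hat_{l ^ l'} - f_hat_{l'}|| - M(l') ]_+  (^ = min)
    and returns the index minimizing B(l) + M(l), i.e. the trade-off (6).
    """
    L = len(estimates)

    def dist(a, b):  # grid approximation of the L2 distance
        return np.sqrt(np.mean((estimates[a] - estimates[b]) ** 2))

    B = np.empty(L)
    for l in range(L):
        B[l] = max(max(dist(min(l, lp), lp) - majorants[lp], 0.0)
                   for lp in range(L))
    return int(np.argmin(B + np.asarray(majorants)))
```

With nested estimators whose pairwise distances exceed the majorants only for the finest index, the rule selects that index, mimicking the bias/variance balance.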

Main results
In the following, the minimax properties of the estimators f̂_{K,h} and f̂ are studied. In Section 4.2, we state an oracle inequality and adaptive properties of the selection procedure f̂.

Minimax results
In Proposition 1, we study the bias and variance terms of the estimator f̂_{K,h}. As a consequence, we deduce in Proposition 2 that the family {f̂_ℓ : ℓ ∈ L_n} is well adapted to our problem.
Proposition 1 Assume that Assumptions 1, 2, 4 and 5 are fulfilled. Set γ > 0 and L > 0. Let K be a kernel of order greater than or equal to γ. Then there exist two positive constants C(R, γ) and C(ρ, c, f_∞) such that for any h > 0 we have: With the additional Assumption 3, we get Moreover, in both cases, taking h = n^{−1/(2γ+d)}, the estimator f̂_{K,h} reaches the minimax rate φ_n(γ, L) ≍ n^{−γ/(2γ+d)}.
Proposition 2 is a direct consequence of Proposition 1 and of the fact that the order of the kernel used in f̂_{ℓ(γ)} is larger than γ.

Adaptive results
Theorem 3 (Oracle-type inequality) Assume that Assumptions 1, 2, 3, 4 and 5 are fulfilled. We have Note that this oracle-type inequality is of the form (2) with: which is, up to a multiplicative constant, an upper bound on the risk of f̂_ℓ since, using (9):
This allows us to prove that the procedure is adaptive over a large scale of Hölder spaces.
Theorem 4 (Adaptive estimation) Assume that Assumptions 1, 2, 3, 4 and 5 are fulfilled. For any γ > 0 and L > 0, we have: The proof of this theorem relies on Proposition 2 (see the proof for more details). This result is obtained without any restriction on the smoothness parameter γ > 0. This follows from the simultaneous choice of a bandwidth and a kernel, and differs from the usual bandwidth selection procedures where the kernel remains fixed.

Simulation study
In this section, we study the performance of our procedure on simulated data, in Section 5.1 for the independent framework and in Section 5.2 for β-mixing data. More precisely, in Section 5.1 we estimate several densities defined on the disk that exhibit various behaviors near the boundary of their support. In Section 5.2, we study the stationary density of a two-dimensional reflected Langevin diffusion on the disk. In each situation, we compare the accuracy of our procedure with that of usual kernel estimators, computing empirical risks over M = 500 Monte-Carlo replications. In the following, we detail our simulation scheme and comment on the results obtained.
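The empirical risks of this section are integrated squared errors on the disk. A simple way to approximate such an ISE, which mirrors what the comparisons below require, is a regular grid restricted to the disk; the function and parameter names are ours, and the grid step is an implementation choice.

```python
import numpy as np

def ise_on_disk(f_hat, f_true, ngrid=200, radius=1.0):
    """Grid approximation of int_D (f_hat - f_true)^2 over the disk of given radius.

    f_hat and f_true are vectorized functions of (x, y); grid points outside
    the disk are masked out, and each remaining point contributes the area of
    one grid cell.
    """
    xs = np.linspace(-radius, radius, ngrid)
    X, Y = np.meshgrid(xs, xs)
    inside = X**2 + Y**2 <= radius**2
    cell = (xs[1] - xs[0]) ** 2               # area of one grid cell
    diff2 = (f_hat(X, Y) - f_true(X, Y)) ** 2
    return float(np.sum(diff2[inside]) * cell)
```

As a sanity check, the "error" of the constant 1/π against the zero function is the integral of 1/π² over the unit disk, i.e. approximately 1/π ≈ 0.318.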

Comparison of estimation procedures
We consider a set H of 39 equally spaced bandwidths between 0.05 and 1, with step 0.025. For each h ∈ H we define the usual kernel estimator f̂_h and the estimator f̃_h, a modified version of our estimator of Section 3.2 that we call the boundary estimator: f̃_h coincides with the usual kernel estimator away from the boundary and equals f̂_{K,h}(x) otherwise.
Here K = 1_[0,1] and the transformation A_x is the one of Example 2. We consider only uniform kernels to allow an easier comparison of the performance of the different estimators.
Figures 2 and 3 show, through boxplots of the ISE for n ∈ {500, 1000, 2000, 5000} observations and M = 500 replications, the behavior of:
• the oracle of the boundary estimators (called boundary in the boxplots), defined by f̃_ĥ with ĥ = argmin_{h∈H} ISE(f̃_h),
• the Goldenshluger-Lepski procedure based on the boundary estimators (called boundary_GL in the boxplots),
• the oracle of the usual kernel estimators (called usual in the boxplots), defined by f̂_ĥ with ĥ = argmin_{h∈H} ISE(f̂_h),
• the Goldenshluger-Lepski procedure based on the usual kernel estimators (called usual_GL in the boxplots).
In almost all cases (except for n = 500, a = 2, b = 1 and c = 1), the oracle and the GL procedure based on boundary estimators outperform the oracle and the GL procedure based on usual kernels. Note also that the ratio between the MISE of the GL procedure and that of the oracle, both based on the boundary estimator, is around 1.6 (see Table 1), which means that the GL procedure mimics the oracle quite well.

Diffusion
Let us consider in this section the following two-dimensional reflected Langevin diffusion on the disk: where n(x, y), for (x, y) ∈ ∂D, denotes the normal vector to the boundary of the domain D = D_r, W¹ and W² are two independent standard Brownian motions and L is the local time on ∂D. The process Z_t = (X_t, Y_t)_{t≥0} is a reflected Brownian motion with drift. This process is ergodic and exponentially Φ-mixing. Its invariant measure is absolutely continuous with respect to the Lebesgue measure restricted to the disk D_r, and the invariant density writes as follows: Note that this density has most of its mass concentrated in the centre of the disk. In the simulations we fix β = 2 and we run, as in Cattiaux, León, and Prieur (2017), the reflected Euler scheme introduced in Bossy, Gobet, and Talay (2004). As we are interested in the stationary regime, we discard the first runs of the scheme. As in Section 5.1, we simulate M = 500 Monte-Carlo replications with sample sizes n ∈ {500, 1000, 2000, 5000}. For all sample sizes, the GL procedure based on boundary estimators outperforms the GL procedure based on usual kernels (see Figure 4 for more details).
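A minimal sketch of a reflected Euler scheme on the disk follows. We assume, for illustration only, the linear drift −βz (a quadratic potential, which concentrates mass at the centre as described above; it is not necessarily the paper's exact drift), and we reflect across the boundary circle any point that leaves the disk, in the spirit of the Bossy-Gobet-Talay scheme.

```python
import numpy as np

def reflected_langevin(n, dt=1e-3, beta=2.0, radius=1.0, seed=1):
    """Euler scheme with reflection for a Langevin-type diffusion on the disk.

    Each step adds the drift -beta * z * dt and a Gaussian increment of
    variance dt per coordinate; if the new point has norm r > radius, it is
    reflected radially across the circle (r -> 2*radius - r).
    Returns the n simulated positions as an (n, 2) array.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros(2)                       # start at the centre of the disk
    out = np.empty((n, 2))
    for k in range(n):
        z = z - beta * z * dt + np.sqrt(dt) * rng.standard_normal(2)
        r = np.linalg.norm(z)
        if r > radius:                    # reflect across the circle |z| = radius
            z = z * (2.0 * radius - r) / r
        out[k] = z
    return out
```

In practice one would, as in the text, discard an initial burn-in portion of the path before treating the positions as draws from the stationary density.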
Note that similar types of processes (defined on more complex domains) have been used to model population dynamics, in particular the home range and the core area of an animal based on tracking data (see Cholaquidis et al., 2016). Roughly speaking, the authors consider that the process under observation behaves in the interior of a geographical area D ⊂ R^d like an ordinary Brownian motion with drift, and reflects (normally) at the boundary ∂D of D. Under regularity assumptions on the drift and geometric constraints on the support D, Cholaquidis et al. (2016) prove the existence of a unique stationary distribution, absolutely continuous with respect to the Lebesgue measure on D. They also prove that the process is geometrically ergodic. The trajectories of the animals then allow estimating the density of the invariant probability measure. In Section 6 we study a real data set of this type.

Application to a real data set
In this section our method is applied to a database obtained from the study "African Elephant (Migration) Chamaillé-Jammes Hwange NP". We are interested in the spatial density of the whole set of elephants in the park. As in Cholaquidis et al. (2016), we assume that the animal movements can be modeled by a reflected diffusion. This allows us to guarantee that the process modeling the trajectories of the elephants satisfies the main assumptions of our model. Moreover, nine elephants were removed from the initial database as their behavior seemed atypical. Note that the boundary is very simple on the east side (it consists mainly of straight lines) and more complex on the west side. However, most of the observations lie on the east side; this is especially true for observations that are close to a boundary. Our methodology requires the knowledge of this boundary and, moreover, the more complex the boundary, the more difficult our method is to implement. With this in mind, we propose to approximate the boundary of HNP by a simple polygon that fits the real boundary quite well in the east part, whereas a rougher approximation is used in the west part. Finally, a simple polygon with only 11 edges was chosen. Figure 6 represents both the real boundary (in yellow) and the approximating simple polygon (in blue):

Estimation procedure
In this section we follow the same strategy as in the simulation study: our boundary estimators are only used in a "neighborhood" of the boundary of the east side of HNP, while classical kernel estimators are used elsewhere. More precisely, for each bandwidth h, we define two specific zones: the northeast zone C_1 and the southeast zone C_2. Examples of such zones are represented, for different values of h, in Figure 7. Now, our procedure consists in selecting, in a data-driven way, a bandwidth among a family of 30 bandwidths equally spaced between 0.02 and 0.31. For each bandwidth h and any point x in HNP, the estimator f̃_h(x) is defined in two different ways depending on the position of x. If x belongs to C_1 ∪ C_2 and if the distance from x to the boundary is less than h, then the boundary estimator is used and we define: where Otherwise, the usual kernel estimator defined by (11) is used.
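Deciding which of the two estimators to use at a point x requires its distance to the polygonal boundary. A self-contained sketch of that geometric step (distance from a point to the boundary of a polygon given by its vertex list; names are ours) is:

```python
import numpy as np

def dist_to_polygon_boundary(p, vertices):
    """Distance from point p to the boundary of a polygon (list of vertices).

    Iterates over consecutive edges [a, b] and takes the minimum distance
    from p to each segment.  In the rule above, the boundary estimator is
    applied at x only when this distance is smaller than the bandwidth h.
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(vertices, dtype=float)
    d = np.inf
    for a, b in zip(v, np.roll(v, -1, axis=0)):   # edges, last vertex wraps to first
        ab, ap = b - a, p - a
        t = np.clip(np.dot(ap, ab) / np.dot(ab, ab), 0.0, 1.0)
        d = min(d, np.linalg.norm(ap - t * ab))   # distance to the segment [a, b]
    return d
```

For the 11-edge polygon of Figure 6, this check costs a handful of dot products per evaluation point, so the hybrid rule adds essentially no overhead to the kernel evaluation itself.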
Since our selection procedure depends on a tuning parameter τ that appears in the majorant, we propose to use a slope heuristic to determine this constant; we refer the reader to Baudry, Maugis, and Michel (2012) for more details. Note that our penalty M(ℓ) depends on τ via the quantity τ‖K‖_2²/(√n h(ℓ)). As a consequence, for each tuning parameter τ, our procedure selects the quantity ℓ̂(τ), and the slope heuristic consists in defining τ_min as the location of the largest slope of the function τ ↦ M(ℓ̂(τ)). As can be observed in Figure 8, we obtain τ_min = 0.8, so we implement our GL procedure with τ = 2τ_min = 1.6. The selected bandwidth is then h = 0.07.
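The heuristic step can be sketched as follows, reading τ_min as the value of τ just after the largest jump of the piecewise function τ ↦ M(ℓ̂(τ)) evaluated on a grid. This is a schematic reading of the procedure described above, with inputs precomputed elsewhere and names of our own choosing.

```python
import numpy as np

def tau_from_slope_heuristic(taus, penalties):
    """Return 2 * tau_min, with tau_min at the largest slope of tau -> M(l_hat(tau)).

    taus is an increasing grid of candidate tuning parameters and
    penalties[i] the selected majorant M(l_hat(taus[i])) for each of them.
    """
    taus = np.asarray(taus, dtype=float)
    pen = np.asarray(penalties, dtype=float)
    slopes = np.abs(np.diff(pen) / np.diff(taus))   # slope of each grid segment
    tau_min = taus[int(np.argmax(slopes)) + 1]      # tau just after the largest jump
    return 2.0 * tau_min
```

With the values of Figure 8 this would return 2 × 0.8 = 1.6, the value used in the final procedure.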

Results
Figure 9 represents the final density estimate, plotted on a regular spatial grid (with step δ = 0.01) within the real boundary. The result we obtain seems to confirm that the density of elephants is related to the placement of artificial water pumps. It would be interesting to investigate this issue further with the owners of this database.

Preliminary notations and technical lemmas
For any ℓ ∈ L_n, define where Γ(ℓ) = Γ(K_{m_n(ℓ)}, h(ℓ)) with Γ(K, h) defined by (7). For any kernel K and bandwidth h > 0, define: We denote The following lemma, whose proof is postponed to Section 7.5, provides some properties of ξ_{K,h}.

Lemma 5
We have Moreover, under Assumptions 1, 2, 3, 4 and 5, there exists a constant C(c, ρ, f_∞) such that and, for h ≤ h̄_n and n large enough, we also have: The proof uses the Bousquet inequality from Boucheron, Lugosi, and Bousquet (2004) (see also Theorem 12.5 in Boucheron, Lugosi, and Massart, 2013). It also makes use of ingredients of Lemma 4.2 in Viennet (1997).
Both lemmas are recalled in Appendix C.

Proof of Proposition 1
Proof of bound (8) Set f ∈ H_D(γ, L) and let K be a kernel of order greater than or equal to γ. Using a Taylor expansion, we obtain where C(R, γ) is a positive constant that depends only on R and γ.
Proof of bound (9) We have Then, using Lemma 9 in Appendix C, we deduce that there exists a suitable sequence of random variables. Bound (10) is a direct consequence of (13) and (14) of Lemma 5. Finally, choosing h = n^{−1/(2γ+d)}, the estimator f̂_{K,h} reaches the minimax rate n^{−2γ/(2γ+d)}.

Proof of Theorem 3
To prove Theorem 3, we use Berbee's coupling method as in Viennet (1997) (proof of Proposition 5.1) and in the proof of Theorem 1 of Comte et al. (2017).
We have (see Comte et al., 2017) For any kernel K and any bandwidth h, let ξ*^(1)_n where C_0, C_1 and C_2 are positive constants that depend only on δ, ρ, c, κ and f_∞.
We are now able to prove the oracle inequality. Set ℓ ∈ L_n. Using the triangle inequality we get: Note that if ℓ̂ ≥ ℓ, using the definitions of B(ℓ) and M(ℓ), we easily obtain: The last inequality comes from the definition of ℓ̂. The same bound remains valid if ℓ̂ ≤ ℓ. This implies: It remains to bound each term on the right-hand side of this inequality.
1. We have: The last line follows from bound (10) of Proposition 1.
2. Using the triangle inequality and (12), the last term can be bounded by:

3. Remark that, using the triangle inequality, we have: This implies that E B²(ℓ) is bounded by: where ∆_n^(i), i = 1, 2, denote the two maxima above. It remains to study the terms on the right-hand side of (20).

Study of ∆_n^(i)
First define, for m ∈ N, the kernel and consider, for ℓ ∈ L_n, the event D. Now, remark that, on the event D, we have: Using (21), we obtain Now, using the triangle inequality, (22) From here on, assume that n ≥ n_0(c, ρ, f_∞, c_1, κ, τ). We consider: where The last line follows from Lemma 6. Using Lemma 7 with δ = τ/(2√2) and x = 0, we obtain: where α_n = A^d (log n + 3/2)^{Bd}. This follows from the fact that the kernel K_{m_n(ℓ)} satisfies (5), combined with the expression of m_n(ℓ). Now, splitting the integral into two terms, depending on the position of x with respect to α_n, we obtain: where C_3 is a positive constant that depends on τ, ρ, c, κ, A, B and f_∞. This implies that there exists a positive constant C_4, depending on τ, ρ, c, κ, A, B and f_∞, such that: It remains to bound the last term. Then, using Lemma 5 and Lemma 6, we have: Now, using that ‖K_{m_n(ℓ)}‖_2 ≥ 1 and (5), we deduce that Now remark that: Moreover, it is easily seen that, for n large enough: Then we obtain, for n large enough: This implies, using Lemma 7 with δ = 1 and x = 0: We also have: Thus we have: This implies that there exists a positive constant C_6 such that where q_n = [log n]². Using (22), (23) and (25), we conclude, for i = 1, 2, that

5. Final step. Using (16) and proceeding as in (24), we obtain that and taking (20), (26) and (27) together we obtain: where C_7 and C_8 are positive constants depending on τ, c, ρ, f_∞, κ, A and B.

Proof of Theorem 4
Set α > 0 and L > 0 and let f ∈ H_D(α, L). Define ℓ_0 = ⌈(2α + d)^{−1} log n⌉, where, for x ∈ R, ⌈x⌉ is the smallest integer greater than or equal to x. Since ℓ_0 belongs to L_n for n large enough, Theorem 3 implies that we only have to bound the two following quantities: max
Defining h_0 = n^{−1/(2α+d)}, we have: This implies: Using (29) and (30), we obtain a bound of order n^{−2α/(2α+d)}, where C depends on A, B and α. It remains to bound the bias term. Set ℓ ≥ ℓ_0. Remark that: We consider two cases.
Case 1 Assume that m_n(ℓ) ≥ ⌊α⌋. Using Proposition 1, we obtain that where C depends on L, α, R, τ, κ, A and B.
Case 2 Assume now that m_n(ℓ) < ⌊α⌋. Define α′ = m_n(ℓ) + 1 ≤ α and note that there exists L′, depending on α, L and D, such that f ∈ H_D(α′, L′). Using Proposition 1, we have: where C depends on L, α, R, τ, κ, A and B. The theorem follows.

Proof of Lemma 5
We have The variance term can easily be bounded since Concerning the covariance terms, on the one hand we bound, using Assumption 3, the term c(k) by On the other hand, using Lemma 9 in Appendix C, we get: Combining (33) with (34), we obtain that, for any a ∈ (0, 1), Thus, using (31), (32) and (35) with a = 1/2, we get which concludes the proof of (13) and (14). Finally, (15) follows from the previous inequality and from the fact that h ≤ h̄_n.
A Regular domains satisfy Assumption 5

Using these notations and the triangle inequality, we remark that, for any 0 < r ≤ r_0/2 and any x ∈ C(r), we have:

Step 2 Define A_r = (0, r√2/2)² and consider the vector ν = (−1, −1)/√2 ∈ S¹. Note that, for any v ∈ S¹, there exists a unique rotation ρ_v ∈ GL_2(R) such that ρ_v(ν) = v. In the following figure, the sets A_r and a + ρ_v(A_r) are represented for some point a in R² and some vector v in S¹.
Now, assume that for any a ∈ ∂D_r = {(x, y) ∈ R² : x² + y² = r²} the outward-pointing unit vector is denoted by v(a). Then a + ρ_{v(a)}(A_r) ⊆ D_r and, moreover, a + ρ_v(A_r) ⊆ D_r as soon as v is a unit vector such that the angle θ between v and v(a) is less than or equal to θ_0 = arccos(√2/4) − π/4. This is the case if |v(a) − v| ≤ 2 sin(θ_0/2) = δ_0. This result generalizes to any ball centered at c ∈ R² in the following way:

Step 4 Set 0 < r < r_0 δ_0 < r_0/2 and define, for any x ∈ C(r/6), the set V(x), which is an open neighborhood of x. Note also that, using the property proved at the previous step, for any y ∈ V(x) we have:

B Simple polygons satisfy Assumption 5

Chazelle (1991) proved that any simple polygon can be triangulated (in linear time with respect to the number of vertices). In view of this result, our problem boils down to the case of a triangle. Let ABC be a triangle and let P_ε and Q_ε be such that AP_εQ_ε and ABC are similar with homothetic ratio 1 − 2ε > 0. Then, for any point x in AP_εQ_ε, the parallelogram generated by the vectors u = ε·AB and v = ε·AC with its origin located at x is included in the triangle ABC. This property is illustrated in the following figure.

For p ≥ 1 and p′ such that 1/p + 1/p′ = 1, assume Σ_{k≥0} (k + 1)^{p−1} β(k) < ∞. Let φ be a measurable function such that E φ^{2p}(χ_0) < ∞. We have

Simulation scheme

We consider a family of densities {f_{a,b,c}}_{(a,b,c)∈(0,+∞)³} such that f_{a,b,c} : D_1 → [0, +∞) is defined, for any x and y, by

f_{a,b,c}(x, y) = g_{a,b}(x² + y²) · g_{c,c}(atan2(y, x)/(2π) mod 1),

with g_{a,b} the usual density of the Beta distribution with parameters a and b. Equivalently, (X, Y) is distributed as (√R cos Θ, √R sin Θ), with R and Θ/(2π) independent and with respective distributions Beta(a, b) and Beta(c, c). Four densities, plotted in Figure 1, are studied:

Case 1 a = 1, b = 1, c = 1. The density f_1 := f_{1,1,1} is in fact the uniform density on the disk.
Case 2 a = 1.5, b = 1, c = 1. The density f_2 := f_{1.5,1,1} takes small values in the centre of the disk and its values increase slowly as one gets close to the boundary.
Case 3 a = 2, b = 1, c = 1. The density f_3 := f_{2,1,1} takes very small values in most of the disk and takes larger values only on a small strip near the boundary.
Case 4 a = 1.5, b = 1, c = 3. The density f_4 := f_{1.5,1,3}, contrary to the others, is not invariant by rotations. The mass is more important near the point (−1, 0).
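The polar representation of f_{a,b,c} given in the simulation scheme translates directly into a sampler: draw R from Beta(a, b), the normalized angle from Beta(c, c), and map to Cartesian coordinates. This is a sketch following the text (the function name and seeding are ours).

```python
import numpy as np

def sample_fabc(n, a, b, c, seed=0):
    """Draw n points from f_{a,b,c} on the unit disk.

    (X, Y) = (sqrt(R) cos Theta, sqrt(R) sin Theta) with R ~ Beta(a, b)
    and Theta = 2*pi*U, U ~ Beta(c, c), R and U independent, as in the
    simulation scheme of the text.
    """
    rng = np.random.default_rng(seed)
    r = rng.beta(a, b, size=n)
    theta = 2.0 * np.pi * rng.beta(c, c, size=n)
    return np.sqrt(r) * np.cos(theta), np.sqrt(r) * np.sin(theta)
```

For Case 1 (a = b = c = 1), R and the angle are both uniform, so the squared radius X² + Y² is uniform on (0, 1) and the points are uniform on the disk.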

Figure 1 :
Figure 1: Representation of the four densities we compare.

Figure 2 :
Figure 2: Boxplots of the integrated squared error (ISE) on the disk for the models described by Cases 1 to 4, and sample sizes equal to 500 and 1000 for the four estimators boundary, boundary_GL, usual and usual_GL.

Figure 3 :
Figure 3: Boxplots of the integrated squared error (ISE) on the disk for the models described by Cases 1 to 4, and sample sizes equal to 2000 and 5000 for the four estimators boundary, boundary_GL, usual and usual_GL.

Figure 4 :
Figure 4: Boxplots of the integrated squared error (ISE) of the GL procedure based on the boundary estimator for β = 2 and different sample sizes, n = 500, 1000, 2000 and 5000.

Figure 5, obtained from © OpenStreetMap contributors, represents the boundary of HNP as well as the n = 17501 GPS positions of the elephants.

Figure 5 :
Figure 5: Map of HNP with the n = 17501 GPS positions of the elephants.

Figure 6 :
Figure 6: Real and approximating boundary of HNP.

Figure 9 :
Figure 9: The estimated repartition of the elephants within HNP.

y + ρ_{n(a(x))}(A_r) ⊆ D.

Finally, {V(x) : x ∈ C(r/6)} is a covering of the compact set C(r/6) by open sets. Thus, there exists a finite number of points x_1, ..., x_N such that {V(x_n) : n = 1, ..., N} is also a covering of C(r/6). The result follows easily.

Table 1 :
Mean and standard deviation of the ratio between the ISE of the GL procedure and that of the oracle, both for the boundary estimator, computed from M = 500 replications. The results are provided for different sample sizes: n = 500, 1000, 2000 and 5000.