A Weighted k-Nearest Neighbor Density Estimate for Geometric Inference

Motivated by a broad range of potential applications in topological and geometric inference, we introduce a weighted version of the k-nearest neighbor density estimate. Various pointwise consistency results of this estimate are established. We present a general central limit theorem under the lightest possible conditions. In addition, a strong approximation result is obtained and the choice of the optimal set of weights is discussed. In particular, the classical k-nearest neighbor estimate is not optimal in a sense described in the manuscript. The proposed method has been implemented to recover level sets in both simulated and real-life data.


Introduction and motivations
The problem of recovering topological and geometric information from multivariate data has attracted increasing interest in recent years.
Taking a statistical point of view, data points are usually considered as independent observations drawn according to a common distribution µ on the space R^d. In this stochastic framework, the problem of estimating the support of µ and its geometric properties (e.g., dimension, number of connected components, volume) has been widely studied during the last two decades (for a review of the literature, see for instance Cuevas and Rodríguez-Casal [15], and Biau, Cadre, and Pelletier [5]). There are set-ups in which sets or boundaries are to be estimated from samples drawn from within and outside the set itself. Various models exist in this respect; it is the point of view taken, e.g., by Cuevas, Fraiman, and Rodríguez-Casal [14]. Korostelev and Tsybakov [30] provide a detailed analysis of the rate of convergence of various set or boundary estimation errors under several scenarios. Many approaches are rooted in kernel methods, placing a small weight, often in a carefully selected ball of small radius, around data points inside the support set (Devroye and Wise [19]). Object estimation can also be attacked by methods that are based on level sets of densities. Cuevas, Fraiman, and Rodríguez-Casal [14] provide a consistent estimate of the Minkowski content that turns out to also provide an estimate of the boundary of the studied object. However, this boundary estimate does not come with topological guarantees. Approaches like principal curves and surfaces (Hastie and Stuetzle [28]), multiscale geometric analysis (Arias-Castro, Donoho, and Huo [1]) and density-based methods (Genovese, Perone-Pacifico, Verdinelli, and Wasserman [26]) have been successfully used to detect "simple" geometric structures such as one-dimensional curves or filaments in data corrupted by noise.
On the other hand, taking a slightly different and nonstochastic point of view, purely geometric methods have also been developed to infer the geometry of general compact subsets of R^d from point cloud data. In this context, Chazal, Cohen-Steiner, and Lieutier [9, 10] and Chazal, Cohen-Steiner, and Mérigot [11, 13] argue that the study of distance functions to the data provides precise and robust information about the geometry of the sampled objects.
While statistical methods provide efficient tools to deal with noisy data, they either do not come with strong guarantees on the inferred geometric properties or are restricted to the inference of geometrically simple objects such as pieces of smooth curves or topologically trivial manifolds. On the other hand, purely geometric methods offer strong guarantees but, since they do not integrate any statistical model, they usually rely on sampling assumptions that cannot be met by data corrupted by noise.
In the so-called distance function approach, the unknown object is estimated by the union of balls centered on the data points or, equivalently, by an appropriate sublevel set of the distance function to the data. Thanks to classical properties of distance functions, this procedure has proved fruitful from both the statistical (Devroye and Wise [19], Biau, Cadre, Mason, and Pelletier [4]) and geometric (Chazal, Cohen-Steiner, and Lieutier [10]) points of view. Unfortunately, the distance function approach obviously fails when the observations are corrupted by "background noise" (as shown for example in Figure 1 and Figure 2), or when the observed data are not exactly drawn from a unique distribution µ but from the convolution of µ with a noncompactly supported noise measure. Different solutions have been proposed to get rid of this problem. These solutions generally rely on statistical models assuming strong knowledge of the nature of the noise. For example, Niyogi, Smale, and Weinberger [39] show that it is possible to infer the homology of a low-dimensional submanifold M ⊂ R^d from data uniformly sampled on M and corrupted by a Gaussian noise in the direction normal to M. In lower dimensions, motivated by applications ranging from the inference of networks of blood vessels to the characterization of filaments in distributions of galaxies, the detection of filamentary structures has been carefully considered. For example, Genovese, Perone-Pacifico, Verdinelli, and Wasserman [26] address this problem using the gradient of a density estimate to exhibit filamentary structures in data; lately, these authors also proposed in [27] an asymptotically consistent geometric approach for the same problem, but in dimension 2. Unfortunately, when the data are corrupted by outliers, the latter method requires a (usually tricky) preprocessing step consisting in identifying and eliminating these outliers.
Recently, Chazal, Cohen-Steiner, and Mérigot [12] proposed a framework to bridge the gap between the statistical and geometric points of view. The approach of the authors avoids the cleaning step by replacing the usual distance function by another distance-like function which is robust to the addition of a certain amount of outliers. This function extends the notion of distance function from compact sets to probability distributions, allowing one to robustly infer geometric properties of the distribution µ using independent observations drawn according to a distribution µ′ "close" to µ. In this framework, the closeness between probability distributions is assessed by the Wasserstein distance W_p, defined by W_p(µ, µ′) = (inf_{π ∈ Π(µ,µ′)} ∫ ‖x − y‖^p π(dx, dy))^{1/p}, where Π(µ, µ′) is the set of probability measures on R^d × R^d that have marginals µ and µ′, ‖.‖ is a norm, and p ≥ 1 is a real number (see Villani [50]).
In the approach of Chazal, Cohen-Steiner, and Mérigot [12], given the probability distribution µ in R^d and a parameter 0 ≤ m ≤ 1, the notion of distance to the support of µ is generalized by the function δ_{µ,m} : x ∈ R^d ↦ inf{r > 0 : µ(B(x, r)) > m}, where B(x, r) denotes the closed ball of center x and radius r. To avoid trouble due to discontinuities of the map µ ↦ δ_{µ,m}, the distance function to µ with parameter m_0 is defined by d_{µ,m_0}²(x) = (1/m_0) ∫_0^{m_0} δ_{µ,m}²(x) dm, where 0 < m_0 ≤ 1 is a real number. The function d_{µ,m_0} shares many properties with classical distance functions that make it well-suited for geometric inference purposes. In particular, if the space P(R^d) of probability measures on R^d is equipped with the Wasserstein distance W_2 and the space of real-valued functions is equipped with the supremum norm, then the map µ ↦ d_{µ,m_0} is Lipschitz with constant 1/√m_0. This property ensures that W_2-close measures have close sublevel sets in R^d. The function d_{µ,m_0}² is also seen to be semiconcave (that is, x ↦ ‖x‖² − d_{µ,m_0}²(x) is convex; see Petrunin [40] for more information on geometric properties of semiconcave functions). This regularity property implies that d_{µ,m_0}² is of class C² almost everywhere, thus ensuring strong regularity properties on the geometry of the level sets of d_{µ,m_0}. Using these properties, Chazal, Cohen-Steiner, and Mérigot prove, under some general assumptions, that if µ′ is a probability distribution approximating µ, then the sublevel sets of d_{µ′,m_0} provide a topologically correct approximation of the support of µ (see [12, Corollary 4.11]). Figure 1 and Figure 2 below show some examples of level sets of distance functions to a measure illustrating this result.
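The empirical version of the distance to a measure is straightforward to compute from a sample: since δ_{µ_n,m}(x) is the distance from x to its ⌈mn⌉-th nearest sample point, the integral defining d_{µ_n,m_0}² reduces to an average of squared nearest neighbor distances. The sketch below is written under the simplifying assumption that m_0 = k/n for an integer k; function and variable names are ours, not the paper's.

```python
import numpy as np

def dist_to_measure(x, points, m0):
    """Distance to the empirical measure of `points` with parameter m0,
    assuming m0 = k/n for an integer k: d^2 is then the average of the
    squared distances from x to its k nearest sample points."""
    n = len(points)
    k = max(1, int(round(m0 * n)))
    d2 = np.sort(np.sum((points - x) ** 2, axis=1))[:k]
    return np.sqrt(d2.mean())

# Example: a cloud concentrated near the origin gives a small value there
# and a large value far away.
rng = np.random.default_rng(0)
pts = rng.normal(scale=0.1, size=(500, 2))
print(dist_to_measure(np.zeros(2), pts, m0=0.02))
print(dist_to_measure(np.array([5.0, 5.0]), pts, m0=0.02))
```

Unlike the distance to the nearest data point, this quantity is stable under the addition of a small fraction of outliers, which is the robustness property exploited in [12].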

Connection with density estimation
Let X_1, ..., X_n be independent identically distributed observations with common distribution µ on R^d, equipped with the standard Euclidean norm ‖.‖. The empirical measure µ_n based on X_1, ..., X_n is defined by µ_n = (1/n) Σ_{i=1}^n δ_{X_i}, where δ_x denotes the Dirac measure at x. This empirical distribution is known to provide a suitable approximation of µ with respect to the Wasserstein distance (Bolley, Guillin, and Villani [7]). Moreover, given a sequence of positive integers {k_n} such that 1 ≤ k_n ≤ n, for m_n = k_n/n the function d_{µ_n,m_n} takes the simple form d_{µ_n,m_n}²(x) = (1/k_n) Σ_{j=1}^{k_n} ‖X_{(j)}(x) − x‖², where X_{(j)}(x) is the j-th nearest neighbor of x among X_1, ..., X_n and ties are broken arbitrarily. In other words, the value of d_{µ_n,m_n}² at x is just a weighted sum of the squares of the distances from x to its first k_n nearest neighbors. Assume now that the common probability measure µ of the sequence is absolutely continuous with respect to the Lebesgue measure on R^d, with a probability density f. In this context, it turns out that the function d_{µ_n,m_n} is intimately connected to both the geometric properties of µ and the density f. To see this, observe that in the regions where f is high, the function d_{µ_n,m_n} takes small values, while in the regions where f is low, d_{µ_n,m_n} takes larger values. Observe also that the function δ_{µ_n,m_n} is just the distance function to the k_n-th nearest neighbor, i.e., δ_{µ_n,m_n}(x) = ‖X_{(k_n)}(x) − x‖. These remarks motivate the introduction of a density estimate f_n of f based on d_{µ_n,m_n}.
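For intuition, here is a minimal sketch of the classical k-nearest neighbor estimate that f_n generalizes; the implementation and names are ours. It simply inverts the empirical ball-volume relation k/n ≈ f(x) V_d r_k^d.

```python
import numpy as np
from math import gamma, pi

def knn_density(x, points, k):
    """Classical k-nearest neighbor density estimate of Loftsgaarden and
    Quesenberry: f_n(x) = (k/n) / (V_d * r_k^d), where r_k is the distance
    from x to its k-th nearest sample point and V_d the unit-ball volume."""
    n, d = points.shape
    r_k = np.sort(np.linalg.norm(points - x, axis=1))[k - 1]
    V_d = pi ** (d / 2) / gamma(d / 2 + 1)
    return (k / n) / (V_d * r_k ** d)

rng = np.random.default_rng(1)
sample = rng.standard_normal((5000, 2))
# True bivariate standard normal density at the origin is 1/(2*pi) ~ 0.159.
print(knn_density(np.zeros(2), sample, k=200))
```
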
From the geometric inference point of view, the estimate f_n allows one to infer both the geometric properties of the support of µ and the geometry of the upper level sets of f, i.e., sets of the form {x ∈ R^d : f(x) ≥ t}. A more general density estimate is obtained, for p > 0, by replacing the squared distances by p-th powers of distances, averaged with respect to a probability measure ν on [0, 1] with no atom at 0. To avoid trivial complications in the proofs, we assume throughout the document that p = d, leaving the reader the opportunity to adapt the results to the case p ≠ d. Therefore, we will consider a generalized version of the k-nearest neighbor density estimate of Fix and Hodges [23] and Loftsgaarden and Quesenberry [31], in which the j-th nearest neighbor distance receives the weight v_{n,j} = ν(((j − 1)/k_n, j/k_n]), j = 1, ..., k_n. This estimate is but a special case of a larger class of estimates proposed by Rodríguez [43] and Rodríguez and Van Ryzin [41, 42] that combine kernel smoothing with nearest neighbor smoothing.
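The following sketch gives one natural reading of the generalized estimate for p = d: the d-th power of the k_n-th nearest neighbor distance of the classical estimate is replaced by a ν-weighted average of the d-th powers of the first k_n such distances. The normalization by an approximation of ∫ t ν(dt) is our own guess at the constant that makes the estimate asymptotically unbiased; the paper's exact normalization may differ.

```python
import numpy as np
from math import gamma, pi

def weighted_knn_density(x, points, k, weights):
    """Sketch of a weighted k-NN density estimate (p = d).  `weights` are
    the masses v_j = nu(((j-1)/k, j/k]), j = 1..k, of a probability
    measure nu on [0, 1].  The factor sum_j v_j * j/k approximates the
    moment int t nu(dt); for nu = Dirac at 1 the classical estimate is
    recovered exactly."""
    n, d = points.shape
    r = np.sort(np.linalg.norm(points - x, axis=1))[:k]
    V_d = pi ** (d / 2) / gamma(d / 2 + 1)
    j = np.arange(1, k + 1)
    moment = np.sum(weights * j / k)      # ~ int_0^1 t nu(dt)
    return (k / n) * moment / (V_d * np.sum(weights * r ** d))

rng = np.random.default_rng(2)
sample = rng.standard_normal((5000, 2))
k = 200
uniform = np.full(k, 1.0 / k)             # nu = uniform measure on [0, 1]
dirac = np.zeros(k); dirac[-1] = 1.0      # nu = Dirac at 1 (classical case)
print(weighted_knn_density(np.zeros(2), sample, k, uniform))
print(weighted_knn_density(np.zeros(2), sample, k, dirac))
```

Both calls should return values near the true density 1/(2π) ≈ 0.159 at the origin of a bivariate standard normal sample.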
In the present paper, we study the asymptotic properties of this generalized estimate. In particular, we look at pointwise consistency and derive a general central limit theorem under the lightest possible conditions. In addition, a strong approximation result is obtained as well. The asymptotic mean square error, when optimized with respect to k_n, reduces to a product of three factors: n^{−4/(d+4)} (the rate of convergence), a factor depending upon the local shape of f (which involves the trace of the Hessian), and a factor depending upon ν only. The third factor is the same for all x, and should thus be optimized once and for all, at least if performance is measured by local mean square error. Attempts at such an optimization are rare; we optimize ν within a large parametric class of weight functions that also play a role in the optimal shapes of kernels in kernel density estimates, as established in the classical papers of Bartlett [2] and Epanechnikov [22]. Using simulations, we finally show in Section 4 the suitability of the class of estimates in a number of important applications.
For the sake of clarity, proofs are postponed to Section 5.
Our approach is close in spirit to the one of Samworth [46], who derived asymptotic expansions for the excess risk of a weighted nearest neighbor classifier and found the asymptotically optimal vector of weights.In contrast, we are considering density estimation and our optimization is quite different.

Some asymptotic results
Our goal in this section is to establish some pointwise asymptotic properties of the estimate f_n. To this aim, we note once and for all that, for any ρ > 0, all quantities of the form are finite and positive. Moreover, for ρ ≥ 1, as k_n → ∞, The symbol λ stands for the Lebesgue measure on R^d. We start by establishing the weak pointwise consistency of f_n.
Our next result states the mean square consistency of the generalized k-nearest neighbor estimate.
The asymptotic normality of the original Loftsgaarden and Quesenberry k-nearest neighbor estimate was established by Moore and Yackel in [37]. These authors proved that, for f sufficiently smooth in a neighborhood of x, where N is a standard normal random variable. This result was obtained for the generalized k-nearest neighbor estimate by Rodríguez [43, 45]. The novelty in Theorem 3.3 is that it is a strong approximation result, which is interesting in itself and implies the classical central limit theorem. We let Γ(.) be the gamma function and denote by [∂²f(x)/∂x²] the Hessian matrix of f at x. The notation tr(A) stands for the trace of the square matrix A.
Then, if N denotes a standard normal random variable, and if k_n → ∞ and k_n/n → 0. Theorem 3.3 can be used when k_n is at its optimal value (about n^{4/(d+4)}). It can also be used when k_n is below this optimal value, as well as above it. The usual k-nearest neighbor estimate has v² = 1. Consequently, for this estimate, provided k_n → ∞ and k_n/n^{4/(d+4)} → 0 as n → ∞. This is precisely the asymptotic normality result of Moore and Yackel [37]. Note however that our condition k_n/n^{4/(d+4)} → 0 is less severe than the condition k_n/n^{2/(d+2)} → 0, which is imposed by these authors, at the price, however, of a less stringent smoothness condition on f. In any case, consistency (3.1) deals with the uninteresting case of a suboptimal k_n (that is, the bias in f_n(x) is negligible with respect to the variance term). Note, in addition, that analogues of Theorem 3.1 (yet with different rates) may be obtained in the somewhat degenerate situations where f(x) = 0 and/or c(x) = 0 by pushing the asymptotic expansions further.
Theorem 3.3 also has interesting consequences for the analysis of the mean square error of the estimate f_n. Let ⌈·⌉ denote the nearest larger integer (or ceiling) function.
Theorem 3.4 With the notation and conditions of Theorem 3.3, if k_n → ∞ and k_n/n → 0, then whenever f(x) > 0. Thus, for such x, assuming that c(x) ≠ 0 and for the optimal choice of k_n.
For the standard k-nearest neighbor estimate, and thus, as n → ∞, for the optimal choice .
For the original Loftsgaarden and Quesenberry density estimate, the optimization of k_n with respect to the mean square error criterion is thoroughly discussed in Fukunaga and Hostetler [25]. The best possible asymptotic quadratic error of the generalized k-nearest neighbor estimate, as given in Theorem 3.4, is a product of three factors: the first factor depends upon n and d only, and gives the general rate of convergence. The second factor depends upon f(x) and c(x), and we have no control over it. The third factor depends directly on our measure ν, and it is clear that we would like to minimize it. It is more convenient to work with a power of it, which we denote by A. For the Dirac measure at 1 (the classical k-nearest neighbor estimate), we note that v = b = 1, so A = 1.
The first important consequence of the factorization is that the optimal ν is the same at all points x with f(x) > 0 and c(x) ≠ 0. A similar property was noted long ago for the form of the best positive kernel in the Parzen-Rosenblatt density estimate (see Bartlett [2] and Epanechnikov [22] for d = 1, and Deheuvels [16] for d > 1).
The functional optimization of A seems daunting, but one can make a good guess in the following manner. Assume that we let ν be the law of U^α, where U is uniform on [0, 1] and α ≥ 0 is a parameter. The case α = 0 again yields the atomic measure at 1. Repeatedly using the fact that E[U^s] = 1/(s + 1), simple calculations yield a closed-form expression for A as a function of α.
The behavior of A as a function of α (see Figure 3 below) is best captured by studying log A and taking derivatives. This reveals that A(0) = 1, that A decreases initially to reach a minimum at α = d/2, and that A then increases again to a limiting value as α → ∞. The latter limit is ≤ 1 for d ≥ 2. The value of A at the minimum is a strictly increasing function of d with limit tending to one (see Figure 4).
In other words, except for d = 1, any value of α > 0 is better than α = 0: the classical k-nearest neighbor estimate is actually the worst possible in this entire class of natural weights! Furthermore, for any d ≥ 1, by taking α = d/2, we obtain an improvement over the classical k-nearest neighbor estimate that is most pronounced for d = 1. It is interesting that for d = 1, ν is the law of √U, which has a triangular (increasing) density on [0, 1]. Rodríguez [43] obtained a similar result for the best weights in a weighted k-nearest neighbor rule for density estimation. For d = 2, ν is the uniform law on [0, 1]: it is best to weight all of the k nearest neighbors equally. We do not know whether the law of U^{d/2} is in fact optimal among all measures ν. Note also that in this paper we fix the distance metric which determines the ranking among neighbors. There is ample evidence, especially from practicing nonparametric statisticians, that in moderate and high dimensions a lot can be gained by considering variable metrics, such as Euclidean metrics applied after a locally affine (or matrix multiplication) transformation, letting the data select the metric to some extent. This strategy was already present in the work of Short and Fukunaga [47] and Fukunaga and Flick [24]. Kernel estimates are better adapted to take advantage of local second-order or Hessian structure. Combinations of nearest neighbor and kernel estimates that incorporate these ideas are being considered by a subset of the authors in [45].
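The weights induced by the family ν = law of U^α are easy to tabulate. The sketch below (our own helper, not from the paper) recovers the two special cases mentioned above: increasing, triangular-like weights for d = 1 (α = 1/2) and uniform weights for d = 2 (α = 1).

```python
import numpy as np

def power_weights(k, alpha):
    """Weights v_j = nu(((j-1)/k, j/k]) when nu is the law of U**alpha,
    with U uniform on [0, 1].  Since P(U**alpha <= v) = v**(1/alpha), the
    mass of ((j-1)/k, j/k] is (j/k)**(1/alpha) - ((j-1)/k)**(1/alpha)."""
    grid = np.arange(k + 1) / k
    cdf = grid ** (1.0 / alpha)
    return np.diff(cdf)

# alpha = d/2 is the minimizer within this family:
w1 = power_weights(10, alpha=0.5)   # d = 1: linearly increasing weights
w2 = power_weights(10, alpha=1.0)   # d = 2: uniform weights
print(w1)
print(w2)
```

For d = 1 the weight of the j-th neighbor is (2j − 1)/k², the discrete counterpart of the increasing triangular density of √U.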

Numerical illustrations
A series of experiments was conducted in order to compare the performance of our weighted estimate with that of the standard k-nearest neighbor estimate of Fix and Hodges [23] and Loftsgaarden and Quesenberry [31]. We provide numerical illustrations regarding both the geometric and convergence properties of the estimates. On the geometric side, particular attention was paid to the comparison of the geometry of the level sets of the various estimates. To this aim, we investigated three synthetic data sets, sampled according to known probability density models, and one real-life data set. These four data sets are denoted D1, D2, D3 and D4 hereafter and are described below.
D1: A two-dimensional data set of 5,000 points, randomly sampled according to a bivariate standard normal distribution (see Figure 5, left).
D4: A three-dimensional data set of 50,000 points, randomly sampled according to a standard normal distribution in R^3.
The programs were implemented in C++ using the Approximate Nearest Neighbor library developed by Mount and Arya [38]. Thanks to the efficiency of this library, all computations took a few seconds to a few minutes on a standard laptop. The programs are available upon request from the authors.
For the two-dimensional data sets D1, D2 and D3, the density estimates were first evaluated on the vertices of a regular 2000 × 2000 grid, and the level sets were extracted using the contour function in Matlab. Figures 5, 6 and 7 depict, for each of these data sets, some level sets of the standard and weighted k-nearest neighbor estimates. Regarding the three-dimensional data set D4, we also used the uniform weights for the generalized estimate and meshed some level sets of the estimates using an implicit surface mesher from the C++ CGAL library [8] (Figure 11).
An important issue regarding k-nearest neighbor-based density estimates is how to select the number k_n of neighbors. In our experiments, this parameter was selected using a standard leave-one-out cross-validation method performed on a (global) least-squares criterion. As this procedure does not come with any theoretical guarantee (as far as we know), we also evaluated the errors between the cross-validated estimates and the true density when the latter was known (i.e., for data sets D1, D2 and D4). In all cases, the selected value of k_n appears to be very close to the optimal oracle k_n, which minimizes the L2 norm between the targeted density and the estimate. The selected values of k_n are shown in Table 1, together with the L2 norms between the estimates (respectively, the oracles) and the true density (models D1, D2 and D4 only). A general observation is that, in all cases, the classical k-nearest neighbor estimate provides a rather poor geometric approximation of the level sets of the true density. In the 2D case, these sets are very jagged and contain spurious small connected components (Figures 7, 8, 9 and 10), thereby preventing any direct inference on the geometry of the level sets of the true density (such as, for instance, their connectedness). On the other hand, the level sets of the generalized estimate are much smoother and, for values that are not too close to the critical values of the true density, they appear to be homeomorphic to those of the target.
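As an illustration of the selection of k_n, the following sketch scores candidate values by a leave-one-out criterion. We use likelihood cross-validation here because it avoids numerical integration; this is only one standard choice, not necessarily the criterion used in our experiments, and all names and settings below are ours.

```python
import numpy as np
from math import gamma, pi

def loo_knn_loglik(points, k):
    """Leave-one-out log-likelihood of the classical k-NN estimate: each
    sample point is scored by the estimate built from the other n - 1
    points."""
    n, d = points.shape
    V_d = pi ** (d / 2) / gamma(d / 2 + 1)
    # Full pairwise distance matrix; after sorting each row, column 0 is
    # the zero self-distance, so column k is the k-th nearest *other* point.
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    D.sort(axis=1)
    r_k = D[:, k]
    f_loo = (k / (n - 1)) / (V_d * r_k ** d)
    return np.sum(np.log(f_loo))

rng = np.random.default_rng(3)
data = rng.standard_normal((400, 2))
scores = {k: loo_knn_loglik(data, k) for k in (5, 20, 60, 150)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

In practice the quadratic pairwise-distance matrix would be replaced by a spatial index (as done with the ANN library in our experiments).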

Data
In the 3D situation, it is noteworthy that the level sets of the weighted estimate are smoother than those of the standard k-nearest neighbor estimate (Figure 11). For technical reasons, the surface mesher was only able to mesh the component of the level sets containing the origin of R^3. As a consequence, the spurious small components of the standard k-nearest neighbor estimate (similar to the ones depicted in the 2D figures) are not represented in Figure 11, left.
Finally, in order to illustrate the convergence properties of the generalized k-nearest neighbor estimate, we generated, for each n ∈ {1·10^4, 2·10^4, ..., 15·10^4}, 100 data sets of n points randomly sampled according to a standard normal distribution in R². These observations were used to estimate E[f_n(x) − f(x)]² at 900 points x distributed on a 30 × 30 regular grid G on [−3, 3] × [−3, 3], where f_n was either the standard k-nearest neighbor density estimate or the generalized estimate with uniform weights. For each x and each n, we first computed the average value of [f_n(x) − f(x)]² over the 100 data sets of size n and then averaged the outcomes over the 900 points of the grid G. Figure 12 shows the results as a function of n: the red curve corresponds to the weighted estimate while the blue one refers to the unweighted one. In both cases, the estimates converge to the true density, with a smaller error for the generalized k-nearest neighbor estimate.
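A scaled-down version of this experiment (much smaller n, fewer repetitions, and a coarse 3 × 3 grid; all settings here are ours, chosen only so that the sketch runs quickly) can be written as follows.

```python
import numpy as np
from math import pi

def knn_est(x, pts, k, uniform_weights=False):
    """Classical or uniform-weight k-NN estimate in R^2 (V_2 = pi)."""
    n = len(pts)
    r2 = np.sort(np.sum((pts - x) ** 2, axis=1))[:k]  # squared distances
    if uniform_weights:
        # uniform nu: average of the r_j^2, with moment int t dt = 1/2
        return (k / n) * 0.5 / (pi * r2.mean())
    return (k / n) / (pi * r2[-1])

rng = np.random.default_rng(4)
grid = [np.array([a, b]) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
true = {tuple(x): np.exp(-x @ x / 2) / (2 * pi) for x in grid}
n, k, reps = 2000, int(2000 ** (2 / 3)), 20
err_std = err_uni = 0.0
for _ in range(reps):
    pts = rng.standard_normal((n, 2))
    for x in grid:
        err_std += (knn_est(x, pts, k) - true[tuple(x)]) ** 2
        err_uni += (knn_est(x, pts, k, True) - true[tuple(x)]) ** 2
print(err_std / (reps * len(grid)), err_uni / (reps * len(grid)))
```

With so few repetitions the comparison between the two curves is noisy; the full experiment of Figure 12 uses 100 repetitions and much larger n.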

Proofs
Throughout this section, we let B(x, r) be the closed ball in R^d of radius r centered at x, and denote by µ the probability measure associated with the density f. The collection of all x with µ(B(x, ε)) > 0 for all ε > 0 is called the support of µ. We denote it by supp µ and note that it may alternatively be defined as the smallest closed subset of R^d of µ-measure 1.

Two basic lemmas
We will make repeated use of the following two lemmas. where Furthermore, for all positive integers r, sup. Let E_1, E_2, ... be i.i.d. standard exponential random variables (see, e.g., Devroye [17, Chapter 5]), and let G_{n+1} be the gamma(n + 1) random variable G_{n+1} = E_1 + ... + E_{n+1}. Then, by the central limit theorem, where N is a standard normal random variable. Thus, by an application of the delta method, we obtain, and the first part of the lemma follows. To prove the second statement, observe that, by the Cauchy-Schwarz inequality, the first term in the above product is O(n^{r/2}) (see, e.g., Willink [52]), whereas the second one is infinite for n + 1 ≤ 2r and O(1/n^r) otherwise. It follows that sup. Let E_1, E_2, ... be i.i.d. standard exponential random variables and let {k_n} be a sequence of positive integers.
where ν is a given probability measure on [0, 1] with no atom at 0. Fix ρ ≥ 1. Then, for all positive integers r, In addition, on an appropriate probability space, there exists a standard normal random variable N such that, for all positive integers r, Proof of Lemma 5.2 Denote by ⌈·⌉ the ceiling function and observe that, since ν has no atom at 0, where we set Note that S_{⌈tk_n⌉} is a sum of i.i.d. zero-mean random variables. Therefore, by an application of Donsker's theorem and the continuous mapping theorem (see, e.g., van der Vaart and Wellner [49]), as k_n → ∞, and, for all positive integers r, Similarly, and, for all positive integers r, Consequently, for all positive integers r, The conclusion of the first assertion follows by observing that, for ρ ≥ 1, The proof of the second assertion requires a bit more care. We already know that With respect to the second term on the right-hand side of (5.1), we have
Clearly, letting we may write As a consequence, setting and using the fact that 0 ≤ (1 − Φ(t))² ≤ 1 is a monotone nonincreasing function, a Riemann sum argument shows that Therefore, we obtain via the Komlós, Major, and Tusnády strong approximation result (see Komlós, Major, and Tusnády [29] and Mason [34]) that, on the same probability space, there exists a sequence E_1, E_2, ... Using (5.2), we deduce that, for positive constants λ_2, λ_3 and all n large enough, Thus, writing where for all positive integers r. Plugging this identity into (5.1) leads to the desired result.
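The distributional identity underlying Lemma 5.1, namely that the normalized partial sums G_j/G_{n+1} of i.i.d. standard exponentials have the joint law of the order statistics of n uniforms on [0, 1], can be checked numerically. The quick Monte Carlo sanity check below (purely illustrative, not part of the proof) compares the empirical means with E U_{(j)} = j/(n + 1).

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 10, 200000
E = rng.exponential(size=(reps, n + 1))          # i.i.d. standard exponentials
G = np.cumsum(E, axis=1)                          # partial sums G_1, ..., G_{n+1}
U = G[:, :n] / G[:, [n]]                          # ~ uniform order statistics
emp = U.mean(axis=0)
theo = np.arange(1, n + 1) / (n + 1)              # E U_(j) = j/(n+1)
print(np.max(np.abs(emp - theo)))
```
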

Proof of Theorem 3.1
Let x be a Lebesgue point of f, that is, an x for which lim_{r→0} (1/λ(B(x, r))) ∫_{B(x,r)} |f(y) − f(x)| dy = 0. As f is a density, we know that λ-almost all x satisfy the property given above (see for example Wheeden and Zygmund [51]).
Assume first that f(x) > 0. Fix ε ∈ (0, 1) and find δ > 0 such that Let F be the (continuous) univariate distribution function of ‖X − x‖^d. Since the U_{(j)} are uniform [0, 1] order statistics, we have in fact the representation, jointly for all j. Thus, provided U_{(j)} ≤ F(δ^d),
(5.4) Therefore, on the event [U_{(k_n)} ≤ F(δ^d)], the generalized k-nearest neighbor estimate may be written as follows: But, by Lemma 5.1, we know that where E_1, E_2, ... are i.i.d. standard exponential random variables, which goes to 1 in probability as k_n → ∞ according to the first statement of Lemma 5.2.
If f(x) = 0, two cases are possible. Suppose first that x belongs to the complement of supp µ. Then, clearly, for some positive constant C and all n ≥ 1, almost surely, But f(x) = 0 and, using the condition k_n/n → 0, we deduce that f_n(x) → f(x) in probability as n → ∞.
If x belongs to supp µ, the proof is similar to the case f (x) > 0. Just fix ε ∈ (0, 1) and find δ > 0 such that sup

Proof of Theorem 3.2
Let x be a Lebesgue point of f. Assume first that f(x) > 0 and fix ε and δ as in (5.3). Note that we have, for some positive constant C_1 and all n ≥ 1, Since the U_{(j)} are uniform [0, 1] order statistics, we may write, using inequality (5.4), for some positive constant C_2. It is known (see, e.g., Devroye [17, Chapter 1]) that U_{(⌈k_n/2⌉)} is beta distributed, with parameters ⌈k_n/2⌉ and n + 1 − ⌈k_n/2⌉. Next, if f(x) = 0, two cases are possible. If x belongs to the complement of supp µ, then, clearly, for some positive constant C_5 and all n ≥ 1, If x belongs to supp µ, the proof is similar to the case f(x) > 0. Just fix ε ∈ (0, 1) and find δ > 0 such that This shows the first part of the theorem. One proves, with similar arguments, that there exists a positive constant C_6 such that, for all n large enough, Consequently, for all n large enough, the sequence {f_n²(x)} is uniformly integrable and, since f_n(x) − f(x) → 0 in probability (by Theorem 3.1), this implies E[f_n(x) − f(x)]² → 0 as n → ∞ (see, e.g., Billingsley [6, Chapter 5]).

Proof of Theorem 3.3
Fix x ∈ R^d and assume that f has derivatives of second order at x, with f(x) > 0. Let F, F(u) = ∫_{B(x,u)} f(y) dy, be the univariate distribution function of ‖X − x‖. We may write, by a Taylor-Young expansion of f around x, where the symbol T denotes transposition and [∂f(x)/∂x] and [∂²f(x)/∂x²] are the gradient vector and the Hessian matrix of f at x. In view of the symmetry of the ball B(x, u), the first term in (5.5) is seen to be zero. Using the linearity of the trace and the relations tr(AZZ^T) = Z^T A Z and tr(AB) = tr(BA), valid for matrices A, B and a vector Z, (5.5) becomes Letting z = (y − x)/u, which maps B(x, u) to B(0, 1), and using a hyperspherical coordinate change of variables (see, e.g., Miller [35, Chapter 1]), the integral inside the trace term simplifies to ∫_{B(0,1)} z z^T dz = (λ(B(0,1))/(d + 2)) Id, where Id is the d × d identity matrix. Thus, denoting by Γ(.) the gamma function and recalling that, for the Euclidean norm, λ(B(0,1)) = π^{d/2}/Γ(d/2 + 1), we obtain Consequently, Since the U_{(j)} are uniform [0, 1] order statistics, using the representation jointly for all j, we may write Thus, putting all the pieces together, we obtain We see in particular that, for all positive integers r and all n large enough, the sequence {k_n ...}. Using the identity 1/(1 + t) = 1 − t + t²/(1 + t), valid for t ≠ −1, we finally get An immediate adaptation of the proof of Theorem 3.2 shows that, for some positive constant C_1 and all n large enough, It follows, coming back to identity (5.7), that for some positive constant C_2 and all n large enough, E ζ_n⁴ ≤ C_2.
Therefore, using identity (5.8) and the Cauchy-Schwarz inequality, for all n large enough,

Figure 1 :
Figure 1: Left: A two-dimensional data set where 50% of the points have been uniformly randomly sampled on the union K of a circle and a segment, and 50% have been uniformly randomly sampled in a square containing K. Right: Three different level sets of the distance function d_{µ,m_0}, where µ stands for the empirical measure based on the observations and m_0 = 0.02, showing that the topology of the union of the circle and the segment can be correctly inferred.

Figure 2 :
Figure 2: Left: A three-dimensional set of points uniformly sampled on the surface of a mechanical part, to which 10% of points sampled uniformly at random in a box enclosing the mechanical part have been added. Right: An isosurface of the distance function d_{µ,m_0} to the empirical measure based on the observations. This isosurface successfully recovers the topology of the mechanical part. In this example, m_0 = 0.003.

Figure 3 :
Figure 3: This figure shows A versus α for 1 ≤ d ≤ 4. Note that A exceeds 1 only for d = 1 and α large enough.

Figure 4 :
Figure 4: This figure shows the minimal value of A and the limiting value of A versus d. Note that both are nearly indistinguishable for d ≥ 10.

Figure 6 :
Figure 6: Left: Data set D2. Middle and right: Level sets of the standard (middle) and weighted (right) k-nearest neighbor estimates corresponding to level values 0.06, 0.085, 0.10, 0.14 and 0.21.

Figure 8 :
Figure 8: A zoom on the level sets of Figure 7.

Figure 9 :
Figure 9: Left: Plot of the 0.06-level sets of the true density (green), the standard (blue) and weighted (red) k-nearest neighbor estimates for the data set D1. Middle and right: A zoom on the level sets of Figure 5, showing that the unweighted estimate does not allow one to correctly infer the connectedness of the level sets of the true density.

Figure 10 :
Figure 10: Left: Plot of the 0.085-level sets of the true density (black), the standard (blue) and weighted (red) k-nearest neighbor estimates for the data set D1. Middle and right: A zoom on the level sets of Figure 6. Here again, the unweighted estimate does not allow one to correctly infer the connectedness of the level sets of the true density.

Figure 11 :
Figure 11: Level sets of the standard (left) and weighted (right) k-nearest neighbor estimates for the data set D4. As in the two-dimensional case, the weighted estimate provides much smoother level sets.

Figure 12 :
Figure 12: Estimation of E[f_n(x) − f(x)]², averaged over the 900 points of the regular grid G, as a function of n. The blue curve corresponds to the standard k-nearest neighbor estimate (k_n = n^{2/3}) and the red one to the weighted estimate (uniform weights and k_n = n^{2/3}).

We also have E ζ_n² ≤ C_2 P(V_{(k_n)} > ε_0). We know that V_{(k_n)} is beta distributed, with parameters k_n and n + 1 − k_n (see, e.g., Devroye [17, Chapter 1]). Thus, by Markov's inequality, P(V_{(k_n)} > ε_0) = o((k_n/n)^{4/d}) as k_n/n → 0. In conclusion, as k_n → ∞ and k_n/n → 0, squaring and taking the expectation on both sides of identity (5.7) leads to the desired statement. The last assertion of Theorem 3.4 is clear.

Table 1 :
Cross-validated selected k n and associated L 2 errors.