Rates of convergence for robust geometric inference

Distances to compact sets are widely used in Topological Data Analysis for inferring geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) was introduced by Chazal et al. (2011) as a robust alternative to the distance to a compact set. In practice, the DTM can be estimated by its empirical counterpart, the distance to the empirical measure (DTEM). In this paper we give a tight control of the deviations of the DTEM. Our analysis relies on a local analysis of empirical processes. In particular, we show that the rate of convergence of the DTEM depends directly on the regularity at zero of a particular quantile function, which carries local information about the geometry of the support. This quantile function is the relevant quantity for describing precisely how difficult a geometric inference problem is. Several numerical experiments illustrate the convergence of the DTEM and confirm that our bounds are tight.


Introduction and motivation
The last decades have seen an explosion in the amount of available data in almost all domains of science, industry, economy and even everyday life. These data, often coming as point clouds embedded in Euclidean spaces, usually lie close to some lower dimensional geometric structures (e.g., manifolds, stratified spaces, etc.) reflecting properties of the system from which they have been generated. Inferring the topological and geometric features of such multivariate data has recently attracted a lot of interest in both the statistical and computational topology communities.
Considering point cloud data as independent observations of some common probability distribution P in R^d, many statistical methods have been proposed to infer the geometric features of the support of P, such as principal curves and surfaces Hastie and Stuetzle (1989), multiscale geometric analysis Arias-Castro et al. (2006), density-based approaches Genovese et al. (2009) or support estimation, to name a few. Although they come with statistical guarantees, these methods usually do not provide geometric guarantees on the estimated features.
On the other hand, with the emergence of Topological Data Analysis (Carlsson, 2009), purely geometric methods have been proposed to infer the geometry of compact subsets of R^d. These methods aim at recovering precise geometric information about a given shape; see, e.g., Chazal and Lieutier (2008); Niyogi et al. (2008); Chazal et al. (2009a,b). Although these methods come with strong topological and geometric guarantees, they usually rely on sampling assumptions that do not apply in statistical settings. In particular, they can be very sensitive to outliers. Indeed, they generally rely on the study of the sublevel sets of distance functions to compact sets. In practice only a sample drawn on, or close to, a geometric shape is known, and thus only a distance to the data can be computed. Since the sup norm between the distance to the data and the distance to the underlying shape is exactly the Hausdorff distance between the data and the shape, the statistical analysis of standard TDA methods boils down to the problem of support estimation in the Hausdorff metric. This last problem has been the subject of much study in statistics (see for instance Devroye and Wise, 1980; Cuevas and Rodríguez-Casal, 2004; Singh et al., 2009). Being strongly dependent on the estimation of the support in the Hausdorff metric, it is now clear why standard TDA methods may be very sensitive to outliers.
To provide a more robust approach to TDA, a notion of distance function to a measure (DTM) in R^d has been introduced by Chazal et al. (2011b) as a robust alternative to the classical distance to compact sets. Given a probability distribution P in R^d and a real parameter 0 ≤ u ≤ 1, Chazal et al. (2011b) generalize the notion of distance to the support of P by the function

δ_{P,u}(x) := inf{t ≥ 0 : P(B̄(x, t)) ≥ u},

where B̄(x, t) is the closed Euclidean ball of center x and radius t. For u = 0, this function coincides with the usual distance function to the support of P. For higher values of u, it is larger than the usual distance function since a portion of mass u has to be included in the ball centered on x. To avoid issues due to discontinuities of the map P → δ_{P,u}, the distance to measure (DTM) function with parameter m ∈ [0, 1] and power r ≥ 1 is defined by

d_{P,m,r}(x) := ( (1/m) ∫_0^m δ_{P,u}^r(x) du )^{1/r}.

It was shown in Chazal et al. (2011b) that the DTM shares many properties with classical distance functions that make it well adapted for geometric inference purposes (see Theorem 4 in Appendix A). First, it is stable with respect to perturbations of P in the Wasserstein metric. This property implies that the DTMs associated with distributions that are close in the Wasserstein metric have close sublevel sets. Moreover, when r = 2, the function d^2_{P,m,2} is semiconcave, ensuring strong regularity properties for the geometry of its sublevel sets. Using these properties, Chazal et al. (2011b) show that, under general assumptions, if P̃ is a probability distribution approximating P, then the sublevel sets of d_{P̃,m,2} provide a topologically correct approximation of the support of P. The introduction of the DTM has motivated further works and applications in various directions such as topological data analysis Buchet et al. (2015a), GPS traces analysis Chazal et al. (2011a), density estimation Biau et al. (2011), deconvolution Caillerie et al. (2011) or clustering Chazal et al. (2013), just to name a few.
Approximations, generalizations and variants of the DTM have also been recently considered in Guibas et al. (2013); Buchet et al. (2015b); Phillips et al. (2014). However, no thorough statistical analysis of the DTM has been proposed so far.
In practice, the measure P is usually only known through a finite set of observations X n = {X 1 , . . . , X n } sampled from P , raising the question of the approximation of the DTM. A natural idea to estimate the DTM from X n is to plug the empirical measure P n instead of P in the definition of the DTM. This "plug-in strategy" corresponds to computing the distance to the empirical measure (DTEM). It can be applied with other estimators of the measure P , for instance in Caillerie et al. (2011) it was proposed to plug a deconvolved measure into the DTM.
For m = k/n, the DTEM satisfies

d^r_{P_n, k/n, r}(x) = (1/k) Σ_{j=1}^k ‖x − X_n^{(j)}‖^r,

where ‖x − X_n^{(j)}‖ denotes the distance between x and its j-th nearest neighbor in {X_1, …, X_n}. This quantity can easily be computed in practice since it only requires the distances between x and the sample points. Let us introduce

Δ̃_{n,m,r}(x) := d_{P_n,m,r}(x) − d_{P,m,r}(x).

The aim of this paper is to study the deviations and the rate of convergence of this quantity. The functional convergence of the DTEM was studied recently in Chazal et al. (2014a), where it is shown that the parametric convergence rate in 1/√n is achieved under reasonable assumptions. In this paper we address the question of the convergence in probability and the rate of convergence in expectation, from both an asymptotic and a non-asymptotic point of view.
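The k-nearest-neighbor formula above makes the DTEM straightforward to compute. The following minimal numpy sketch (the function name `dtem` is ours, not from the paper, and the paper's own experiments used R) evaluates d_{P_n, k/n, r}(x) directly from the order statistics of the distances:

```python
import numpy as np

def dtem(x, X, k, r=2):
    """Distance to the empirical measure at x with mass parameter m = k/n.

    Computes d_{P_n, k/n, r}(x)^r = (1/k) * sum_{j=1}^k ||x - X_(j)||^r,
    where X_(j) is the j-th nearest sample point to x, then takes the 1/r root.
    """
    d = np.sort(np.linalg.norm(X - x, axis=1))  # sorted distances from x to the sample
    return np.mean(d[:k] ** r) ** (1.0 / r)

# Tiny example: three sample points on the real line, observation point x = 0.
X = np.array([[0.0], [1.0], [3.0]])
x = np.array([0.0])
# distances are 0, 1, 3; with k = 2 and r = 2 this is sqrt((0^2 + 1^2)/2) = sqrt(0.5)
value = dtem(x, X, k=2, r=2)
```

Only the k smallest distances enter the average, which is why the DTEM ignores a fraction of roughly 1 − k/n of the most distant points; this is the mechanism behind its robustness to outliers.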
The stability properties of the DTM with respect to Wasserstein metrics suggest that this problem could be addressed using known results about the convergence of the empirical measure P_n to P in Wasserstein metrics. This last problem has been the subject of many works in the past (Rachev and Rüschendorf, 1998; del Barrio et al., 1999, 2005) and it is still an active field of research (Fournier and Guillin, 2013; Dereich et al., 2013). Contrary to the context of TDA with the standard distance function, where stability results provide optimal rates of convergence (see Chazal et al. (2015)), we show in this paper that Wasserstein stability does not lead to optimal results for the DTM. Moreover, such a basic approach does not provide a correct understanding of the influence of the parameter m (see Appendix A).
We adopt an alternative approach based on the observation that the DTM only depends on a push-forward measure of P on the real line. Indeed, the DTM can be rewritten as

d^r_{P,m,r}(x) = (1/m) ∫_0^m F^{-1}_{x,r}(u) du,

where F^{-1}_{x,r} is the quantile function of the push-forward probability measure of P by the function ‖x − ·‖^r (see Appendix B.1 for a rigorous proof). Then we have

Δ_{n,m,r}(x) := d^r_{P_n,m,r}(x) − d^r_{P,m,r}(x) = (1/m) ∫_0^m ( F^{-1}_{x,r,n}(u) − F^{-1}_{x,r}(u) ) du,

where F^{-1}_{x,r,n} is the empirical quantile function of the observed distances (to the power r): ‖x − X_1‖^r, …, ‖x − X_n‖^r. We study the convergence of Δ_{n,m,r}(x) to zero from both an asymptotic and a non-asymptotic point of view. In the asymptotic approach, we take k = k_n := mn for some fixed m and study the mean rate of convergence to zero of Δ_{n, k_n/n, r}(x). In the non-asymptotic approach, n is fixed and the problem is to get a tight expectation bound on Δ_{n, k/n, r}(x). We are particularly interested in the situation where k/n is chosen very close to zero. This situation is of primary interest since it corresponds to the realistic situation where we use the DTM to clean the support from a small proportion of outliers.
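The identity between the k-nearest-neighbor form of the DTEM and the average of the empirical quantile function over [0, k/n] can be checked numerically; on an n-sample the empirical quantile F^{-1}_{x,r,n} is piecewise constant on the intervals ((j−1)/n, j/n]. A small sketch, assuming nothing beyond numpy (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, r = 50, 7, 2
X = rng.random((n, 2))            # n points in the unit square
x = np.zeros(2)                   # observation point at the origin

d_r = np.sort(np.linalg.norm(X - x, axis=1) ** r)  # push-forward sample, sorted

# k-NN form of the DTEM to the power r: average of the k smallest values
knn_form = d_r[:k].mean()

# Quantile form: (n/k) * integral over [0, k/n] of the empirical quantile
# function, which equals d_r[j-1] on the interval ((j-1)/n, j/n].
quantile_form = (n / k) * sum(d_r[j] * (1.0 / n) for j in range(k))
```

The two expressions agree up to floating-point rounding, which is the elementary version of the rewriting used throughout the paper to reduce the DTEM to a one-dimensional quantile problem.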
Our results rely on a local analysis of the empirical process to compute tight deviation bounds on Δ_{n,k/n,r}(x). More precisely, we use a sharp control of a supremum defined on the uniform empirical process. Such local analyses have been successfully applied in the literature on non-asymptotic statistics; for instance, Mammen et al. (1999) obtain fast rates of convergence in classification. For a more general presentation of these ideas in model selection, see Massart (2007), in particular Section 1.2 of the Introduction of that monograph. We show that the rate of convergence of Δ_{n,k/n,r}(x) directly depends on the regularity at zero of F^{-1}_{x,r}. This quantile function appears to be the relevant quantity for describing precisely how difficult a geometric inference problem is. The second contribution of this paper is to relate the regularity of the quantile function F^{-1}_{x,r} to the geometry of the support, establishing a link between the complexity of the geometric problem and a purely probabilistic quantity. In particular, our results apply to the case of a probability measure supported on a compact manifold of dimension b, when the measure is absolutely continuous with respect to the Hausdorff measure on the manifold, with a lower bounded density. In this context our results show that

E(|Δ_{n, k/n, 2}(x)|) ≲ (1/√n) (k/n)^{1/b − 1/2}.

Our main results, the deviation bounds and the rates of convergence of Δ_{n,k/n,r}(x) derived from the local analysis, are given in Section 2. These results are stated in terms of the regularity of the quantile function F^{-1}_{x,r}. Generally speaking, it is not easy to determine the regularity of the quantile function F^{-1}_{x,r} for a given distribution P and observation point x ∈ R^d. Indeed, it depends on the shape of the support of P, on the way the measure P is distributed on its support, and on the position of x with respect to the support of P.
This is why, in the results given in Section 2, the assumptions are made directly on the quantile functions F^{-1}_x. Section 3 is then devoted to the geometric interpretation of these results and their assumptions. In Section 4, several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are sharp. Rates of convergence derived from the stability results for the DTM are presented in Appendix A. Proofs and background about empirical processes and quantiles can also be found in the appendices.
Notation. Let a ∧ b and a ∨ b denote the minimum and the maximum of two real numbers a and b. The Euclidean norm on R^d is ‖·‖. The open Euclidean ball of center x and radius t is denoted by B(x, t). For a point x and a compact set K in R^d, the distance between x and K is defined by ‖K − x‖ := inf_{y∈K} ‖y − x‖. The Hausdorff distance between two compact sets K and K′ is denoted by Haus(K, K′). A probability distribution on R defined by a distribution function F is denoted by dF. The quantile function F^{-1} of dF is defined by

F^{-1}(u) := inf{t ∈ R : F(t) ≥ u},  u ∈ (0, 1).

By monotonicity, the quantile function F^{-1} can be extended at 0 and at 1 by setting F^{-1}(0) = inf{t ∈ R : F(t) > 0} and F^{-1}(1) = sup{t ∈ R : F(t) < 1}. Finally, for two positive sequences (a_n) and (b_n), we use the standard notation a_n ≲ b_n if there exists a positive constant C such that a_n ≤ C b_n.
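For an empirical distribution on n points, the generalized inverse defined above reduces to an order statistic: F_n^{-1}(u) is the ⌈un⌉-th smallest observation. A minimal sketch (the helper name `empirical_quantile` is ours):

```python
import numpy as np

def empirical_quantile(sample, u):
    """Generalized inverse F_n^{-1}(u) = inf{t : F_n(t) >= u} of the empirical CDF.

    For u in (0, 1] this is the ceil(u*n)-th order statistic; for u = 0 we return
    the smallest observation, matching the extension F^{-1}(0) = inf of the support.
    """
    s = np.sort(np.asarray(sample, dtype=float))
    n = len(s)
    j = int(np.ceil(u * n))       # smallest j with F_n(s[j-1]) = j/n >= u
    return s[max(j, 1) - 1]

data = [3.0, 1.0, 2.0]
# F_n(1) = 1/3 < 1/2 but F_n(2) = 2/3 >= 1/2, so F_n^{-1}(1/2) = 2
q_half = empirical_quantile(data, 0.5)
```

This convention (left-continuous generalized inverse) is the one used throughout the paper when the empirical quantile function F^{-1}_{x,r,n} is formed from the powered distances.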

Main results
We fix r ≥ 1 and henceforth write F_x for F_{x,r} to ease the reading. In the same way we use the notation F^{-1}_x, Δ_{n,m} and d_{P,m}, since there is no ambiguity about the power term r.
Given an observation point x ∈ R^d, we introduce the modulus of continuity of F^{-1}_x:

ω̃_x(v) := sup{ F^{-1}_x(u′) − F^{-1}_x(u) : 0 ≤ u ≤ u′ ≤ 1, u′ − u ≤ v },  v ∈ [0, 1].

Note that ω̃_x being finite is equivalent to the support of P being bounded. An extensive discussion of the relation between the measure P and the modulus of continuity of F^{-1}_x is proposed in Section 3. The function ω̃_x being non-decreasing and non-negative, it has a non-negative limit ω̃_x(0+) at zero. In particular, we do not assume here that ω̃_x(0+) = 0; in other terms, we do not assume that F^{-1}_x is continuous. We extend ω̃_x at zero by taking ω̃_x(0) = ω̃_x(0+).
In the following, it will be sufficient in our results to consider upper bounds on this modulus of continuity, that is, a non-negative function ω_x on [0, 1] such that ω_x(v) ≥ ω̃_x(v) for any v ∈ [0, 1]. A modulus of continuity being a non-decreasing function, we will assume that such an upper bound ω_x is non-decreasing on [0, 1]. For technical reasons and without loss of generality, we will also assume that ω_x is a continuous function taking its values in [ω_x(0), ω_x(1)] ⊂ R_+. For such a function ω_x we also introduce its inverse function ω_x^{-1}, defined on [ω_x(0), ω_x(1)]. We extend this function to R_+ by taking ω_x^{-1}(t) = 0 for any t ∈ [0, ω_x(0)] and ω_x^{-1}(t) = 1 for any t ≥ ω_x(1). In particular, ω_x^{-1}(ω_x(u)) = u for any u ∈ [0, 1].
In this section, we show that the rate of convergence of Δ_{n, k/n}(x) is of the order of ω_x(k/n)/√k.

Local analysis of the distance to the empirical measure in the bounded case
We first consider the behavior of the distance to the empirical measure when the observations X_1, …, X_n are sampled from a distribution P with compact support in R^d. Let x ∈ R^d and let ω_x be an upper bound on the modulus of continuity ω̃_x of F^{-1}_x. Assume moreover that ω_x is a strictly increasing and continuous function on [0, 1].
The proof of the theorem is based on a particular decomposition of Δ_{n, k/n}(x); see Lemma 5 in Appendix B.1. This decomposition allows us to consider the deviations of the empirical process rather than those of the quantile process. The proof is given in Appendix B.
Let us now comment on the final bound on the expectation (7). This bound can be rewritten as

(n/k) × (1/√n) × √(k/n) × ω_x(k/n) = ω_x(k/n)/√k.

The term n/k comes from the definition of the DTM; it is the renormalization by the mass proportion k/n. The term 1/√n corresponds to a classical parametric rate of convergence. The term √(k/n) is obtained thanks to a local analysis of the empirical process; more precisely, it derives from a sharp control of the variance of the supremum of the uniform empirical process. The term ω_x(k/n) corresponds to the statistical complexity of the problem, expressed in terms of the regularity of the quantile function F^{-1}_x. Theorem 1 can be interpreted from either an asymptotic or a non-asymptotic point of view. Taking the non-asymptotic approach, we consider n as fixed. A first result is that we obtain sharp upper bounds for small values of k/n. In the most favorable case, where ω̃_x(u) ∼ u, we see in (7) that an upper bound of the order of 1/n is reached. This is a direct consequence of the local analysis we use to control the empirical process in the neighborhood of the origin. As mentioned before, a very small k/n corresponds to the realistic situation where we use the DTM to clean the support from a small proportion of outliers. Taking the asymptotic approach, a second result of Theorem 1 is that it describes the asymptotic behavior of Δ_{n, k_n/n}(x) under all possible regimes, that is, for all sequences (k_n)_{n∈N}. For instance, in the classical regime where k_n is such that k_n/n = m for some fixed value m ∈ (0, 1), we obtain the parametric rate of convergence 1/√n, as in the asymptotic functional results given in Chazal et al. (2014a).
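The algebra behind the factorization of the bound is elementary and can be sanity-checked numerically. A throwaway sketch (the numeric values are arbitrary, chosen only to exercise the identity):

```python
from math import sqrt, isclose

# Arbitrary sample size, mass parameter k, and modulus value omega = omega_x(k/n)
n, k, w = 10_000, 25, 0.3

# Product of the four factors discussed above:
#   renormalization (n/k) * parametric rate (1/sqrt(n))
#   * local-analysis gain sqrt(k/n) * complexity term omega_x(k/n)
product = (n / k) * (1 / sqrt(n)) * sqrt(k / n) * w

# The product collapses to omega_x(k/n) / sqrt(k)
simplified = w / sqrt(k)
```

In particular, with ω_x(u) ∼ u and k bounded, ω_x(k/n)/√k is of the order of 1/n, which is the favorable rate mentioned above.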
Another key fact about Theorem 1 is that the upper bound (6) depends on the regularity of F^{-1}_x through the function

Ψ_x(m) := ω_x(m)/√m.

Moreover, if ω_x(0+) = 0, we see that the upper bound (6) only depends on the regularity of F^{-1}_x in the neighborhood of the origin: coming back to (6), we find that for n large enough the right-hand term of Inequality (6) is of the order of Ψ_x(k/n)/√n. We now give additional remarks about Theorem 1.

Remark 1. If the quantile function F^{-1}_x is η-Hölder, then ω_x(u) = A u^η for some constant A ≥ 0, and thus

E(|Δ_{n, k/n}(x)|) ≲ (A/√n) (k/n)^{η − 1/2}.

Remember that Hölder functions with exponent η > 1 are constant; we can thus assume that η ≤ 1.

Remark 2. If P is supported on a compact manifold of dimension b, and if P is absolutely continuous with respect to the Hausdorff measure on the manifold with a lower bounded density, then E(|Δ_{n, k/n, 2}(x)|) ≲ (1/√n)(k/n)^{1/b − 1/2}, as announced in the introduction. See Section 3 for more details about the implications in the case of (a, b)-standard measures.

Remark 3. For values of k/n not close to zero, the rate is consistent with the upper bound (13) deduced from the approach based on the stability results (see Appendix A). However, Theorem 1 is more satisfactory since it describes the statistical complexity of the problem through the regularity of the quantile function.
Remark 4. The application u → u^{1/r} is 1/r-Hölder on R_+ with Hölder constant 1 since 1/r ≤ 1. It yields

|Δ̃_{n, k/n, r}(x)| ≤ |Δ_{n, k/n, r}(x)|^{1/r},

where Δ̃_{n, k/n, r}(x) is defined by (1). We deduce an expectation bound on Δ̃_{n, k/n, r}(x) from Jensen's inequality and Inequality (8):

E(|Δ̃_{n, k/n, r}(x)|) ≤ ( E(|Δ_{n, k/n, r}(x)|) )^{1/r}.

Remark 5. As already mentioned, to prove Theorem 1 we consider the deviations of the empirical process rather than those of the quantile process. Indeed, the more direct approach consisting in controlling the deviations of the quantile process gives slower rates. More precisely, using Proposition 7 given in Appendix C, borrowed from Shorack and Wellner (2009), it can be shown that this approach leads to a bound of the order of (k/n)^η in the η-Hölder case, which is slower than the rate given in Remark 1.
To complete the results of Theorem 1, we give below a lower bound using Le Cam's lemma (see Lemma 8 in Appendix C). Let ω be a continuous and strictly increasing function on [0, 1] and let x ∈ R^d. We introduce the class P_{ω,x} of probability measures P whose modulus of continuity ω̃_x is upper bounded by ω. In this definition, the function ω̃_x is as before the modulus of continuity of the quantile function of the push-forward measure of P by the function y → ‖y − x‖^r.
Then, there exists a constant C, which only depends on c, such that for any k ≤ ū n,
where the infimum is taken over all estimators d̂_n(x) of d_{P,m,r}(x) defined from a sample X_1, …, X_n with distribution P.
Assumption (9) is not very strong. It means that ω is not too large an upper bound on the moduli of continuity of the quantile functions. More precisely, it says that there exists a distribution P ∈ P_{ω,x} for which ω is comparable to the modulus of continuity of the quantile function F^{-1}_x in the neighborhood of the origin.
Note that this lower bound matches the upper bound of Theorem 1 when k is very small, since it is of the order of ω(k/n). Providing the correct lower bound for all values of k is not obvious: as far as we know, there is no standard method in the literature for computing lower bounds for this kind of functional, and we consider this issue beyond the scope of this paper.

Local analysis of the distance to the empirical measure in the unbounded case
The previous results provide a description of the fluctuations and mean rates of convergence of the distance to the empirical measure. However, when the support of P is not bounded, the quantile function F^{-1}_x tends to infinity at 1 and the modulus of continuity of F^{-1}_x is not finite. In such a situation, Theorem 1 cannot be applied. We now propose a second result about the fluctuations of the DTEM, under weaker assumptions on the regularity of F^{-1}_x. The following result shows that under a weak moment assumption, the rate of convergence is the same as in the bounded case, up to a term decreasing exponentially fast to zero.
Theorem 2. Let m̄ ∈ (0, 1) and let x ∈ R^d be an observation point. Assume moreover that ω_{x,m̄} is a strictly increasing and continuous function on [0, m̄]. Then, for any k < n (1/2 ∧ m̄) and any λ > 0:

Assume moreover that ω_{x,m̄}(u)/u is a non-increasing function and that P has a moment of order r. Then

where C is an absolute constant and C_{x,r,m̄} only depends on the quantity E‖X − x‖^r and on m̄.
As in the bounded case, if ω_{x,m̄}(0+) = 0, then the rate of convergence is still of the order of Ψ_x(m̄)/√n. Note that this result is of interest even when the measure P is supported on a compact set. Indeed, if the quantile function F^{-1}_x is smooth in the neighborhood of zero, then for m̄ small enough Assumption (10) may be satisfied with a function ω_{x,m̄} which is very small in the neighborhood of zero; Theorem 2 may then provide better bounds than those given by Theorem 1. This fact also confirms that the deviations of the DTEM mainly rely on the local regularity of the quantile function F^{-1}_x at the origin rather than on its global regularity.

Convergence of the distance to the empirical measure for the sup norm
The previous results address the pointwise fluctuations of the DTEM. We now consider the same problem for the sup norm over a compact domain D of R^d. Let N(D, t) be the covering number of D, that is, the smallest number of balls of radius t needed to cover D. Since the domain D is compact, there exist two positive constants c and ν ≤ d such that for any t > 0:

N(D, t) ≤ c t^{−ν}.

We assume that there exists a function ω_D : (0, 1] → R_+ which uniformly upper bounds the moduli of continuity of the quantile functions (F^{-1}_x)_{x∈D}: for any (u, u′) ∈ (0, 1]^2 and any x ∈ D,

|F^{-1}_x(u) − F^{-1}_x(u′)| ≤ ω_D(|u − u′|).

We also assume, as before, that ω_D is a strictly increasing and continuous function on [0, 1].

Theorem 3. Under the previous assumptions, for any
where log_+(u) := (log u) ∨ 1 for any u ∈ R_+. The constant C is an absolute constant if r = 1; otherwise it depends on r and on the Hausdorff distance between D and the support of P.
This bound is deduced from a deviation bound on sup_{x∈D} |Δ_{n, k/n}(x)|, which is given in the proof. Up to a logarithmic term, the rate is the same as for the pointwise convergence. As in the pointwise case, this result could easily be extended to non-compactly supported measures.

The geometric information carried by the quantile function F −1 x
The upper bounds obtained in the previous section directly depend on the regularity of F^{-1}_x. We now give some insights into how the geometry of the support of the measure in R^d impacts the quantile function F^{-1}_x.

Compact support and modulus of continuity of the quantile function
A geometric characterization of the existence of ω̃_x on [0, 1] can be given in terms of the support of the measure P. The following lemma is borrowed and adapted from Proposition A.12 in Bobkov and Ledoux (2014).

Lemma 1. Given a measure P in R^d and an observation point x ∈ R^d, the following properties are equivalent:
1. the modulus of continuity of the quantile function F^{-1}_x satisfies ω̃_x(u) < ∞ for any u ≤ 1;
2. the push-forward distribution of P by the function ‖x − ·‖^r is compactly supported;
3. P is compactly supported.
In particular, if P is compactly supported, we can always take the constant function ω_x = Haus({x}, K) as an upper bound on ω̃_x. Of course, this is not a very relevant choice for describing the rate of convergence of the DTEM.

Connectedness of the support and modulus of continuity of the quantile function
While discontinuities of the distribution function correspond to atoms, discontinuity points of the quantile function correspond to regions with empty mass in R^d (see the right picture of Figure 1). The fact that ω̃_x(0+) = 0 is directly related to the connectedness of the support of the distribution dF_x: indeed, it is equivalent to the support of dF_x being a closed interval in R_+; see for instance Proposition A.7 in Bobkov and Ledoux (2014).
In the most favorable situations, where the support of P is a connected set, ω̃_x(0+) = 0, and the faster ω̃_x tends to 0 at 0, the better the rate we obtain. However, for some points x ∈ R^d, the support of dF_x may be an interval even when the support of P is not a connected subset of R^d (see the left picture of Figure 1). In the other case, when the support of dF_x is not connected, the term ω̃_x(0) roughly corresponds to the maximum distance between two consecutive intervals of the support of dF_x (see the right picture of Figure 1). Our results can still be applied in these situations, but the upper bounds we obtain are larger because ω_x(k/n) cannot be smaller than ω̃_x(0).
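The link between gaps in the support of dF_x and the limit ω̃_x(0+) can be seen on a toy example. Consider the uniform measure on [0, 1] ∪ [2, 3] viewed from x = 0 with r = 1: the push-forward quantile function is explicit, and its jump at u = 1/2 equals the size of the empty gap between the two intervals. A small numpy sketch (all names are ours):

```python
import numpy as np

def quantile(u):
    # Quantile of the distances to x = 0 under the uniform measure on
    # [0, 1] ∪ [2, 3] with r = 1: half the mass yields distances in [0, 1],
    # the other half distances in [2, 3], so F^{-1} jumps from 1 to 2 at u = 1/2.
    return 2 * u if u <= 0.5 else 2 * u + 1

# Approximate the modulus of continuity at a small scale v by the largest
# increment over a fine grid of left endpoints u.
v = 1e-6
jumps = max(quantile(u + v) - quantile(u) for u in np.linspace(0, 1 - v, 100001))
# As v -> 0 this tends to omega-tilde_x(0+) = 1, the distance between the
# two intervals of the support of dF_x.
```

Increments taken inside one branch are of order 2v, while any pair straddling u = 1/2 contributes roughly 1, so the computed maximum stays pinned near the gap size, as the discussion above predicts.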

Uniform modulus of continuity of F^{-1}_{x,r} versus local continuity of F^{-1}_{x,r} at the origin

Though stronger than continuity, a natural regularity assumption on F^{-1}_{x,r} is that this function is also concave. In this case, the increments of F^{-1}_x are largest at the origin, so that ω̃_x(v) = F^{-1}_x(v) − F^{-1}_x(0) for any v ∈ [0, 1]. In particular, if x is in the support of P then we can take ω_x = F^{-1}_x.

If we take r = 1, in many simple situations the cumulative distribution function F_{x,1} roughly behaves as a power function t^ℓ, where ℓ is the dimension of the support. In this context, the quantile function F^{-1}_{x,1} roughly behaves as a power function u^{1/ℓ}, and F^{-1}_{x,r}(u) = F^{-1}_{x,1}(u)^r behaves as u^{r/ℓ}. This is for instance the case for (a, b)-standard measures, as shown in the next section. These considerations suggest that if r/ℓ ≤ 1, in many situations the quantile function is concave and then ω_x is of the order of F^{-1}_x − F^{-1}_x(0). This means that the upper bound on E|Δ_{n, k/n}| is of the order of (1/√n) Ψ_x(k/n). More generally, as noticed in the comments following Theorem 1, the term Ψ_x(k/n)/√n is the dominating term in the upper bound (6). We may check with the numerical experiments of Section 4 that the function Ψ_x indeed captures the correct monotonicity of E|Δ_{n, k/n}| as a function of k/n.

The case of (a, b) standard measures
The intrinsic dimensionality of a given measure in R^d can be quantified by the so-called (a, b)-standard assumption: there exist constants a > 0 and b > 0 such that for any x ∈ K and any t ≥ 0,

P(B(x, t)) ≥ (a t^b) ∧ 1,    (11)

where K is the support of P. This assumption is popular in the literature on set estimation (see for instance Cuevas, 2009; Cuevas and Rodríguez-Casal, 2004). More recently, it has also been used in Chazal et al. Since K is compact, by reducing the constant a to a smaller constant a′ if necessary, we easily check that Assumption (11) is equivalent to requiring P(B(x, t)) ≥ a′ t^b for any x ∈ K and any t ∈ (0, diam(K)]. We now give controls on the two key terms ω_x and F^{-1}_x(u) − F^{-1}_x(0) which are involved in the expectation bounds of Section 2.
Lemma 3. Let P be a probability measure on R^d which is (a, b)-standard on its support K. Then, for any u ∈ [0, 1], where r is the power parameter in the definition (2) of the DTM. Assume moreover that K is a connected subset of R^d. Then, for any h ∈ (0, 1) we have

Proof. We have (see the left picture of Figure 2):

We now assume that K is a connected set. Let (u, h) ∈ (0, 1)^2 be such that u + h ≤ 1, and set α(h) := [F^{-1}_x(u + h)]^{1/r} − [F^{-1}_x(u)]^{1/r} (see the right picture of Figure 2). By definition of a quantile, there exists a point x_3 ∈ K such that, for any small δ > 0, it can easily be checked that

By taking the limit, we obtain that h ≥ P(B(x_3, α(h)/2)). The measure P being (a, b)-standard, we find that h ≥ a (α(h)/2)^b, and then α(h) ≤ 2 (h/a)^{1/b}, which proves the lemma.
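The (a, b)-standard assumption can be checked directly on the simplest example, the uniform measure on the segment [0, 1], for which b = 1 and a = 1 work: every ball centered on the support captures at least its radius in mass (truncated at 1). A small sketch, with names of our own choosing:

```python
import numpy as np

def ball_mass(x, t):
    """Mass assigned by the uniform measure on [0, 1] to the ball B(x, t)."""
    return max(0.0, min(x + t, 1.0) - max(x - t, 0.0))

# Verify the (a, b)-standard inequality P(B(x, t)) >= (a * t^b) ∧ 1
# for a = b = 1, over a grid of centers x in the support and radii t.
a, b = 1.0, 1.0
for x in np.linspace(0.0, 1.0, 101):
    for t in np.linspace(0.001, 2.0, 200):
        assert ball_mass(x, t) >= min(a * t ** b, 1.0) - 1e-12
```

The worst case is a center at an endpoint of the segment, where the ball captures exactly min(t, 1) of the mass; interior centers capture up to 2t, which is why the inequality holds with a = 1 and equality only at the boundary.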

Numerical experiments
In this section, we illustrate with numerical experiments that the expectation bounds on Δ_{n, k/n} given in Section 2 are sharp. In particular, we check that the function Ψ_x has the same monotonicity as the function m → E|Δ_{n, m}(x)|. We consider four different geometric shapes in R, R^2 and R^3, for which a visualization is possible; see Figures 3 and 4.
• Segment Experiment in R. The shape K is the segment [0, 1] in R.
• 2-d Shape Experiment in R^2. A closed curve has been drawn by hand in R^2 and then approximated with high precision by a polygonal curve. The shape K is the compact set delimited by the polygonal curve.
• Fish Experiment: a 2-d surface in R^3. The shape K is the discrete set defined by a point cloud of 216979 points approximating a 2-d surface representing a fish. This dataset is provided courtesy of CNR-IMATI by the AIM@SHAPE-VISIONAIR Shape Repository.
• Tangle Cube Experiment in R^3. The shape K is the tangle cube, that is, the 3-d manifold defined as the set of points (x_1, x_2, x_3) ∈ R^3 such that x_1^4 − 5x_1^2 + x_2^4 − 5x_2^2 + x_3^4 − 5x_3^2 + 10 ≤ 0.

For each shape, we consider three generative models. These models are standard in support estimation and geometric inference; see Genovese et al. (2012) for instance.
• Noiseless model: X_1, …, X_n are sampled from the uniform probability distribution P_uni on K.
• Clutter noise model: X_1, …, X_n are sampled from the mixture distribution P_cl = πU + (1 − π)P, where U is the uniform measure on a box B which contains K and π is a proportion parameter.
• Gaussian convolution model: X_1, …, X_n are sampled from the distribution P_g = P ∗ Φ(0, σI_d), where Φ(0, σI_d) is the centered isotropic multivariate Gaussian distribution on R^d with covariance matrix σI_d. We take σ = 0.5 in all the experiments.

We use the same notation P for any of the probability distributions P_uni, P_cl or P_g. An observation point x is fixed for each experiment. For each experiment and each generative model, we compute very accurate estimations of the quantile function F^{-1}_{x,r} and of the DTM d_{P,m,r}(x) from a very large sample drawn from P. Next, we simulate n-samples from P and compute the DTEM for each sample. We take n = 500 for the first two experiments and n = 2000 for the two others. All trials are repeated 100 times, and we finally compute approximations of the error E|Δ_{n, k/n, r}(x)| with a standard Monte Carlo procedure, for all the measures P. The DTMs and DTEMs are computed for the powers r = 1 and r = 2, and also for r = 3 for the Tangle Cube Experiment. We also compute the function m → Ψ_x(m). The simulations have been performed using the R software (R Core Team, 2014) with the packages FNN, rgl, grImport and sp.
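The Monte Carlo procedure above can be reproduced in miniature for the Segment Experiment, where the DTM is explicit: for the uniform measure on [0, 1], x = 0 and r = 1, F^{-1}_x(u) = u, so d_{P, m, 1}(0) = m/2. The sketch below is a scaled-down stand-in for the paper's R experiments, not a reproduction of them (sample sizes, trial counts and the seed are ours):

```python
import numpy as np

rng = np.random.default_rng(42)

def dtem(x, X, k, r):
    """DTEM at x: average of the r-th powers of the k smallest distances, to the 1/r."""
    d = np.sort(np.abs(X - x))
    return np.mean(d[:k] ** r) ** (1.0 / r)

def dtm_uniform_segment(m):
    # Exact DTM at x = 0 for the uniform measure on [0, 1] with r = 1:
    # (1/m) * integral_0^m u du = m / 2.
    return m / 2

n, k, trials = 500, 25, 200
errors = []
for _ in range(trials):
    X = rng.random(n)                                   # noiseless model on the segment
    errors.append(abs(dtem(0.0, X, k, r=1) - dtm_uniform_segment(k / n)))
mc_error = float(np.mean(errors))                       # Monte Carlo estimate of E|Δ|
```

With k/n = 0.05 the target DTM is 0.025, and the Monte Carlo error stays well below it, consistent with the ω_x(k/n)/√k = (k/n)/√k rate predicted for a linear quantile function.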

Results
Figures 5 to 8 give the results of the four experiments with the three generative models. The top graphics of Figures 5 to 8 represent the quantile functions F^{-1}_{x,r} in each case. For the noiseless models, the behavior of F^{-1}_{x,r} at the origin is directly related to the power r and to the intrinsic dimension of the shape. For r = 1, the quantile function is linear for the segment, roughly of the order of √m for the 2-d shape and the Fish Experiment, and of the order of m^{1/3} for the Tangle Cube. We observe that F^{-1}_{x,r} is roughly linear with r = 2 for the 2-d shape and the Fish shape, and with r = 3 for the Tangle Cube.
The quantile functions of the noisy models in the four cases start from zero, since the observation point is always taken inside the supports of P_cl and P_g. A regularity break of the quantile function of the clutter noise model can be observed in the neighborhood of m = P(B(x, ‖K − x‖^r)). The quantile functions for the Gaussian noise are always smoother.
The main point of these experiments is that, in all cases, the function m → Ψ_x(m) shows the same monotonicity as the expected error studied in the paper, m → E|Δ_{n,m,r}(x)|. These results confirm that the function Ψ_x provides a correct description of E|Δ_{n,m,r}|.
We also observe that the function m → E|Δ_{n,m,r}(x)| does not have one typical shape: it can be an increasing curve, a decreasing curve, or even a U-shaped curve. Indeed, the monotonicity depends on many factors, including the intrinsic dimension of the shape, its geometry, the presence of noise, and the power coefficient r.

Conclusion
When the data is corrupted by noise, the distance to a measure is a key tool for performing robust geometric inference. For instance, it can be used for support estimation and for topological data analysis based on persistence diagrams, as proposed in Chazal et al. (2014a). In practice, a "plug-in" approach is adopted by replacing the measure by its empirical counterpart in the definition of the DTM. The main result of this paper provides sharp non-asymptotic bounds on the deviations of the DTEM.
The DTM has recently been extended to the context of metric spaces in Buchet et al. (2015b). For the sake of simplicity, we have assumed that P is a probability measure on R^d. However, all the results of the paper can be easily adapted to more general metric spaces by considering the push-forward distribution of P by d(x, ·)^r, where d is the metric of the sampling space.
This paper is a step toward a complete theory of robust geometric inference. Our results give preliminary insights into how to tune the parameter m in the DTEM, which is a difficult question. The experiments proposed in Section 4 show that the term EΔ_{n,m,r}(x) does not have a typical monotonic behavior with respect to m, and thus classical model selection methods can hardly be applied to this problem. We intend to study this non-standard model selection problem in future work.

Appendix A: Rates of convergence derived from the DTM stability
The DTM satisfies several stability properties for the Wasserstein metrics. In this section, rates of convergence of the DTEM are derived from stability results of the DTM together with known results about the convergence of the empirical measure under Wasserstein metrics. We check that the results derived in this way are not as tight as the results given in Section 2.
Let us first recall the definition of the Wasserstein metrics on R^d. For r ≥ 1, the Wasserstein distance W_r between two probability measures P and P̄ on R^d is given by

W_r(P, P̄) = ( inf_{π ∈ Π(P,P̄)} ∫_{R^d × R^d} ‖x − y‖^r dπ(x, y) )^{1/r},

where Π(P, P̄) is the set of probability measures on R^d × R^d with marginal distributions P and P̄; see for instance Rachev and Rüschendorf (1998) or Villani (2008). The stability of the DTM with respect to the Wasserstein distance W_r is given by the following theorem.
Notice that Chazal et al. (2011b) prove this theorem for r = 2, but the proof for any r ≥ 1 is exactly the same.
We now give the pointwise stability of the DTM with respect to the Kantorovich distance W_1 between push-forward measures on R. This result follows easily from the expression (4) of the DTM given in the Introduction; a rigorous proof is given in Appendix B.1.

Proposition 2.
For a point x in R^d and a real number r ≥ 1, let dF_{x,r} and dF̄_{x,r} be the push-forward measures, by the function y → ‖x − y‖^r, of two probability measures P and P̄ defined on R^d. Then, for any x ∈ R^d,

|d^r_{P,m,r}(x) − d^r_{P̄,m,r}(x)| ≤ (1/m) W_1(dF_{x,r}, dF̄_{x,r}).

Convergence results for Δ_{n,m,r} can be directly derived from the stability results given in Theorem 4 and Proposition 2. For instance, it can be easily checked that, for any x ∈ R^d, W_1(dF_{x,r}, dF_{x,r,n}) tends to zero almost surely (see for instance the Introduction of del Barrio et al., 1999). Together with Proposition 2, this gives the almost sure pointwise convergence to zero of Δ_{n,m,r}(x).
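To give a concrete sense of the one-dimensional Kantorovich distance appearing in Proposition 2: between two empirical measures on R with the same number of atoms, W_1 reduces to matching the sorted samples. A minimal numpy sketch (illustrative only, not part of the paper's experiments):

```python
import numpy as np

def w1_empirical(a, b):
    """W_1 distance between two empirical measures on R with equally many atoms.

    In dimension one the optimal coupling matches the sorted samples, so W_1
    is the mean absolute difference of the order statistics.
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    assert a.shape == b.shape
    return np.mean(np.abs(a - b))

print(w1_empirical([0.0, 1.0, 2.0], [0.0, 1.0, 3.0]))  # → 1/3
```

This is the empirical counterpart of the quantile-coupling identity used in the proof of Proposition 2.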
Regarding the convergence in expectation, using Theorem 4 in R^d for d > r/2, an upper bound on E|Δ_{n,k_n/n,r}(x)| can be deduced from Fournier and Guillin (2013) or from Dereich et al. (2013). Nevertheless this upper bound is not sharp: if k_n := mn for some fixed constant m ∈ (0, 1), the resulting rate is of the order of n^{−1/d}. We show below that the parametric rate 1/√n can be obtained by considering the alternative stability result given in Proposition 2. In the one-dimensional case, a direct application of Fubini's theorem gives an upper bound (12) on EW_1(dF_{x,r}, dF_{x,r,n}) (see for instance Theorem 3.2 in Bobkov and Ledoux, 2014), where dF_{x,r} and dF_{x,r,n} are the push-forward probability measures of P and P_n by the function ‖x − ·‖^r. Note that Bobkov and Ledoux (2014) have completely characterized the convergence of EW_1(μ, μ_n) in the one-dimensional case, in terms of the functional J_1(μ), for μ a probability measure on the real line and μ_n its empirical counterpart. From Proposition 2 and the upper bound (12), we derive an upper bound (13) which gives a pointwise rate of convergence of 1/√n under reasonable moment conditions if we take k_n := mn for some fixed constant m ∈ (0, 1). However, the upper bound (13) does not correctly describe how the rate depends on the parameter m = k_n/n. For instance, if k_n is very small, the bound blows up in all cases, while this should not happen, for instance, for discrete measures. The reason is that these stability results are too global to provide a sharp expectation bound for small values of k_n.
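For reference, the one-dimensional bound invoked above can be written as follows; this is a standard consequence of Fubini's theorem, stated as Theorem 3.2 in Bobkov and Ledoux (2014):

```latex
\mathbb{E}\, W_1(\mu, \mu_n)
  = \mathbb{E} \int_{\mathbb{R}} \bigl| F_{\mu,n}(t) - F_{\mu}(t) \bigr| \, dt
  \le \frac{1}{\sqrt{n}} \int_{\mathbb{R}} \sqrt{F_{\mu}(t)\bigl(1 - F_{\mu}(t)\bigr)} \, dt
  =: \frac{J_1(\mu)}{\sqrt{n}},
```

where μ_n is the empirical measure of an n-sample from μ and F_μ, F_{μ,n} are the corresponding distribution functions. Applied to μ = dF_{x,r}, this yields the 1/√n rate whenever J_1(dF_{x,r}) is finite.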

Rewriting the DTM in terms of the quantile function
Let P be a probability distribution on R^d, x ∈ R^d and r ≥ 1. Let F_{x,r} be the distribution function of the random variable ‖x − X‖^r, where X has distribution P. The preliminary distance function to P can be rewritten in terms of the quantile function F^{-1}_{x,r}:

Lemma 4. For any u ∈ (0, 1), we have δ^r_{P,u}(x) = F^{-1}_{x,r}(u). In particular, δ_{P,u}(x) = F^{-1}_{x,1}(u).
Proof. Note that for any t ∈ R_+, F_{x,r}(t) = P(B(x, t^{1/r})). Next,

δ_{P,u}(x) = inf{ t > 0 : P(B(x, t)) > u },

and we deduce that

δ^r_{P,u}(x) = inf{ t^r : t > 0, P(B(x, t)) > u } = inf{ s > 0 : P(B(x, s^{1/r})) > u } = inf{ s > 0 : F_{x,r}(s) > u } = F^{-1}_{x,r}(u),

where the continuity of s → s^r justifies the second equality.
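Lemma 4 can also be checked numerically: since t → t^r is increasing on R_+, the empirical u-quantile of ‖x − X‖^r is exactly the r-th power of the empirical u-quantile of ‖x − X‖. A small sketch (sample, point and parameters are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((1000, 3))   # sample from P (here uniform on the unit cube)
x = np.zeros(3)             # observation point
r, u = 2.0, 0.3

d = np.sort(np.linalg.norm(X - x, axis=1))   # order statistics of ||x - X||
dr = np.sort(d ** r)                         # order statistics of ||x - X||^r
k = int(np.ceil(u * len(d))) - 1             # index of the empirical u-quantile
# monotonicity of t -> t^r makes the quantiles commute with the power:
print(dr[k] == d[k] ** r)  # True
```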
From Lemma 4 we directly derive the expression of the DTM in terms of the quantile function F^{-1}_{x,r}, as given by Equation (4) in the Introduction:

d^r_{P,m,r}(x) = (1/m) ∫_0^m F^{-1}_{x,r}(u) du.

Proof of Proposition 2.
Let F and F̄ be the cdfs of two probability measures dF and dF̄ on R. Recall that, for any r ≥ 1 and any measures μ and μ̄ on R,

W_r^r(μ, μ̄) = ∫_0^1 |F_μ^{-1}(u) − F_μ̄^{-1}(u)|^r du,

see for instance Cambanis et al. (1976) or Theorem 2.10 in Bobkov and Ledoux (2014). Thus

|∫_0^m (F^{-1}(u) − F̄^{-1}(u)) du| ≤ ∫_0^1 |F^{-1}(u) − F̄^{-1}(u)| du = W_1(dF, dF̄),

and the proof follows using Equation (4).

A decomposition of Δ_{n,k_n/n,r}
For any x ∈ R^d and any r ≥ 1 we have F^{-1}_{x,n,r}(0) ≥ F^{-1}_{x,r}(0) ≥ 0, since F_{x,r} is the cdf of the random distance ‖x − X‖^r, whose support is included in R_+. From Equation (5) and geometric considerations (see Figure 9), we can rewrite Δ_{n,m,r} as stated in the following lemma.
Lemma 5. The quantity Δ n, k n ,r can be rewritten as follows:

B.2. Proof of Theorem 1
We recall that we use the notation F for F x,r and F n for F x,n,r in the proof.
Upper bound on the fluctuations of Δ_{n,k_n/n}(x). We first check that P(|Δ_{n,k_n/n}(x)| ≥ λ) = 0 for λ ≥ ω_x(1). Note that ω_x(1) < ∞ because the support of P is compact. Let G_n and G_n^{-1} be the empirical uniform distribution function and the empirical uniform quantile function (see Appendix C). Starting from the definition (5) of the DTM and using Proposition 3 in Appendix C, we obtain a bound valid for any λ ≥ 0 and k ≤ n, and this probability is obviously zero for any λ ≥ ω_x(1). We now prove the deviation bounds starting from Lemma 5. In all cases, Inequality (14) is thus satisfied.
• Local analysis: deviation bound of Δ_{n,k_n/n}(x) for k_n/n close to zero. We now prove the deviation bound for k_n/n < 1/2. We first upper bound the term A in (14). According to Proposition 5 in Appendix C, a deviation bound holds for any u_0 ∈ (0, 1/2) and any λ > 0. For u_0 ≤ 1/2 and λ > 0 this yields an inequality in which we have used Proposition 3 in Appendix C for the first equality, (15) for the second inequality, and the fact that, for any u, v > 0, exp(−u/(1+v)) ≤ exp(−u/2) + exp(−u/(2v)). The term A can thus be upper bounded by controlling this supremum. We now upper bound B. According to Proposition 3 in Appendix C, and with θ ∈ (0, 1) to be chosen later, we obtain an upper bound on the deviations of B, where we have used Propositions 4 and 6. According to (14), we have P(|Δ_{n,k_n/n}(x)| ≥ λ) ≤ P(A ≥ λ/2) + P(B ≥ λ/2). We then obtain from Inequalities (16) and (18) a deviation bound valid for any k_n/n < 1/2 and any λ > 0, where θ will be chosen later in the proof.

• Deviation bound of Δ_{n,k_n/n}(x) for k_n/n ≥ 1/2. For controlling A, we now use the DKW inequality (see Theorem 5). We then find that, for any θ̃ > 0, a deviation bound holds, where θ̃ will be chosen later in the proof.
Upper bound on the expectation of Δ_{n,k_n/n}(x). • Case k_n/n ≤ 1/2. By integrating the probability in (19), we obtain an upper bound on the expectation. Since ω_x(u)/u is a non-increasing function, ω_x^{-1}(t)/t is a non-decreasing function. Then, for any positive constants λ_1 and λ_2, the integral can be split accordingly. We then take λ 1 = 2 θ n k ω x 2 √ k n 2 and λ 2 = 2 θ n k ω x 8 3n 2, and we obtain the bound (21). We then choose θ to balance the terms I_1 and 8/(θ^2 n) in (21). The deviation bound given in the theorem corresponds to this choice of θ.
Finally, since ω_x(u)/u is a non-increasing function, we obtain that there exists an absolute constant C such that the expectation bound (22) holds. • Case k_n/n ≥ 1/2. We integrate the deviations (20) and obtain the expectation bound (23).

We then choose θ̃ to balance the terms in (23); the deviation bound given in the theorem corresponds to this choice of θ̃. Since (n/k_n)^{1/2} ≤ √2 ≤ 2, the expectation bound (23) for this choice of θ̃ can be rewritten as the expectation bound (22). This concludes the proof of Theorem 1.

B.3. Proof of Proposition 1
We first consider the case d = 1. In order to apply Le Cam's Lemma (Lemma 8), we need to find two probability measures P_0 and P_1 whose distances to measure are sufficiently far from each other. Without loss of generality we can assume that x = 0. Let P̄ ∈ P_{ω,x} satisfy (9). We can assume that P̄ is supported on R_+, since the push-forward measure of P̄ by the norm is in P_{ω,x} and also satisfies (9). Let F̄^{-1} be the quantile function of P̄. For some n ≥ 1, let P_0 := P̄ and let P_1 := (1/n) δ_0 + P̄_{[0, F̄^{-1}(1−1/n)]}, where δ_0 is a Dirac distribution at zero and where P̄_{[a,b]} is the restriction of the measure P̄ to the set [a, b]. For i = 0, 1, let P_{i,r} be the push-forward measure of P_i by the power function t → t^r on R_+. Let also F_{i,r} and F^{-1}_{i,r} be the distribution function and the quantile function of P_{i,r}; see Figure 10 for an illustration. Note that P_{1,r} = (1/n) δ_0 + P_{0,r}|_{[0, F^{-1}_{0,r}(1−1/n)]}. Thus P_1 is in P_{ω,x}, because F^{-1}_{1,r}(u) = F^{-1}_{0,r}(0) if u ≤ 1/n and F^{-1}_{1,r}(u) = F^{-1}_{0,r}(u − 1/n) otherwise.
The probability measures P_0 and P_1 are absolutely continuous with respect to the measure μ := δ_0 + P̄. The density of P_0 with respect to μ is p_0 := 1_{(0,+∞)}, whereas the density of P_1 with respect to μ is p_1 = (1/n) 1_{{0}} + 1_{(0, F̄^{-1}(1−1/n)]}. From these densities we can lower bound the separation of the distances to measure, where we have used Assumption (9) for the last inequality. We conclude using Le Cam's Lemma. We now consider the case d ≥ 2. Let P̄ ∈ P_{ω,x} satisfy (9). By considering the push-forward measure of P̄ by the function y → (‖y‖, 0, ..., 0), we see that it is always possible to assume that there exists a probability measure P̄ supported on R_+ × {0}^{d−1} which satisfies (9). It is then possible to define P_0 and P_1 as in the case d = 1, except that their support is now in R_+ × {0}^{d−1}. Following the same construction, the quantities TV(P_0, P_1) and d^r_{P_0,r}(x) − d^r_{P_1,r}(x) take the same values as in the case d = 1. We thus obtain the same lower bound as in the case d = 1.
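For completeness, one common form of the two-point Le Cam bound used in such arguments (our phrasing; the paper's Lemma 8 may differ in constants) lower-bounds the minimax risk by the separation of the two hypotheses times a testing-error term:

```latex
\inf_{\hat\theta} \max_{i \in \{0,1\}}
  \mathbb{E}_{P_i^{\otimes n}} \bigl| \hat\theta - \theta(P_i) \bigr|
\;\ge\;
\frac{|\theta(P_0) - \theta(P_1)|}{4}
  \Bigl( 1 - \mathrm{TV}\bigl(P_0^{\otimes n}, P_1^{\otimes n}\bigr) \Bigr),
```

where θ(P) is the functional of interest (here the DTM at x) and TV is the total variation distance. Together with the subadditivity bound TV(P_0^{⊗n}, P_1^{⊗n}) ≤ n TV(P_0, P_1), this explains why the construction above makes TV(P_0, P_1) of order 1/n while keeping the DTMs separated.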

B.4. Proof of Theorem 2
Inequality (14) in the proof of Theorem 1 is still valid. We can also use the deviation bound (16) on A for the case k_n/n ≤ 1/2. Regarding the deviation bound on B, we restart from Inequality (17). By definition of B, B̄ and B_3, and using Proposition 3 in Appendix C, we obtain a decomposition in which P(A ≥ λ/2) + P(B̄ ≥ λ/2) has already been upper bounded in the proof of Theorem 1. We now upper bound the deviations of B_3. For θ_3 ∈ (0, 1) to be chosen later, we have P(B_3 ≥ λ/2) ≤ P(B_4 ≥ λ/2) + P(B_5 ≥ λ/2), where P(B_5 ≥ λ/2) ≤ 2 exp(−kθ_3^2 λ/8) + 2 exp(−3θ_3 k/8). The probability P(B_4 ≥ λ/2) can be upper bounded in two different ways: one using a concentration argument and one based on the Beta distribution of G_n^{-1}. According to Proposition 6, we have