Uniform convergence rates for the approximated halfspace and projection depth

The computational complexity of some depths that satisfy the projection property, such as the halfspace depth or the projection depth, is known to be high, especially for data of higher dimensionality. In such scenarios, the exact depth is frequently approximated using a randomized approach: the data are projected onto a finite number of directions sampled uniformly from the unit sphere, and the minimal depth of these univariate projections is used to approximate the true depth. We provide a theoretical background for this approximation procedure. Several uniform consistency results are established, and the corresponding uniform convergence rates are provided. For elliptically symmetric distributions and the halfspace depth it is shown that the obtained uniform convergence rates are sharp. In particular, guidelines for the choice of the number of random projections needed to achieve a given precision of the depths are stated.


Introduction
Data depth is a general concept that aims to provide a basis for nonparametric statistical analysis of multivariate data. The idea is to quantify the centrality of each point x ∈ R^d with respect to a given dataset. Points that tend to lie in the central bulk of the data are assigned high depth values; points that fail to follow the prevailing pattern of the observations are flagged by their low depth. One obtains a data-dependent ranking of points in R^d that enables the construction of analogues of quantiles, ranks, and orderings applicable to multivariate datasets. Many definitions of depth have appeared in the literature; we refer to [40,22,27] and references therein. Here we focus on two important depths that share common traits: the halfspace depth of Tukey [35], and the (generalized) projection depth, whose history traces back to Stahel [34] and Donoho [10].
While the depths are theoretically appealing, in practice it is often difficult to evaluate them exactly. For instance, the computation of the halfspace depth of a single point in arbitrary dimension is known to be NP-hard [19]. Therefore, a great deal of research has focused on procedures that approximate the true depth [13,6,4,32,2,33]. A particularly simple upper bound on the halfspace and projection depth can be devised if one uses their so-called projection property [13], which means that the overall (multivariate) depth of a point x is expressed as the infimum of (univariate) depths of projections of x with respect to the projected dataset. This suggests the following approximation procedure: (i) draw a random sample of n directions U_i, i = 1, . . . , n, uniformly from the unit sphere S^{d−1} of R^d; (ii) for each i, evaluate the (univariate) depth of ⟨U_i, x⟩ with respect to the dataset projected onto U_i; and (iii) approximate the depth of x by the minimum of these n numbers. This approximation was first proposed in [13]; for the halfspace depth, it leads to what is sometimes called the random Tukey (or halfspace) depth [6].
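The three steps (i)-(iii) above can be sketched in a few lines. The following is a minimal illustration (not the authors' implementation), using NumPy and a synthetic Gaussian sample; the names and the sample are arbitrary:

```python
import numpy as np

def random_tukey_depth(x, data, n_dirs=1000, rng=None):
    """Approximate the halfspace (Tukey) depth of x w.r.t. `data`
    by minimising univariate depths over random projections."""
    rng = np.random.default_rng(rng)
    d = data.shape[1]
    # (i) n directions uniform on the unit sphere S^{d-1}
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    # (ii) fraction of the projected sample lying at or below <u, x>
    proj_data = data @ U.T            # shape (N, n_dirs)
    proj_x = U @ x                    # shape (n_dirs,)
    below = (proj_data <= proj_x).mean(axis=0)
    # (iii) upper bound on the exact depth: minimum over directions
    return below.min()

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2))
centre_depth = random_tukey_depth(np.zeros(2), data, rng=1)
outlier_depth = random_tukey_depth(np.array([5.0, 5.0]), data, rng=1)
```

Since the minimum runs over a finite sample of directions, the result is always an upper bound on the true depth; a central point retains a high value while a point far outside the data cloud is driven towards zero.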
Here we study the statistical properties of this approximation procedure. We are interested in conditions under which uniform convergence of the approximated depth to its theoretical counterpart can be guaranteed, and in its uniform convergence rates. It turns out that for finite datasets, uniform convergence is never achieved. Thus, we focus on general probability measures on R^d, and show that under appropriate regularity conditions, uniform approximations of the depths are valid, and sharp convergence rates can be derived. These results provide valuable insight into the behaviour of the approximated depths. They allow us to state general guidelines for the number of directions n that need to be taken in order to achieve a desired precision. Nevertheless, even in very regular models, n grows fast with the dimension d. Above d = 5, for reasonable n the considered randomization scheme does not attain precision sufficient for practical applications, and more elaborate approximation methods should be preferred.
In Sections 2, 3 and 4 we deal with the halfspace depth. Section 2 introduces notations and fixes the basic ideas of the paper. Our two main theorems, of a rather technical nature, are given in Section 3. Their applications to different classes of probability distributions are discussed in Section 4. Explicit, and exact, rates of convergence are derived for (i) general unimodal elliptically symmetric distributions; (ii) multivariate Gaussian distributions; (iii) multivariate t-distributions; (iv) uniform distributions on balls; and (v) the general collection of multivariate p-symmetric measures [15]. We give explicit guidelines for the choice of the parameter n that allow one to achieve a pre-specified quality of the approximation. In Section 4.5 we discuss situations when uniform approximation cannot be achieved; in particular, we deal with empirical measures. Extensions to the (generalized, or asymmetric) projection depth are treated in Section 5. Some concluding remarks can be found in Section 6. All the proofs are deferred to the appendices: Appendix A for the halfspace depth, and Appendix B for the generalized projection depths.

Halfspace depth and its approximations
Let (Ω, A, P) be the probability space on which all random elements are defined. Denote by P(R^d) the space of all Borel probability measures on R^d, the latter space equipped with the Euclidean norm ‖·‖ and the scalar product ⟨·, ·⟩, d = 1, 2, . . . . We write S^{d−1} = {x ∈ R^d : ‖x‖ = 1} for the unit sphere in R^d. The notation X ∼ P stands for a random variable X whose distribution is P ∈ P(R^d).
For P ∈ P(R^d) and x ∈ R^d, the halfspace depth (or Tukey depth) of x with respect to X ∼ P is defined as

(1) HD(x; P) = inf_{u∈S^{d−1}} P(⟨u, X⟩ ≤ ⟨u, x⟩).

Where no confusion can arise, HD(x; P) is shortened to HD(x). The halfspace depth was proposed in [35], see also [10,11] and [29]. For n = 1, 2, . . . consider U_1, . . . , U_n a random sample from the uniform distribution on S^{d−1}. We are interested in the quality of the approximation of the depth (1) by its randomized counterpart [13, Section 6]

(2) HD_n(x; P) = min_{i=1,...,n} P(⟨U_i, X⟩ ≤ ⟨U_i, x⟩).
Again, HD_n(x; P) will be shortened to HD_n(x). It is evident that HD_n(x) ≥ HD_{n+1}(x) ≥ HD(x) for all x ∈ R^d and n = 1, 2, . . . , and that for d = 1 the random depth HD_n reduces, with high probability, to the true depth HD. In the interesting case d > 1, for any x ∈ R^d the random quantity HD_n(x) only approximates HD(x).
In [13, Proposition 11] it was shown that for any x ∈ R^d and P ∈ P(R^d) the convergence HD_n(x) → HD(x) holds true as n → ∞. Here we are interested in establishing uniform extensions of that convergence result, and in deriving the corresponding rates of convergence; for an array of applications of these results to the computation of the depth see [28, Section 2.3].
Define the halfspace function of X ∼ P, given for x ∈ R^d by

(3) ϕ_x : S^{d−1} → [0, 1] : u ↦ P(⟨u, X⟩ ≤ ⟨u, x⟩).

Both HD(x) and its approximation HD_n(x) can be expressed in terms of ϕ_x, as HD(x) = inf_{u∈S^{d−1}} ϕ_x(u) and HD_n(x) = min_{i=1,...,n} ϕ_x(U_i). When deriving rates of convergence of the depth approximations that are uniform on a set C ⊂ R^d, it is necessary to assume a certain form of equicontinuity of the class of halfspace functions

(4) {ϕ_x : x ∈ C}.
In the first step, in Section 3.1, we derive our main result under the assumption of uniformly Lipschitz continuous functions ϕ_x. In the second step, in Section 3.2, we extend that theorem and deal with a general equicontinuous class of functions (4).

Approximation of the halfspace depth: Main results
We are interested in the rate of convergence of the random sequence

∆_n(C) = sup_{x∈C} (HD_n(x) − HD(x))

as n → ∞. We start with two technical results that will be of great importance in the sequel. For Γ(·) the gamma function, denote by

(5) a_d : [0, π] → [0, 1] : ε ↦ (Γ(d/2) / (√π Γ((d−1)/2))) ∫_0^ε sin^{d−2}(t) dt

the normalized surface area of a spherical cap of geodesic radius ε in S^{d−1}. Suppose that there exists a constant L > 0 such that

(6) |ϕ_x(u) − ϕ_x(v)| ≤ L ‖u − v‖_g for all u, v ∈ S^{d−1} and x ∈ C,

where ‖u − v‖_g = arccos(⟨u, v⟩) is the great-circle distance, i.e. the geodesic distance between u and v. Note that the great-circle distance and the Euclidean distance are related via ‖u − v‖ = 2 sin(‖u − v‖_g / 2). This relation follows easily from the cosine formula and the half-angle formula. A condition similar to (6) was used in [3] when deriving the uniform rates of convergence for the ordinary empirical halfspace depth process. Examples of distributions that satisfy property (6) will be listed in Section 4. The proof of the following result can be found in Appendix A.1.
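For completeness, the relation ‖u − v‖ = 2 sin(‖u − v‖_g/2) between the Euclidean and the great-circle distance is a one-line computation for u, v ∈ S^{d−1}:

```latex
\|u - v\|^2
  = \|u\|^2 + \|v\|^2 - 2\langle u, v\rangle
  = 2 - 2\cos\bigl(\|u - v\|_g\bigr)
  = 4\sin^2\!\Bigl(\tfrac{\|u - v\|_g}{2}\Bigr).
```

The middle equality uses ⟨u, v⟩ = cos(‖u − v‖_g) on the unit sphere (the cosine formula), and the last one is the half-angle identity 1 − cos θ = 2 sin²(θ/2).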
Theorem 1. Let P ∈ P(R^d) be such that (6) holds true for a set C ⊂ R^d. Then

lim sup_{n→∞} (n a_d(∆_n(C)/L) − log n) / (log log n) ≤ d a.s.

General uniformly continuous projections. Now assume that the class of functions (4) is uniformly equicontinuous, but not necessarily uniformly Lipschitz. Consider the minimal modulus of continuity of this class of functions,

(7) δ : [0, π] → [0, 1] : t ↦ sup {|ϕ_x(u) − ϕ_x(v)| : x ∈ C, u, v ∈ S^{d−1}, ‖u − v‖_g ≤ t}.

Using the minimal modulus of continuity we can bound the difference between ϕ_x(u) and ϕ_x(v) uniformly for all x ∈ C,

(8) |ϕ_x(u) − ϕ_x(v)| ≤ δ(‖u − v‖_g).

Under the condition of uniform equicontinuity of (4), the function δ(t) is continuous, non-negative and non-decreasing, with δ(0) = 0, see [7, Chapter 2, § 6]. With no loss of generality, in the sequel we assume that δ is increasing, and denote its inverse function by δ^{−1}. If δ is constant on some interval, proper use of a generalized inverse of δ in place of δ^{−1} yields the same conclusions [14]. The proof of the next result is in Appendix A.2.
Theorem 2. Let P ∈ P(R^d) and C ⊂ R^d be such that the function δ from (7) is continuous at 0 with an inverse function δ^{−1}. Then

lim sup_{n→∞} (n a_d(δ^{−1}(∆_n(C))) − log n) / (log log n) ≤ d a.s.

Applications to probability distributions

Approximations and affine invariance. A depth D is said to be affine invariant if for any X ∼ P ≡ P_X ∈ P(R^d), A ∈ R^{d×d} non-singular, and b ∈ R^d,

D(Ax + b; P_{AX+b}) = D(x; P_X) for all x ∈ R^d.

Affine invariance is a desired property of a depth function. Both the halfspace depth and the (generalized) projection depth satisfy it.
The following theorem asserts that for the halfspace depth, computing the approximations with respect to affine images of a measure affects the resulting rates of convergence only by a multiplicative constant. This allows us to focus, in some of the examples in the sequel, only on the isotropic situation, which typically means that the measure is centred to have expected value at the origin, and scatter matrix a multiple of the identity matrix. The proof of the next result can be found in Appendix A.3.
Theorem 3. Let the assumptions of Theorem 2 hold true. Then for A ∈ R^{d×d} non-singular, b ∈ R^d, and X ∼ P ∈ P(R^d), there exists a constant K > 0 that depends only on A such that, for the approximation error ∆_n(C) computed with respect to the distribution of AX + b, it holds true that

lim sup_{n→∞} (n a_d(δ^{−1}(∆_n(C))/K) − log n) / (log log n) ≤ d a.s.

General distributions with densities. Let us now explore when the equicontinuity conditions (6) or (7) hold true. The first interesting case is that of a bounded set C ⊂ R^d.
Theorem 4. Let P ∈ P(R^d) be a distribution such that

(9) P(⟨u, X⟩ = c) = 0 for all u ∈ S^{d−1} and c ∈ R,

i.e. P assigns zero mass to each hyperplane in R^d. Then lim_{n→∞} ∆_n(C) = 0 almost surely for any bounded set C ⊂ R^d.

Condition (9) is satisfied if P admits a density in R^d. If the density is bounded, an explicit rate of convergence of ∆_n(C) can be provided.
Theorem 5. Let P ∈ P(R^d) be a distribution with a density bounded from above by a constant M > 0 that vanishes outside a bounded set C ⊂ R^d. Let further diam(C) = sup{‖x − y‖ : x, y ∈ C} be the diameter of C. Then the rate from Theorem 1 holds true, with a Lipschitz constant L in (6) that depends only on M, d, and diam(C).

For the proofs of these two theorems see Appendices A.4 and A.5. Note that the rate from Theorem 5 is quite general, yet usually weak.

Elliptically symmetric distributions.
A much finer result can be shown for elliptically symmetric distributions with unimodal densities on R^d. By a unimodal elliptically symmetric density we understand a density of the form

x ↦ (det Σ)^{−1/2} g((x − µ)^T Σ^{−1} (x − µ)),

with Σ ∈ R^{d×d} non-singular, µ ∈ R^d, and g : [0, ∞) → [0, ∞) a non-increasing scalar function. This important collection of distributions covers all Gaussian distributions, as well as the multivariate extensions of the t-distributions. In this situation we are allowed to consider C = R^d, and study the uniform convergence of the approximated depth over the whole sample space. In accordance with the discussion in Section 4.1 we primarily focus on distributions with Σ a multiple of the identity matrix. Such distributions are frequently called spherically symmetric distributions.
Theorem 6. Let P ∈ P(R^d) be a distribution with an elliptically symmetric density and Σ a multiple of the identity matrix. Then the rate of convergence from Theorem 2 holds true with C = R^d and the modulus δ(ε) = (1 − cos(ε))/2.
Note that in the proof of the previous theorem, given in Appendix A.6, for F_p the distribution function of the random variable ⟨u, X⟩, u ∈ S^{d−1}, the better bound sup_{t≥0} (F_p(t) − F_p(t cos(ε))) ≤ (1 − cos(ε))/2 is still not the tightest possible one. Thus, Theorem 6 is suboptimal for particular choices of P. For elliptically symmetric distributions, define the tight modulus of continuity of the halfspace functions (the modulus of continuity of F_p evaluated near the argument of the minimum of ϕ_x) by

(10) δ(ε) = sup_{t≥0} (F_p(t) − F_p(t cos(ε))) for ε ∈ [0, π].
Using the same proof as before with this choice of the modulus δ, it can be asserted that for all n large enough the rate from Theorem 2 holds true for P also with δ given in (10) and C = R^d. Interestingly, the latter bound, using the modulus (10), can be shown to be optimal for spherically symmetric distributions. For this, recall that for two sequences of real numbers a = {a(n)}_{n=1}^∞ and b = {b(n)}_{n=1}^∞ we say that a is asymptotically bounded both from above and below by b, written a(n) = Θ(b(n)), if

0 < lim inf_{n→∞} |a(n)/b(n)| ≤ lim sup_{n→∞} |a(n)/b(n)| < ∞.

The proof of the next theorem is given in Appendix A.7.
Theorem 7. Let P ∈ P(R^d) be a distribution with a unimodal elliptically symmetric density with Σ a multiple of the identity matrix, and let δ(ε) be defined as in (10). Then the rate obtained in Theorem 2 with this modulus is sharp; in particular, for d > 2,

P( n a_d(δ^{−1}(∆_n(R^d))) − log n = Θ(log log n) ) = 1.
Specific rates of convergence can be explored with particular models at hand. Here we study three important special cases -multivariate Gaussian distributions, multivariate elliptically symmetric t-distributions, and uniform distributions on ellipsoids. In accordance with the discussion from Section 4.1, we treat only standardized distributions.
Example (Multivariate Gaussian distribution). For X with the standard multivariate Gaussian distribution, ⟨u, X⟩ is standard univariate Gaussian for any u ∈ S^{d−1}, i.e. F_p = Φ, the standard Gaussian distribution function.
To get tight convergence rates, we must evaluate the modulus (10) for ε > 0 fixed. It is not difficult to find that the maximal value of the function t ↦ Φ(t) − Φ(t cos(ε)) is attained at t* = √(−2 log cos(ε)) / sin(ε), which gives

(11) δ(ε) = Φ(t*) − Φ(t* cos(ε)).

This bound is the optimal one from Theorem 7; it can be compared with the bounds that follow from Theorem 6. All of them can be used for numerical evaluation of the exact convergence rates. For that, it is enough to invert the inequality in Theorem 2 and obtain an error bound of the form ∆_n(C) ≤ δ(a_d^{−1}((d log log n + log n)/n)), which holds approximately, almost surely, for n large enough. A comparison of the optimal uniform approximation errors for Gaussian distributions can be found in Table 1. Graphically, these errors are compared with the general bounds from Theorem 6 in Figure 1; see also the function δ(ε) from (11) displayed in Figure 3. The tight bound presents a substantial improvement over the general bounds from Theorem 6.
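To make the guideline concrete, the almost-sure bound δ(a_d^{−1}((d log log n + log n)/n)) for the standard Gaussian can be evaluated numerically. The sketch below is only an illustration under two reconstructed ingredients (assumptions of this example, not code from the text): the spherical-cap form of a_d and the stationary point t* = √(−2 log cos ε)/sin ε; it uses only the Python standard library.

```python
import math

def Phi(t):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def delta(eps):
    """Tight Gaussian modulus (10); the supremum of Phi(t) - Phi(t cos eps)
    is attained at t* = sqrt(-2 log cos eps)/sin eps (valid for 0 < eps < pi/2)."""
    if eps <= 0.0:
        return 0.0
    c = math.cos(eps)
    t_star = math.sqrt(-2.0 * math.log(c)) / math.sin(eps)
    return Phi(t_star) - Phi(t_star * c)

def a_d(eps, d, steps=4000):
    """Normalised surface measure of a spherical cap of geodesic radius eps
    on S^{d-1} (assumed form of (5)), by midpoint quadrature."""
    const = math.gamma(d / 2.0) / (math.sqrt(math.pi) * math.gamma((d - 1) / 2.0))
    h = eps / steps
    return const * h * sum(math.sin((k + 0.5) * h) ** (d - 2) for k in range(steps))

def inv(f, y, lo=0.0, hi=math.pi, iters=60):
    """Bisection inverse of an increasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def error_bound(n, d):
    """Approximate almost-sure bound delta(a_d^{-1}((d log log n + log n)/n))."""
    y = (d * math.log(math.log(n)) + math.log(n)) / n
    return delta(inv(lambda e: a_d(e, d), y))

b2, b5 = error_bound(1000, 2), error_bound(1000, 5)
```

The resulting numbers illustrate the growth of the required n with the dimension: for n = 1000 random directions, the bound in dimension 2 is orders of magnitude smaller than in dimension 5.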
Example (Multivariate t-distribution). Consider the multivariate t-distribution P ∈ P(R^d) with ν degrees of freedom, whose density is proportional to (1 + (x − µ)^T Σ^{−1} (x − µ)/ν)^{−(ν+d)/2}, where Σ ∈ R^{d×d} is non-singular, µ ∈ R^d, and ν = 1, 2, . . . . For ν = 1 it is also known as the multivariate Cauchy distribution. Obviously, P satisfies the assumptions of Theorem 6. We again focus on the case µ = 0 and Σ the identity matrix. The multivariate t-distribution is then spherically symmetric about the origin, with F_p the distribution function of the univariate standard t-distribution with ν degrees of freedom, see [21, Theorem 1]. Let us evaluate the tight modulus of continuity δ from (10) as in the previous example. For small values of ν, the argument of the maximum t* of the function t ↦ (F_p(t) − F_p(t cos(ε))) can still be obtained explicitly, yet its expressions are already rather complicated. With t* known, the tight bound on the approximation errors (10) can be obtained as in (11). In Figure 1 and Table 1 we see the numerically computed bounds on the approximation error in the present setup for several values of n and d. The function δ defined in (10) can be found in Figure 3. It can be seen that the approximation for heavy-tailed distributions P is in fact more precise than for light-tailed ones. Indeed, if the tails of P are rather flat and decrease slowly, the halfspace function ϕ_x(u) never changes drastically for small deviations from its argument of the minimum u(x).

Example (Uniform distribution on a ball). Consider the uniform distribution P on the unit ball in R^d. The marginal distribution of ⟨u, X⟩ for any u ∈ S^{d−1} is given by a distribution function F_p with density proportional to (1 − t²)^{(d−1)/2} on [−1, 1]. Note that, unlike in our previous examples, F_p depends on the dimension d. For small values of d, it is possible to express the point that realises the tight bound in (11) explicitly; in higher dimensions, numerical approximations are easily obtained.
However, it is well known that as d → ∞ the distribution of ⟨u, X⟩ is approximately Gaussian [9], which reduces the problem for higher values of d approximately to the case of the multivariate Gaussian distribution. In Table 1 and Figure 2 the error bounds are compared for d = 2, 3, and 5. As can be seen, the approximation is slower than for both the Gaussian and the t-distributions. This is explained by the discontinuous nature of the density of P: for x near the boundary of the unit ball and d small, the halfspace function ϕ_x deviates substantially from its minimum value already for u rather close to its minimizer u(x). Though, the general bound from Theorem 6 still performs somewhat worse.

p-symmetric distributions. Consider now the family of p-symmetric distributions with p ∈ (0, 2], see [15]. That family of measures extends the spherically symmetric distributions treated in Section 4.3, which can be written as p-symmetric distributions with p = 2, as well as the multivariate extensions of stable distributions. Recall that for p ∈ (0, 2], a random vector X = (X_1, . . . , X_d)^T ∼ P ∈ P(R^d) is said to have a p-symmetric distribution if for any u ∈ R^d the random variable ⟨X, u⟩ has the same distribution as ‖u‖_p X_1, see [15, Theorem 7.1]. We shall also write ‖u‖_∞ = max_{i=1,...,d} |u_i|. Observe that in this notation ‖·‖_2 ≡ ‖·‖.
For P ∈ P(R^d) a p-symmetric distribution, the exact depth HD(x) was computed in [26, Example (C)] and [5, Theorem 3.1]. Let F denote the (univariate) distribution function of the marginal X_1 of X, and set

(12) q, given by 1/p + 1/q = 1 for p ∈ [1, 2] and q = ∞ for p ∈ (0, 1),

to be the conjugate index to p. As demonstrated in [5, Lemma A.1 and Theorem 3.1], we can write HD(x; P) = F(−‖x‖_q). The following result generalizes Theorem 6. Its proof is in Appendix A.8.
Furthermore, the rate of convergence from Theorem 2 holds true with C = R^d and either of two explicit moduli δ(t) determined by F, p, and q, where in both expressions t ∈ [0, π] and q is given by (12).
For p = 2 we can compare the obtained rate with that from Theorem 6 and see that here the bounds are somewhat weaker. That is not surprising, since Theorem 8 holds under much more general conditions. The unimodality assumption is satisfied for most commonly used p-symmetric distributions; for instance, it holds true for symmetric p-stable distributions.

We conclude this section with two counterexamples. In the first one we construct distributions that are atomic, yet their approximated depth fails to converge uniformly to its true counterpart. In the second example we demonstrate that if the density of P is unbounded, uniform rates of convergence of the halfspace depth approximation cannot be derived. The latter result should be compared with Theorems 4 and 5.
Example. If P does not admit a density, the halfspace functions ϕ x may contain discontinuities. In particular, this occurs when the depth is computed with respect to the empirical measure of a random sample of observations, i.e. for the sample halfspace depth. In that case, it can be inferred that the convergence of the approximations is not uniform, even on bounded sets C.
Consider the example of P atomic, supported in a finite set of points in R^d, each carrying probability at least p_1 > 0. Without loss of generality, assume that the convex hull K ⊂ R^d of these points is d-dimensional, i.e. that K is not fully contained in an affine subspace of R^d of dimension lower than d. Let x be a point on a facet of K, and let x_ε = x + ε v for ε > 0, where v ∈ S^{d−1} is the outward normal of the facet of K on which x lies. Since x_ε ∉ K by definition, HD(x_ε) = 0 for any ε > 0. For its approximation, clearly HD_n(x_ε) ≥ p_1 if each of the halfspaces H(x_ε, U_i), i = 1, . . . , n, whose boundary passes through x_ε with outer normal U_i ∈ S^{d−1}, contains at least one atom of P. For ε small, this will occur with high probability if the directions U_i are sampled uniformly on S^{d−1}: as ε → 0 from the right, the condition H(x_ε, U_i) ∩ K = ∅ effectively reduces to U_i = −v, an event of null probability. Thus, the convergence of the approximations cannot be uniform on the line segment x + ε v with ε ∈ (0, 1), and

lim sup_{n→∞} sup_{ε∈(0,1)} (HD_n(x_ε) − HD(x_ε)) ≥ p_1 > 0 a.s.
Example. Assume that P is supported in a bounded subset of R^d and has a density, but that density is unbounded. By Theorem 4, the uniform convergence of the approximations holds; but no uniformly valid rate can be derived. To see this, we construct a distribution with an arbitrarily slow convergence rate.
Let {x_i}_{i=0}^∞ be distinct points on the upper half of the unit circle in R^2, indexed in a clockwise sense, and denote l_i = ‖x_i − x_{i−1}‖/2 for i ≥ 1, and set l_0 = l_1. Then {l_i}_{i=0}^∞ is a sequence of positive numbers converging to zero. For i ≥ 0, let B_i denote a ball centred at x_i with radius e^{−i} l_i, and let C_i be the intersection of B_i with the convex hull of the points {x_i}_{i=0}^∞. Each C_i is a convex subset of the unit ball in R^2, with x_i ∈ C_i, and the sets {C_i}_{i=0}^∞ are pairwise disjoint. Let P be the mixture of uniform distributions on the sets C_i with mixing proportions {p_i}_{i=0}^∞, where Σ_{i=0}^∞ p_i = 1 and p_0 > p_1 > · · · > 0. Consider the sequence of mid-points y_j = (x_j + x_{j−1})/2, j = 1, 2, . . . . Because each y_j lies on the boundary of the convex hull of the support of P, the depth HD(y_j) with respect to P is zero for all j. Though, obviously, the single minimizer u(y_j) of the halfspace function ϕ_{y_j} is the inward normal of the facet corresponding to x_{j−1} and x_j. If ‖u − u(y_j)‖_g ≥ ε, then by the construction of P, for all j ≥ j(ε) = ⌈− log sin ε + 1⌉, the smallest integer larger than − log sin ε + 1, it can be seen that either C_j or C_{j−1} lies inside the halfspace whose normal is u and which passes through y_j. Thus, ϕ_{y_j}(u) ≥ p_j for all such u. Using a result on maximal spacings in R from [8, Theorem 5.2], it can be seen that for U_1, . . . , U_n uniformly distributed on the circumference of the unit circle, it almost surely holds true that for all n large enough and any j we have min_{i=1,...,n} ‖u(y_j) − U_i‖_g ≥ a(n), for a fixed positive sequence {a(n)}_{n=1}^∞ that converges to zero. This means that almost surely, for all n large enough and j = ⌈− log sin a(n) + 1⌉, we have HD_n(y_j) − HD(y_j) ≥ p_j. Because {p_j}_{j=0}^∞ can be made to converge to zero arbitrarily slowly, this means that no universal rate of convergence can be found if the density of P is allowed to be unbounded.

Extensions to generalized projection depths
Let us now focus on another example of a depth that satisfies the projection property, yet is difficult to compute exactly: the (generalized) projection depth. To define the depth, consider first the mappings m : P(R) → R and s : P(R) → [0, ∞] that satisfy the following conditions:
• m(aX + b) = a m(X) + b for all a, b ∈ R and X ∼ P ∈ P(R);
• s(aX + b) = a s(X) for all a > 0, b ∈ R, and X ∼ P ∈ P(R).
Functionals m and s are called the location and the scale parameter of X, respectively. Using m and s for univariate distributions, it is possible to follow the ideas from [34,10] and define an outlyingness function in a multivariate space. For x ∈ R^d and X ∼ P ∈ P(R^d), the (projection) outlyingness of x with respect to P is given by

(14) O(x; P) = sup_{u∈S^{d−1}} (⟨u, x⟩ − m_u)/s_u, where m_u = m(⟨u, X⟩) and s_u = s(⟨u, X⟩).

The function O(·; P) measures the largest deviation of a projection of a point from the location parameter of the corresponding projected distribution. The outlyingness function (14) is closely related to depth: high outlyingness indicates low centrality. Therefore, to construct a depth it suffices to transform the outlyingness index (14) by a non-increasing function c. A family of depth functions based on this idea is called the family of projection depths. It was studied in detail in [40,38], and [13].
Consider a continuous function c : [−∞, ∞] → [0, 1] such that c(x) = 1 for x < 0, and the restriction of c to [0, ∞] is bijective and strictly decreasing. The generalized projection depth of x ∈ R^d with respect to X ∼ P ∈ P(R^d) is defined by

(15) PD(x; P) = c(O(x; P)).

The family of depths (15) was proposed in [13] as a generalization of the class of projection depths from [38]. The original projection depths are obtained by considering a scale functional s invariant under reflections, i.e. s(−X) = s(X) for all X ∼ P ∈ P(R).
Just as the halfspace depth, the projection depth is difficult to compute exactly, especially in high dimensions [39,24]. In the same way as for the halfspace depth in (2), define the approximated projection depth of x ∈ R^d with respect to P ∈ P(R^d), based on independent random directions U_1, . . . , U_n ∈ S^{d−1} distributed uniformly on S^{d−1}, by

PD_n(x; P) = c( max_{i=1,...,n} (⟨U_i, x⟩ − m_{U_i})/s_{U_i} ).

For a given set C ⊂ R^d, we are interested in the uniform approximation of PD by PD_n,

∆^P_n(C) = sup_{x∈C} |PD_n(x) − PD(x)|.
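As an illustration of the randomized projection depth, the sketch below uses one admissible choice of the ingredients, none of which is prescribed by the text: the median as location m, the MAD as scale s (reflection invariant, hence yielding an original projection depth), and the transform c(t) = 1/(1 + t):

```python
import numpy as np

def approx_projection_depth(x, data, n_dirs=500, rng=None):
    """Approximated projection depth PD_n(x) with median location,
    MAD scale, and the transform c(t) = 1/(1 + t) for t >= 0."""
    rng = np.random.default_rng(rng)
    d = data.shape[1]
    U = rng.standard_normal((n_dirs, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj = data @ U.T                            # (N, n_dirs) projections
    m_u = np.median(proj, axis=0)                # location of each projection
    s_u = np.median(np.abs(proj - m_u), axis=0)  # MAD scale of each projection
    out_n = np.max((U @ x - m_u) / s_u)          # randomized outlyingness O_n(x)
    return 1.0 / (1.0 + max(out_n, 0.0))         # PD_n(x) = c(O_n(x))

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 3))
pd_centre = approx_projection_depth(np.zeros(3), data, rng=1)
pd_far = approx_projection_depth(np.full(3, 10.0), data, rng=1)
```

Since O_n is a maximum over finitely many directions, O_n ≤ O, and hence PD_n ≥ PD: the randomized depth is again an upper bound on the true one.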
Note that for the projection depth, the following function plays a role similar to that of the halfspace function from (3):

ϕ^P_x : S^{d−1} → R : u ↦ (⟨u, x⟩ − m_u)/s_u.

A distinctive role will now be played by the argument(s) of the maximum of the function ϕ^P_x(·) on S^{d−1}. Any representative of this set will be denoted by u^P(x). The next theorem provides simple conditions that guarantee the almost sure convergence of ∆^P_n(R^d) to zero. In the proof of that result, the minimal multiplicative modulus of continuity of the function c, given by

(16) ζ(τ) = sup_{t>0} (c(τ t) − c(t)) for τ ∈ (0, 1),

plays a crucial role. Note that for a continuous function c the modulus ζ is a well-defined function, continuous from the left at τ = 1. To see this, pick a sequence τ_n → 1 from the left, and apply Dini's theorem [12, Theorem 2.4.10] to the sequence of continuous functions ψ_n(t) = c(τ_n t) defined on the compact set [0, ∞] (with ψ_n(∞) = 0) that converge monotonically on t ∈ [0, ∞] to ψ_0(t) = c(t) (again, ψ_0(∞) = 0). Dini's theorem asserts that this convergence must be uniform on the domain of ψ_n, which can be rewritten as ζ(τ_n) = sup_{t≥0} (ψ_n(t) − ψ_0(t)) = sup_{t≥0} (c(τ_n t) − c(t)) → 0 as n → ∞.
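The modulus ζ from (16) can be made concrete for the illustrative choice c(t) = 1/(1 + t), t ≥ 0 (an assumption of this example, not a choice made in the text): a short calculation shows that the supremum is attained at t = 1/√τ, with the closed form ζ(τ) = (1 − √τ)/(1 + √τ). The sketch below cross-checks this numerically:

```python
import math

def c(t):
    """Illustrative transform c(t) = 1/(1 + t) for t > 0, c(t) = 1 otherwise."""
    return 1.0 / (1.0 + t) if t > 0 else 1.0

def zeta_numeric(tau, grid=100_000, tmax=1e3):
    """Brute-force evaluation of (16) over a grid of t values."""
    return max(c(tau * t) - c(t) for t in (k * tmax / grid for k in range(1, grid)))

def zeta_closed(tau):
    """Closed form for c(t) = 1/(1+t); the supremum sits at t = 1/sqrt(tau)."""
    r = math.sqrt(tau)
    return (1.0 - r) / (1.0 + r)
```

In particular ζ(τ) → 0 as τ → 1 from the left, as guaranteed in general by the Dini argument above.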
Theorem 9. Let X ∼ P ∈ P(R^d) be such that the functions u ↦ m_u = m(⟨u, X⟩) and u ↦ s_u = s(⟨u, X⟩) are Lipschitz continuous on S^{d−1} with Lipschitz constants C_m and C_s, respectively.

Theorem 9 stands as an analogue of Theorem 1 stated for the halfspace depth. Further minor modifications to the proof of Theorem 9, stated in Appendix B.1, allow one to formulate analogous results for continuous (thus uniformly continuous) location and scale parameters, in the spirit of Theorem 2. We omit those technical details for brevity.
For the important case of spherically symmetric distributions, the rate from Theorem 9 can be improved. The proof of this statement can be found in Appendix B.2.
Because the projection depths are affine invariant, this result extends, analogously to Theorem 3, to elliptically symmetric distributions.

Approximation of projection outlyingness. Contrary to the projection depths (15), the outlyingness (14) is not uniformly approximated by its randomized counterparts O_n(x; P) = max_{i=1,...,n} (⟨U_i, x⟩ − m_{U_i})/s_{U_i},
even if the conditions of Theorems 9 and 10 are satisfied. We illustrate this in a simple example.
Example. Let X ∼ P ∈ P(R^2) have the uniform distribution on the unit circle. Because each projection ⟨u, X⟩ has the same distribution, symmetric about the origin, m_u = 0 and s_u = S for all u ∈ S^1 and some constant S ≥ 0. For any reasonable scale functional s the constant S is positive. Consider a point x = (x_1, 0)^T ∈ R^2 with x_1 > 0. Then ϕ^P_x(u) = ⟨u, x⟩/S = u_1 x_1/S. This function is maximized at u^P(x) = (1, 0)^T ∈ S^1, with O(x; P) = x_1/S. For any u ∈ S^1 with u_1 = ⟨u, u^P(x)⟩ < cos(ε), on the other hand, ϕ^P_x(u) < x_1 cos(ε)/S. Consequently, sup_{x∈R^2} (O(x; P) − O_n(x; P)) = ∞ almost surely, because for any degree of approximation of the target direction u^P(x) there exists a point x = (x_1, 0)^T with a norm so large that its outlyingness is approximated poorly.


Discussion
In the present paper we discussed the uniformity aspects of the depth approximation task. We demonstrated that for a distribution P that is regular enough, the approximated depth does converge uniformly to its exact counterpart. This result justifies the simple approximation procedure from a theoretical point of view, as the almost sure uniform convergence of the depth approximations carries over to the approximated depth-statistics such as the depth median (i.e. an argument of the maximum of the depth on R d ), or the level sets of the depth.
In addition, we have presented and compared several almost sure upper bounds on the uniform discrepancies between the true halfspace and projection depth, and their approximated counterparts. Depending on the degree of regularity assumed about P , guidelines for the choice of the number of approximating directions n in order to achieve a desired precision can be devised from the theory presented. For the halfspace depth, in Table 1 we saw that the random approximation scheme is feasible in lower dimensions, especially for distributions whose densities are continuous and rather flat. In dimensions d > 5, hundreds of thousands of random directions may not be enough for sufficiently close approximations, and more elaborate algorithms appear to be needed.
In Figure 4 we saw that for particular distributions, the theoretical bounds match the simulated results closely already for intermediate n. Nevertheless, in practice one does not observe P directly, but rather only an empirical measure P_N of a random sample X_1, . . . , X_N of size N = 1, 2, . . . generated from P, and computes the depth with respect to P_N as a surrogate for HD(x; P) or PD(x; P). For practical considerations, it would therefore be desirable to obtain also upper bounds on the difference between the true depth and its approximations with respect to the empirical measure P_N. This can be done. For the halfspace depth, for instance, one may still use the theory presented and devise a simple bound, valid almost surely for any fixed ε > 0 and all N and n large enough, that takes the form

sup_{x∈R^d} (HD_n(x; P_N) − HD(x; P_N)) ≤ sup_{x∈R^d} |HD_n(x; P_N) − HD_n(x; P)| + sup_{x∈R^d} |HD(x; P_N) − HD(x; P)| + sup_{x∈R^d} (HD_n(x; P) − HD(x; P)).

We used the fact that the sample halfspace depth process can be bounded by an empirical process given by the collection of closed halfspaces in R^d; for the first two summands, the bound then follows from the law of the iterated logarithm devised in [20, Corollary 2.4], see also [36]. The third summand in the last expression above can be handled using the results provided in this paper. Thus, the approximation of the empirical halfspace depth can be, at least for the number of observations N large enough, approached by the approximation results for the true sampling distribution P, and an upper bound on the deviation of P_N from P. An empirical argument supporting this finding is presented in Figure 5, where for d = 2 and the standard Gaussian distribution, the theoretical bound from Theorem 7 is compared with simulated trajectories of ∆_{n,N} = max_{i=1,...,100} (HD_n(x_i; P_N) − HD(x_i; P_N)) for several choices of n ∈ [50, 1000] and four sample sizes N = 10^3, 10^4, 10^5 and 10^6.
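The interplay between n and N can be probed with a small simulation in the spirit of Figure 5. This is only a sketch; the sample size, the number of query points, the use of 2000 directions as the reference depth, and all seeds are arbitrary choices, not those of the original study:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5000, 2
data = rng.standard_normal((N, d))           # sample of size N from N(0, I_2)
xs = rng.standard_normal((100, d))           # query points

def hd_dirs(points, data, dirs):
    """Randomized halfspace depth of each query point w.r.t. the empirical
    measure of `data`, using the given projection directions."""
    best = np.ones(len(points))
    for u in dirs:
        srt = np.sort(data @ u)
        frac = np.searchsorted(srt, points @ u, side="right") / len(data)
        best = np.minimum(best, frac)
    return best

U = rng.standard_normal((2000, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
ref = hd_dirs(xs, data, U)                   # reference: all 2000 directions
# maximal discrepancy to the reference, for nested prefixes of directions
deltas = {n: (hd_dirs(xs, data, U[:n]) - ref).max() for n in (50, 200, 2000)}
```

Because the direction sets are nested, the discrepancy is nonincreasing in n by construction, and the decay of `deltas` with n mirrors the behaviour of the simulated trajectories discussed above.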
Note that for sample size N, the quantity ∆_{n,N} is a multiple of 1/N, and therefore unless ∆_{n,N} = 0 (which did not happen in our simulation study), the simulated trajectory never decreases below 1/N (i.e. the tick 1e−3 for N = 1 000, etc.). In addition, because the supremum ∆_n(R^2) is compared with a maximum over only 100 points, also for N large the simulated trajectories do not follow the theoretical bound as closely as in Figure 4. Nevertheless, the obtained rates of convergence do appear to agree, even though the true distribution is replaced by an empirical one. Therefore, the general guidelines for the choice of n are relevant also in the situation when the sample depth is approximated. For instance, if the theoretical bound in Table 1 is already too high for the practical application in mind, the simple approximation of the halfspace depth is certainly not a good idea, and more sophisticated methodologies must be employed. In any case, for the sample depth it must be kept in mind that according to the first example of Section 4.5, for empirical distributions the depth approximations are inherently non-uniform.

A.1. Proof of Theorem 1. By the law of the iterated logarithm for maximal spacings on the sphere [18], the maximal spacing S_n = sup_{u∈S^{d−1}} min_{i=1,...,n} ‖u − U_i‖_g of the sample U_1, . . . , U_n satisfies lim sup_{n→∞} (n a_d(S_n) − log n)/(log log n) ≤ d a.s., where a_d is the function defined in (5). Take ε > 0 fixed, and for all x ∈ C let u(x) ∈ S^{d−1} be such that ϕ_x(u(x)) − ε < HD(x).
By j(x) denote any index such that ‖U_{j(x)} − u(x)‖_g = min_{i=1,...,n} ‖U_i − u(x)‖_g ≤ S_n. For the halfspace depth (1) and its approximation (2) this implies, by the Lipschitz property (6), that

0 ≤ HD_n(x) − HD(x) ≤ ϕ_x(U_{j(x)}) − ϕ_x(u(x)) + ε ≤ L ‖U_{j(x)} − u(x)‖_g + ε ≤ L S_n + ε,

where the last inequality holds for any ε > 0. Therefore, S_n ≥ ∆_n(C)/L, and we may conclude that

lim sup_{n→∞} (n a_d(∆_n(C)/L) − log n)/(log log n) ≤ d a.s.

A.2. Proof of Theorem 2. The proof follows analogously to that of Theorem 1. The only difference is that in (18) we use a bound given by the modulus of continuity δ, instead of the Lipschitz property (6) of the class (4). Since ‖U_{j(x)} − u(x)‖_g ≤ S_n and because of (8), we obtain 0 ≤ ∆_n(C) ≤ δ(S_n) + ε for any ε > 0, and the conclusion follows.
Remark. Note that the key step in the proofs of Theorems 1 and 2 is a conceptually simple application of a continuity argument to an asymptotic result on maximal spacings in S^{d−1}. Analogously, using only the equicontinuity of the halfspace functions (or their analogues for the projection depth), any other asymptotic expression for the maximal spacings could be translated into an appropriate formula for the depth approximations by very similar proof techniques. We have opted for an asymptotic result of Janson [17, 18] that takes the form of a law of the iterated logarithm for maximal spacings. This leads to upper bounds on the convergence rates that are quite strong in their formulation, yet may be rather conservative for some practical applications.
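For d = 2 the maximal spacing can be computed exactly, since the empty caps of S^1 are arcs. The sketch below (our own illustration; Janson's asymptotic result is not needed for it) computes the geodesic covering radius, i.e. half of the largest angular gap, which plays the role of S_n in the proofs above.

```python
import numpy as np

def covering_radius_circle(angles):
    """Geodesic covering radius of a set of directions on S^1:
    half of the largest angular gap between consecutive directions."""
    a = np.sort(np.asarray(angles) % (2.0 * np.pi))
    gaps = np.diff(a, append=a[0] + 2.0 * np.pi)  # wrap-around gap included
    return gaps.max() / 2.0

rng = np.random.default_rng(42)
r_200 = covering_radius_circle(rng.uniform(0.0, 2.0 * np.pi, 200))
r_2000 = covering_radius_circle(rng.uniform(0.0, 2.0 * np.pi, 2000))
# on the circle the covering radius shrinks at rate roughly log(n)/n
```

The covering radius is always at least half the mean gap π/n, and for uniform directions it is of the order log(n)/n, in line with the law of the iterated logarithm cited above.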
Remark. If the random sample of approximating directions U 1 , . . . , U n is taken from a non-uniform distribution on S d−1 , results analogous to those described here can be obtained using the recent study of Aaron et al. [1], which complements the theory of Janson [18].
A.3. Proof of Theorem 3. For the approximated depth of the affinely transformed measure we can write an identity in which, on the right-hand side, the approximated depth of x appears with respect to the untransformed random vector X, with the random directions U_i sampled from a possibly non-uniform distribution on S^{d−1}. As in (18) and (19) it follows that, for any ε > 0, it suffices to bound the spherical maximal spacing that corresponds to a random sample from A^T U/‖A^T U‖ for U uniform on S^{d−1}. Let u, v ∈ S^{d−1}. We bound the distance between the transforms of u and v in (20), where λ_1 and λ_d are the largest and the smallest singular value, respectively, of the matrix A^T, and K = (π/2)(λ_1/λ_d)(1 + λ_1/λ_d). The third inequality in (20) is justified by [16, Theorem 3.1.2], the formula inf_{u∈S^{d−1}} ‖A^T u‖ = λ_d that follows from the same theorem, and the mean value theorem applied to the function g(t) = 1/t. The first and the last inequality in (20) relate the geodesic and the Euclidean distances on S^{d−1}. We can now bound the spherical maximal spacing S̃_n of the non-uniformly distributed directions. It follows that 0 ≤ ∆_n(C) ≤ δ(S̃_n) + ε ≤ δ(K S_n) + ε for any ε > 0, and the proof can be concluded as that of Theorem 2.
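The Lipschitz-type bound used here can be checked numerically: with K = (π/2)(λ_1/λ_d)(1 + λ_1/λ_d), the map u ↦ A^T u/‖A^T u‖ expands geodesic distances on S^{d−1} by at most the factor K. The sketch below (a random matrix A and random pairs of directions are our own choices) estimates the worst observed expansion factor.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
sv = np.linalg.svd(A.T, compute_uv=False)     # singular values, descending
lam1, lamd = sv[0], sv[-1]
K = (np.pi / 2.0) * (lam1 / lamd) * (1.0 + lam1 / lamd)

def geo(u, v):
    """Geodesic (great-circle) distance between unit vectors u and v."""
    return np.arccos(np.clip(u @ v, -1.0, 1.0))

def transform(u):
    """The map u -> A^T u / ||A^T u|| from the proof of Theorem 3."""
    w = A.T @ u
    return w / np.linalg.norm(w)

worst = 0.0
for _ in range(1000):
    u = rng.standard_normal(3); u /= np.linalg.norm(u)
    v = rng.standard_normal(3); v /= np.linalg.norm(v)
    g = geo(u, v)
    if g > 1e-9:  # avoid division by (near-)zero distances
        worst = max(worst, geo(transform(u), transform(v)) / g)
```

The observed ratio never exceeds K, supporting the reading of the constant in (20).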
A.4. Proof of Theorem 4. With no loss of generality, assume that C ⊂ R^d is compact with P(C) = 1. Define the function δ_ϕ. Because P satisfies (9), δ_ϕ is a modulus of continuity of the class of functions (4). Thus, we can apply Theorem 2 with the modulus δ_ϕ, and the approximated depth approaches the theoretical one uniformly in x.
A.5. Proof of Theorem 5. For any x ∈ C and u, v ∈ S^{d−1} we can bound the difference |ϕ_x(u) − ϕ_x(v)| by the P-measure of W(r, u, v), a wedge of a ball in R^d of radius r = diam(C) and center x (which therefore contains C), whose bounding hyperplanes have normals u, v ∈ S^{d−1}. Since the volume of this wedge can be bounded by a multiple of the angle between u and v, the Lipschitz condition (6) follows, and Theorem 1 provides the result.
A.6. Proof of Theorem 6. Due to the translation invariance of HD, we can without loss of generality assume that µ = 0. We thus assume that P admits a density that is spherically symmetric about the origin. For such P, it is known that the true depth depends on x only through ‖x‖; all contours of the depth HD are therefore centred spheres. Consequently, for any x ∈ R^d, x ≠ 0, the single halfspace that realises the depth HD(x) at x is the one whose inner normal is x/‖x‖. In other words, for each x ∈ R^d, x ≠ 0, the minimising direction is u(x) = −x/‖x‖. For x = 0, ϕ_x(u) = 1/2 for each u ∈ S^{d−1}, and for any approximation we obtain the exact depth, i.e. HD_n(x) − HD(x) = 0 almost surely for all n. To assess the quality of the approximation at x ≠ 0, it is necessary to control only the values ϕ_x(u) − ϕ_x(u(x)) for u close to u(x) = −x/‖x‖. This makes the problem easier than the general case described in Theorems 1 and 2, where no information about the position of u(x) is available.
Since X ∼ P is assumed to be spherically symmetric about 0, the random vector AX has the same distribution as X for any orthogonal matrix A ∈ R^{d×d} [31]. Thus, any projection ⟨u, X⟩ for u ∈ S^{d−1} has the same univariate distribution. Denote the cumulative distribution function of this projected random variable by F_p(t) = P(⟨u, X⟩ ≤ t).
Since P was assumed to admit a density, F_p is continuous. The halfspace functions ϕ_x can be expressed in terms of F_p:

(22) ϕ_x(u) = P(⟨u, X⟩ ≤ ⟨u, x⟩) = F_p(⟨u, x⟩) for all x ∈ R^d and u ∈ S^{d−1}.
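Formula (22) can be verified numerically for the standard Gaussian distribution, where F_p is the standard normal cdf Φ. The following sketch (the sample size, seed, and query point are arbitrary choices of ours) compares the empirical value of ϕ_x(u) with F_p(⟨u, x⟩).

```python
import numpy as np
from math import erf, sqrt

def Phi(t):
    """Standard normal cdf; equals F_p for P = N(0, I_d)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

rng = np.random.default_rng(0)
X = rng.standard_normal((200000, 3))   # spherical P = N(0, I_3)
x = np.array([1.0, -0.5, 2.0])
u = rng.standard_normal(3)
u /= np.linalg.norm(u)                 # a direction on S^2

phi_emp = np.mean(X @ u <= u @ x)      # empirical phi_x(u)
phi_thm = Phi(u @ x)                   # formula (22): F_p(<u, x>)
```

Note also that, since ⟨u, x⟩ ≥ −‖x‖ and Φ is non-decreasing, Φ(−‖x‖) is a lower bound for ϕ_x(u) at every direction u, in accordance with the formula for HD derived below.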
For the depth, we can therefore write

(23) HD(x) = min_{u∈S^{d−1}} F_p(⟨u, x⟩) = F_p(−‖x‖).

Writing again U_{j(x)} for the element of the random sample U_1, . . . , U_n closest to u(x), we can now bound

(24) 0 ≤ HD_n(x) − HD(x) ≤ F_p(⟨U_{j(x)}, x⟩) − F_p(−‖x‖) ≤ F_p(−cos(S_n)‖x‖) − F_p(−‖x‖) = F_p(‖x‖) − F_p(cos(S_n)‖x‖),

where we used the fact that ‖U_{j(x)} − u(x)‖_g ≤ S_n, further that F_p is non-decreasing, and that the distribution of ⟨u, X⟩ is symmetric about zero, i.e. F_p(t) = 1 − F_p(−t). Take an arbitrary decreasing sequence {ε_ν}_{ν=1}^∞ such that ε_ν → 0 as ν → ∞, and define the sequence of functions

(25) f_ν : [0, ∞] → [0, 1] : t ↦ F_p(cos(ε_ν) t), ν = 1, 2, . . .

This sequence consists of continuous functions defined on the compact set [0, ∞]. As F_p is continuous and non-decreasing, f_ν(t) ≤ f_{ν+1}(t) ≤ F_p(t) for all t ∈ [0, ∞] and ν = 1, 2, . . . Therefore, it is possible to use Dini's theorem [12, Theorem 2.4.10] to obtain that the sequence of functions (25) converges to F_p uniformly on [0, ∞]. Thus, the limit of the right-hand side of (24) is zero and finally, using a result on maximal spacings of Janson [18], we may conclude that the uniform convergence of the approximated depth holds true.

Let us now obtain the rate of convergence of the approximation if the density f is unimodal. First, we show that for d > 1 and t ≥ 0 the distribution function F_p must be concave. To see this, note that for the density of the random variable ⟨u, X⟩ with u = (0, . . . , 0, 1) ∈ S^{d−1} we have

f_p(t) = ∫_{R^{d−1}} g(‖(x_1, . . . , x_{d−1}, t)‖) dx_1 · · · dx_{d−1}.

For f unimodal, the function g in (21) must be non-increasing. For 0 ≤ t_1 ≤ t_2 this means that g(‖(x_1, . . . , x_{d−1}, t_1)‖) ≥ g(‖(x_1, . . . , x_{d−1}, t_2)‖) for any x_1, . . . , x_{d−1} ∈ R, and hence that f_p(t_1) ≥ f_p(t_2). Since f_p(t) is non-increasing for t ≥ 0, the distribution function F_p(t) must be concave for t ≥ 0. Thus, using the obvious fact that F_p(0) = 1/2, for S_n ≤ π/2 and t ≥ 0 we have

(26) 0 ≤ F_p(t) − F_p(cos(S_n)t) ≤ F_p(t) − (1 − cos(S_n))F_p(0) − cos(S_n)F_p(t) = (1 − cos(S_n))(F_p(t) − 1/2) ≤ (1 − cos(S_n))/2.
Combining the last formula with (24) gives

(27) 0 ≤ ∆_n(R^d) ≤ (1 − cos(S_n))/2.

This inequality holds true under the condition that for any u ∈ S^{d−1} there exists a random direction U_i such that ‖u − U_i‖_g ≤ S_n. In other words, the inequality remains valid if the polar angle that corresponds to the maximal spacing given by the random sample of directions U_1, . . . , U_n does not exceed S_n. This leads to an inequality of the same type as (19) from Theorem 2, and the conclusion follows by the same argument as that used in the proof of Theorem 2. The simpler bound follows by applying the inequality 1 − cos(t) ≤ t^2/2 to (27). This way one obtains 0 ≤ ∆_n(R^d) ≤ S_n^2/4 for S_n the maximal spacing from the proofs of Theorems 1 and 2. Again, the same technique as that from the proof of Theorem 2 gives the final rate of convergence.
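The bound (27) can be illustrated numerically for d = 2 and the standard Gaussian distribution, where HD(x) = Φ(−‖x‖) and ϕ_x(u) = Φ(⟨u, x⟩). The sketch below (the grid of query points, the seed, and the number of directions are our own choices) verifies that the worst approximation error over a grid never exceeds (1 − cos(S_n))/2 ≤ S_n^2/4.

```python
import numpy as np
from math import erf, sqrt

def Phi(t):
    """Standard normal cdf; F_p for P = N(0, I_2)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

rng = np.random.default_rng(1)
ang = rng.uniform(0.0, 2.0 * np.pi, 100)
U = np.column_stack([np.cos(ang), np.sin(ang)])   # directions on S^1

a = np.sort(ang)                                  # covering radius S_n:
S_n = np.diff(a, append=a[0] + 2.0 * np.pi).max() / 2.0

def hd_exact(x):
    return Phi(-np.linalg.norm(x))   # true depth for N(0, I_2)

def hd_n(x):
    return Phi((U @ x).min())        # min_i F_p(<U_i, x>)

# worst observed error over a grid of query points
grid = [np.array([r * np.cos(t), r * np.sin(t)])
        for r in (0.5, 1.0, 2.0) for t in np.linspace(0.0, 2.0 * np.pi, 60)]
err = max(hd_n(x) - hd_exact(x) for x in grid)
```

Since the directions cover S^1 up to the radius S_n, the inequality err ≤ (1 − cos(S_n))/2 is guaranteed by the concavity argument above, and the simulation confirms it.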
A.7. Proof of Theorem 7. Again, without loss of generality, µ = 0. It is enough to show the first inequality; the second assertion then follows from Theorem 6. Consider the main theorem of Janson [18] once more. By the first part of that result, for S_n the spacings defined in the proof of Theorem 1 we have

(28) lim inf_{n→∞} (n a_d(S_n) − log n)/(log log n) = d − 2 a.s.,

which means that almost surely for n large there exists a vector v_n ∈ S^{d−1} such that each sampled direction U_i, i = 1, . . . , n, is far enough from v_n; i.e. for any ε > 0, almost surely for all n large enough there is v_n such that

(29) min_{i=1,...,n} ‖U_i − v_n‖_g ≥ a_d^{−1}((log n + (d − 2 − ε) log log n)/n).

Denote the right-hand side of the previous inequality by r_n. By the definition of δ from (10), for any η > 0 there exists K_n > 0 at which the value δ(r_n) is attained up to η. Consider the point x_n = −K_n v_n. Then ‖x_n‖ = K_n, and v_n is the minimizer of ϕ_{x_n}, i.e. v_n = u(x_n). The latter follows from the proof of Theorem 6, where it was argued that for a spherically symmetric P, u(x_n) minimizes ϕ_{x_n} for all x_n on the ray λ u(x_n) with λ < 0. Let u ∈ S^{d−1} be any direction with ‖u − u(x_n)‖_g ≥ r_n. By formulas (22) and (23) from the proof of Theorem 6 and the monotonicity of F_p we obtain a lower bound on ϕ_{x_n}(u) − HD(x_n). Consequently, if there exists v_n ∈ S^{d−1} that satisfies (29), then for any η > 0 it is possible to find x_n such that the above inequality holds true with u replaced by U_i, for all i = 1, . . . , n. In other words, in this situation

(30) ∆_n(R^d) ≥ δ(r_n) − η.

By (28) and (29), however, we know that almost surely for all n large such v_n exists. Therefore, it is possible to invert (30), which allows us to write, almost surely and for any ε > 0 and η > 0, a lower bound of the form asserted in the theorem, and the conclusion follows.
A.8. Proof of Theorem 8. The proof proceeds in a spirit similar to that of Theorem 6. Define the function r. For p ≤ 1 the minimizing direction u(x) ∈ S^{d−1} is not uniquely defined. Nevertheless, from (13) we see that for any choice of the minimal direction the function r is well defined. The essential part of this proof is to show that the function r converges to 1 uniformly in x ∈ R^d \ {0} when u ∈ S^{d−1} comes from a small neighbourhood of u(x). More precisely, we show that for any ε > 0 there exists t > 0 such that r stays within ε of 1 whenever ‖u − u(x)‖_2 ≤ t, uniformly in x. Note that for the special case p = 2 it is possible to proceed as in the proof of Theorem 6 and write a bound analogous to formula (24). For general p ∈ (0, 2] the derivation below is somewhat similar, yet more involved. From formula (32) it follows that for any x ∈ R^d \ {0} and u ∈ S^{d−1} with ‖u − u(x)‖_2 ≤ t we can bound r(x, u) from both sides. By the unimodality assumption, the distribution function F must be concave on [0, ∞). Thus, using a derivation analogous to (26), the claim follows.

B.1. Proof of Theorem 9. Take M > λ_m and denote B_M = {x ∈ R^d : ‖x‖ < M}. Consider x ∈ B_M first. For such x we can write a chain of inequalities in which the constants A, B > 0 are given by the last equality. From this, the bound (44) follows, where δ_c is the minimal modulus of continuity of c. Now suppose that x ∉ B_M. In what follows we bound, with u_P(x) abbreviated to ũ for simplicity, the expression ϕ_x^P(ũ) − ϕ_x^P(u) in terms of ‖u − ũ‖_g. First of all, note that |s_u/s_ũ| ≤ λ_{s,2}/λ_{s,1}. Secondly, by the definition of the maximizer ũ = u_P(x), we can bound the first summand in (45). The second summand in (45) can be bounded similarly. Altogether, for u ∈ S^{d−1} such that ‖u − ũ‖_g ≤ ε, i.e. ‖u − ũ‖ ≤ 2 sin(ε/2) ≤ ε, we obtain a lower bound on ϕ_x^P(u) in terms of ϕ_x^P(ũ), or, because ϕ_x^P(ũ) ≥ ϕ_x^P(u) for all u ∈ S^{d−1} by the definition of ũ, a two-sided bound. For any n ∈ {1, 2, . . .} such that min_{i=1,...,n} ‖U_i − ũ‖_g ≤ ε for all x ∉ B_M, the last bounds result in P D(x; P) = c(ϕ_x^P(ũ)) and P D_n(x; P) ≤ c(ϕ_x^P(u)) ≤ c(ϕ_x^P(ũ)(1 − …)). This holds true for any x ∉ B_M.
Consequently, if min_{i=1,...,n} ‖U_i − ũ‖_g ≤ ε, we arrive at the bound (46). Finally, the rate of convergence of the approximations of the projection depth can be obtained as a combination of (44) and (46), where ε is substituted by the maximal spacing S_n introduced in the proof of Theorem 1. The final rates of convergence then follow by applying the same strategy as in the proof of Theorem 2, where the result of Janson [18] is employed to devise an almost sure upper bound on the rate of convergence of the depth approximation. Note that to obtain explicit, almost sure bounds such as those from Theorems 1 and 2, it is necessary to invert (an upper bound to) one of the functions on the right-hand side of (47), considered as a function of ε.
B.2. Proof of Theorem 10. Due to the translation invariance of P D, the distribution P can be assumed to be spherically symmetric about the origin. Then it can be seen that m_u = 0 and s_u = S for some constant S ≥ 0. The latter fact follows because, for such spherically symmetric distributions, all projections of P onto lines passing through the origin have the same univariate distribution [31]. If S = 0, the assertion of the theorem is trivially satisfied, as in this case P D(x; P) = P D_n(x; P) = 0 for all x ∈ R^d. For S > 0 and any x ∈ R^d,

P D(x; P) = c(sup_{u∈S^{d−1}} |⟨x, u⟩|/S) = c(‖x‖/S),

and the depth is realised in u_P(x) = x/‖x‖. Likewise, for the function ϕ_x^P from the approximating projection depth it holds true that ϕ_x^P(u) = |⟨x, u⟩|/S.
Similarly as in the proof of Theorem 9, let us bound the expression bounding (45). Here we can, for x ≠ 0, write the corresponding bound explicitly. Thus, in the same way as in (46) in the proof of Theorem 9, we obtain that under the maximal spacing condition min_{i=1,...,n} ‖U_i − u_P(x)‖_g ≤ ε we can write

∆_n^P(R^d) ≤ c(τ_1 ϕ_x^P(u_P(x))) − c(ϕ_x^P(u_P(x))) ≤ ζ(τ_1) ≤ ζ(τ_2),

where τ_1 = 1 − 2 sin(ε/2) and τ_2 = 1 − ε. For the final rates we apply the same technique as in the proof of Theorem 2; note that in the statement of the theorem only the second, simpler rate is stated.
The formula for ζ in (17) can be obtained by direct maximization in the expression for the multiplicative modulus of continuity.
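The closed form P D(x; P) = c(‖x‖/S) from the proof of Theorem 10 can be illustrated numerically. In the sketch below we take P = N(0, I_3), for which the projected scale S equals the MAD of the standard normal, and the common transform c(t) = 1/(1 + t); both choices are assumptions of this sketch, since the theorem treats a generic c.

```python
import numpy as np

MAD = 0.67448975                 # MAD of N(0,1): assumed scale s_u = S
c = lambda t: 1.0 / (1.0 + t)    # assumed decreasing transform c

def pd_exact(x):
    """PD(x; P) = c(||x|| / S) for spherically symmetric P."""
    return c(np.linalg.norm(x) / MAD)

def pd_approx(x, n, rng):
    """Approximate PD by maximizing the outlyingness phi^P_x(u) = |<x,u>|/S
    over n random directions, then applying c."""
    U = rng.standard_normal((n, x.size))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    out = np.abs(U @ x) / MAD
    return c(out.max())

rng = np.random.default_rng(7)
x = np.array([0.6, -0.8, 0.0])   # a query point with ||x|| = 1
p_exact = pd_exact(x)
p_n = pd_approx(x, 500, rng)
```

Because the supremum over all of S^{d−1} dominates the maximum over finitely many sampled directions, and c is decreasing, the approximation always lies above the true depth and approaches it as n grows.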