Asymptotic Moments of Near Neighbor Distances for the Gaussian Distribution

We study the moments of the k-th nearest neighbor distance for independent identically distributed points in $\mathbb{R}^n$. In the earlier literature, the case with power higher than $n$ has been analyzed by assuming a bounded support for the underlying density. We remove the boundedness assumption by specializing to the multivariate Gaussian distribution. In this case, the nearest neighbor distances exhibit very different behavior in comparison to the earlier results.


Introduction
Consider a set of independent identically distributed (i.i.d.) random variables $(X_i)_{i=1}^M$ with a common density $p(x)$ on $\mathbb{R}^n$. We study the moments of the nearest neighbor distance,

$$\mathbb{E}\big[d_{1,k}^{\alpha}\big], \tag{1}$$

in the limit $M \to \infty$. The quantity (1) appears commonly in the literature on random geometric graphs, where directed and undirected nearest neighbor graphs are analyzed as special cases of more general frameworks [10,11,15]. In this paper, the nearest neighbor distance serves as the quantity of interest, with the hope that in the future the ideas can be expressed in a more abstract form.
The expectation (1) is also of interest in its own right and tends to appear in various scientific contexts. A significant application is found in the nonparametric estimation of Rényi entropies, where asymptotic analysis provides theoretically sound estimators [5,7,6,9]. Moreover, nearest neighbor distances and distributions play a major role in the understanding of nonparametric estimation in general [1,4,13]. Finally, it should be mentioned that quantities related to (1) are encountered in physics, especially in statistical mechanics and the theory of gases and liquids [3].
In the earlier literature, it has been shown that under general conditions ($\Gamma$ denotes the Gamma function)

$$\lim_{M \to \infty} M^{\alpha/n}\, \mathbb{E}\big[d_{1,k}^{\alpha}\big] = \frac{\Gamma(k+\alpha/n)}{\Gamma(k)}\, V_n^{-\alpha/n} \int_{\mathbb{R}^n} p(x)^{1-\alpha/n}\, dx$$

if $0 < \alpha < n$ [14,2]. However, the case $\alpha > n$ is quite different, and usually a boundedness condition must be imposed on the support of $p(x)$. As the contribution of this paper, we analyze what happens if $\alpha > n$ while the support of $p(x)$ is unbounded. To simplify matters, we examine only the multivariate Gaussian distribution $p(x) = (2\pi)^{-n/2} e^{-\frac{1}{2}\|x\|^2}$, with the long term goal of extending the results to more general classes of densities. It turns out that the asymptotic behavior is very different from the case $0 < \alpha < n$. We show that if $\alpha > n$, then in the limit $M \to \infty$ the moments obey an asymptotic formula of a new form, where the definition of the function $g$ involved depends on $n$, $k$ and $\alpha$ (see Section 3).

Definitions
We start with some basic definitions. $V_n$ denotes the volume of the unit Euclidean ball in $\mathbb{R}^n$, and $B(y, r)$ denotes the ball with center $y$ and radius $r$. $I(\cdot)$ refers to the indicator function of a random event. For a vector $x \in \mathbb{R}^n$, $x^{(j)}$ denotes component $j$ of the vector. The volume of a set $A$ with respect to the Lebesgue measure is denoted by $\lambda(A)$. If $g(r)$ is a function defined on an open subset of $\mathbb{R}$, we denote the derivative of $g$ by $Dg$.
$(X_i)_{i=1}^M$ is taken as an i.i.d. sample with $X_i \in \mathbb{R}^n$. Each $X_i$ follows a common density $p(x)$; our work concerns the Gaussian case

$$p(x) = (2\pi)^{-n/2} e^{-\frac{1}{2}\|x\|^2}. \tag{2}$$

The first nearest neighbor of $X_i$ is defined (in the Euclidean norm; other norms are not considered in this paper) by

$$N[i, 1] = \operatorname*{argmin}_{1 \le j \le M,\, j \ne i} \|X_j - X_i\|,$$

and by recursion, the k-th nearest neighbor is

$$N[i, k] = \operatorname*{argmin}_{1 \le j \le M,\, j \ne i,\, j \notin \{N[i,1], \dots, N[i,k-1]\}} \|X_j - X_i\|.$$

The corresponding k-th nearest neighbor distance is $d_{i,k} = \|X_{N[i,k]} - X_i\|$. The goal of the paper is to analyze the moments

$$\mathbb{E}\big[d_{1,k}^{\alpha}\big] \tag{3}$$

in the limit $M \to \infty$ with everything else fixed. Because the sample is independent identically distributed (i.i.d.), we may set $i = 1$.
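These definitions translate directly into a short numerical sketch. The following code is an illustration added here (not part of the original paper; the function name and all parameters are ours) that estimates the moment (3) by Monte Carlo for i.i.d. Gaussian points.

```python
import numpy as np

def knn_moment(M, n, k, alpha, trials=200, seed=0):
    """Monte Carlo estimate of E[d_{1,k}^alpha] for M i.i.d. N(0, I_n) points."""
    rng = np.random.default_rng(seed)
    vals = np.empty(trials)
    for t in range(trials):
        X = rng.standard_normal((M, n))
        # Distances from X_1 (row 0) to all other points; the k-th smallest
        # is the k-th nearest neighbor distance d_{1,k}.
        d = np.linalg.norm(X[1:] - X[0], axis=1)
        vals[t] = np.partition(d, k - 1)[k - 1] ** alpha
    return vals.mean()

for M in (10**3, 10**4, 10**5):
    print(M, knn_moment(M, n=2, k=1, alpha=3.0))
```

Because the sample is i.i.d., averaging over independent trials of the distance from $X_1$ alone suffices.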
Throughout the paper there will be constants which depend on some variables but not on others. Such constants are denoted by $c(\ldots)$, where inside the parentheses we indicate the dependencies. Strictly speaking, $c$ is a function of those variables, but following standard convention it will be called a constant. During the course of our proofs, several different unknown constants emerge. To keep them separate, lower indices (in the form $c_i$) are used.
General error terms, which can be bounded but not written in closed form, will be denoted by $R$ (or $R_i$ with a lower index $i$). After the appearance of each such term, we write an inequality of the form $|R| \le c\, f(M)$, where $c$ is a constant and $f$ is a function of $M$ or of some other variables. Inside proofs, Big-Oh notation will be invoked as another way to express unknown but negligible terms.

Main Results and Previous Work
The analysis of nearest neighbor distances can be viewed as part of the general framework of random geometric graphs. In this field, results are established for quantities of the form $\xi(X_1, (X_i)_{i=1}^M)$, where $\xi$ has certain locality properties. By imposing higher levels of abstraction, very general functions can be analyzed as long as locality arguments are available. We refer to [10,11,15] as a starting point for understanding the issues arising in the field. However, abstract theories do not directly give exact information about the asymptotic behavior of the moments (3). The step towards concretizing the results concerning nearest neighbor graphs was taken in [14,12]. The following has been proven:

Theorem 1. Suppose that $0 < \alpha < n$ and $p(x)$ is a density with $\int_{\mathbb{R}^n} p(x)^{1-\alpha/n}\, dx < \infty$. Then

$$\lim_{M \to \infty} M^{\alpha/n}\, \mathbb{E}\big[d_{1,k}^{\alpha}\big] = \frac{\Gamma(k+\alpha/n)}{\Gamma(k)}\, V_n^{-\alpha/n} \int_{\mathbb{R}^n} p(x)^{1-\alpha/n}\, dx,$$

where $\Gamma(\cdot)$ refers to the Gamma function. If $\alpha \ge n$, the limit holds if $p(x)$ is bounded from below and above on a bounded convex set $\mathcal{S}$ with $p(x) = 0$ when $x \notin \mathcal{S}$.
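As a sanity check of the displayed limit (in the form reconstructed above), note that for the uniform density on the unit square with $n = 2$, $k = 1$ and $\alpha = 1$ the constant evaluates to $\Gamma(3/2)\, V_2^{-1/2} = (\sqrt{\pi}/2)\,\pi^{-1/2} = 1/2$. The sketch below is our own illustration comparing this value with a Monte Carlo estimate of $M^{1/2}\, \mathbb{E}[d_{1,1}]$.

```python
import numpy as np

rng = np.random.default_rng(1)

def scaled_nn_moment(M, trials=400):
    """Estimate M^{alpha/n} E[d_{1,1}^alpha] for n = 2, alpha = 1 and
    points uniform on the unit square."""
    vals = np.empty(trials)
    for t in range(trials):
        X = rng.random((M, 2))
        vals[t] = np.linalg.norm(X[1:] - X[0], axis=1).min()  # d_{1,1}
    return np.sqrt(M) * vals.mean()

for M in (10**3, 10**4, 10**5):
    print(M, scaled_nn_moment(M))  # expected to drift towards 0.5
```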
As a downside, Theorem 1 imposes the convexity requirement on $\mathcal{S}$ if $\alpha > n$. Furthermore, it does not provide a rate of convergence. These issues have been addressed by the concrete approach in [2], where it was shown that if $\inf_{x \in \mathcal{S}} p(x) > 0$ and $p(x)$ has a bounded gradient on $\mathcal{S}$, then under rather weak conditions on the set $\mathcal{S}$ the limit holds with an explicit rate of convergence for any $\rho > 0$, removing the convexity requirement.
As a common factor between the results, observe that in the case $\alpha > n$, two requirements must be satisfied:

1. The set $\mathcal{S}$ must be bounded.

2. $\inf_{x \in \mathcal{S}} p(x) > 0$.
In this paper we ask what happens when neither 1. nor 2. holds but $\alpha > n$ (the case $\alpha = n$ is not addressed). The early works in random geometry took the uniform distribution as a case of special interest. Analogously, we choose the Gaussian density (2) as our target of study.
It turns out that for the Gaussian distribution the behavior for $\alpha > n$ is very different from that of Theorem 1.
As the main contribution of the paper, we prove the following.
Theorem 2. Suppose that $p(x)$ is the multivariate Gaussian distribution (2) and $\alpha > n$. Then the moments $\mathbb{E}[d_{1,k}^{\alpha}]$ satisfy, in the limit $M \to \infty$, an asymptotic formula expressed in terms of the inverse function $f^{-1}$ of

$$f(t) = t^n \int_{B(0,1)} e^{t y^{(1)}}\, dy.$$
The main difference to Theorem 1 is that the normalization now involves logarithmic factors of $M$ rather than the pure power $M^{-\alpha/n}$. Theorem 2 can be further developed by analyzing the rate of convergence. In fact, the results suggest that even in the well studied case $0 < \alpha < n$, the rates of convergence obtained for example in [2,8] do not hold in the unbounded case, especially when $\alpha$ is close to $n$. The rather deep questions related to rates of convergence are left as an important topic of future research.
Another open question is the extension to a general density $p$, which the author believes is possible. This could possibly unify the case with a boundary effect [8] and the more general unbounded case.

Outline of the Proof
We will use the small ball probability

$$\omega_x(r) = \int_{B(x,r)} p(y)\, dy$$

due to its useful distribution free properties. In fact, [13,2] show that the distribution of the quantity $\omega_{X_1}(d_{1,k})$ does not depend on the density $p$ and, moreover, concentrates on values of order $M^{-1}$. Another useful fact is that conditioning on $X_1$ does not change the distribution of $\omega_{X_1}(d_{1,k})$. We approximate

$$\omega_x(r) = p(x)\, r^n \int_{B(0,1)} e^{-r x^T y - \frac{1}{2} r^2 \|y\|^2}\, dy \approx p(x)\, r^n \int_{B(0,1)} e^{-r x^T y}\, dy \tag{5}$$

assuming that $e^{-\frac{1}{2} r^2}$ is close to 1. By a change of variables (a rotation inside the last integral in (5)) we have

$$\omega_x(r) \approx p(x)\, r^n \int_{B(0,1)} e^{-r \|x\| y^{(1)}}\, dy.$$
Now if we take

$$f(t) = t^n \int_{B(0,1)} e^{t y^{(1)}}\, dy$$

(by the symmetry of $B(0,1)$, the sign of the exponent does not matter), then $\|x\|^n \omega_x(r) \approx p(x) f(\|x\| r)$ and we solve for $r$:

$$r \approx \frac{1}{\|x\|}\, f^{-1}\!\left(\frac{\|x\|^n\, \omega_x(r)}{p(x)}\right),$$

where $f^{-1}$ refers to the inverse of $f$.
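The quality of this approximation is easy to probe numerically. The sketch below is our own illustration for $n = 2$ (all names are ours): it estimates $\omega_x(r)$ by Monte Carlo and compares it with $p(x)\, \|x\|^{-n} f(\|x\| r)$; close agreement is expected when $r$ is small.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2

def f(t, samples=400000):
    # f(t) = t^n * integral over B(0,1) of exp(t * y^(1)) dy, estimated by
    # uniform sampling in the unit disk (radius sqrt(U) is specific to n = 2).
    u = rng.standard_normal((samples, n))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    y = u * np.sqrt(rng.random((samples, 1)))
    return t ** n * np.pi * np.exp(t * y[:, 0]).mean()   # pi = V_2

def omega(x, r, samples=4000000):
    # Gaussian measure of the ball B(x, r), by Monte Carlo.
    z = rng.standard_normal((samples, n))
    return (np.linalg.norm(z - x, axis=1) < r).mean()

x = np.array([3.0, 0.0])
r = 0.1
p_x = (2 * np.pi) ** (-n / 2) * np.exp(-(x @ x) / 2)
approx = p_x * f(np.linalg.norm(x) * r) / np.linalg.norm(x) ** n
print(omega(x, r), approx)   # the two numbers should be close (up to MC noise)
```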
By substituting $d_{1,k}$ in place of $r$ and $X_1$ in place of $x$, we get, conditionally on $X_1$,

$$d_{1,k} \approx \frac{1}{\|X_1\|}\, f^{-1}\!\left(\frac{\|X_1\|^n\, \omega_{X_1}(d_{1,k})}{p(X_1)}\right).$$

The argument of $f^{-1}$ looks rather complicated. However, because the conditional distribution of $\omega_{X_1}(d_{1,k})$ does not depend on the density $p(x)$ or on $X_1$, it is sufficient to control the dependency on $X_1$. Our strategy can be summarized as dividing $\mathbb{R}^n$ into three regions $S_1$, $S_2$ and $S_3$ together with the decomposition

$$\mathbb{E}\big[d_{1,k}^{\alpha}\big] = \sum_{j=1}^{3} \mathbb{E}\big[d_{1,k}^{\alpha}\, I(X_1 \in S_j)\big].$$

The three sets depend on a variable $0 < \varepsilon < 1$ and the number of samples $M$. We think of $\varepsilon > 0$ as a parameter which, at the end of the analysis, is sent to zero after first taking the limit $M \to \infty$. As a side note, it should be clear at this point that the parameters $(n, k, \alpha)$ are assumed to stay fixed throughout.
The motivation for $S_1$ may be seen in the idea of performing a Taylor expansion of $f^{-1}(\cdot)^{\alpha}$ at zero, which would render the analysis into the well-known case [2]. Keeping in mind that $\omega_{X_1}(d_{1,k})$ is of order of magnitude $M^{-1}$, we take (the definition applies for any $n \ge 1$)

$$S_1 = \Big\{x \in \mathbb{R}^n : \frac{p(x) M}{\log^{n/2} M} > \varepsilon^{-1}\Big\};$$

then for large $M$ we have $\|X_1\| = O(\sqrt{\log M})$ when $X_1 \in S_1$, and substituting the orders of magnitude shows that $\|X_1\|^n \omega_{X_1}(d_{1,k})/p(X_1)$ is of order $\varepsilon$. If $\varepsilon$ is small, this shows that the argument of $f^{-1}$ is small, suggesting that a Taylor expansion might be possible. However, during the course of the proof it turns out that points in $S_1$ contribute little in comparison to the set

$$S_2 = \Big\{x \in \mathbb{R}^n : \varepsilon \le \frac{p(x) M}{\log^{n/2} M} \le \varepsilon^{-1}\Big\}.$$

In this case, a Taylor expansion does not seem possible. Fortunately, we are able to show that conditionally on $X_1 \in S_2$, the variable

$$Y = \frac{p(X_1) M}{\log^{n/2} M} \tag{8}$$

is approximately uniformly distributed on $[\varepsilon, \varepsilon^{-1}]$ and, moreover, it is asymptotically independent of $\omega_{X_1}(d_{1,k})$. This is useful because for large $M$, $\|X_1\| \approx \sqrt{2 \log M}$ on $S_2$, and we obtain an approximate representation (9) of $d_{1,k}$ in terms of $M \omega_{X_1}(d_{1,k})$ and $Y$ alone. Because the probability $P(X_1 \in S_2)$ turns out to admit a convenient asymptotic expression, it is possible to use Equation (9) to estimate the quantity $\mathbb{E}[d_{1,k}^{\alpha} I(X_1 \in S_2)]$. In addition to $S_1$ and $S_2$, there is the set

$$S_3 = \Big\{x \in \mathbb{R}^n : \frac{p(x) M}{\log^{n/2} M} < \varepsilon\Big\}.$$

However, similarly to $S_1$, the nearest neighbor distances corresponding to $X_1 \in S_3$ turn out to have a negligible effect if $\varepsilon$ is small.
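To make the orders of magnitude concrete, the following heuristic computation is added here as a sketch (with $W$ denoting the $O(1)$ variable $M\, \omega_{X_1}(d_{1,k})$; the explicit form written below is our reading of the representation (9)):

\begin{align*}
\|X_1\| &\approx \sqrt{2 \log M}, \qquad \omega_{X_1}(d_{1,k}) \approx \frac{W}{M}, \qquad Y = \frac{p(X_1) M}{\log^{n/2} M} \in [\varepsilon, \varepsilon^{-1}],\\
\frac{\|X_1\|^n\, \omega_{X_1}(d_{1,k})}{p(X_1)} &\approx \frac{(2 \log M)^{n/2}\, W}{M\, p(X_1)} = \frac{2^{n/2}\, W}{Y} = O(1),\\
d_{1,k} &\approx \frac{1}{\sqrt{2 \log M}}\, f^{-1}\!\Big(\frac{2^{n/2}\, W}{Y}\Big) = O\big(\log^{-1/2} M\big).
\end{align*}

In particular, the argument of $f^{-1}$ neither blows up nor vanishes on $S_2$, which is exactly what rules out the Taylor expansion available on $S_1$.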

Auxiliary Results
In this section, we give some results and applications for $\omega_{X_1}(d_{1,k})$, where $\omega_x(r) = \int_{B(x,r)} p(y)\, dy$ as before. The following result characterizes the distribution of $\omega_{X_1}(d_{1,k})$, which conveniently does not depend on $X_1$ or on the density $p(x)$.

Lemma 1. Conditionally on $X_1$, the variable $\omega_{X_1}(d_{1,k})$ has the density

$$q(\omega) = k \binom{M-1}{k}\, \omega^{k-1} (1-\omega)^{M-1-k}, \qquad 0 \le \omega \le 1. \tag{11}$$

Moreover, for any $\beta > 0$,

$$\mathbb{E}\big[\omega_{X_1}(d_{1,k})^{\beta}\big] = \frac{\Gamma(k+\beta)}{\Gamma(k)} \cdot \frac{\Gamma(M)}{\Gamma(M+\beta)}. \tag{12}$$

Proof. Equation (11) can be derived from the cumulative distribution function in Equation (4.35) of [2]. Some algebraic manipulation is needed to simplify the first derivative of the sum of terms appearing in [2] in order to reach the simpler formula (11).
It is useful to observe that, for any $\beta > 0$,

$$\frac{\Gamma(M)}{\Gamma(M+\beta)} \le c(\beta)\, M^{-\beta} \tag{13}$$

in order to understand better the moments (12). The following lemma is useful for technical reasons.
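Since the law of $\omega_{X_1}(d_{1,k})$ is distribution free, Lemma 1 can be checked by simulation in one dimension, where the small ball probability has the closed form $\omega_x(r) = \Phi(x + r) - \Phi(x - r)$. The sketch below is our own check (it uses SciPy and assumes the order-statistics form of the density (11) quoted above): the simulated law is compared with Beta$(k, M-k)$.

```python
import numpy as np
from scipy.stats import norm, beta, kstest

rng = np.random.default_rng(2)
M, k = 200, 2
samples = []
for _ in range(2000):
    X = rng.standard_normal(M)
    d = np.sort(np.abs(X[1:] - X[0]))[k - 1]                 # d_{1,k} in n = 1
    samples.append(norm.cdf(X[0] + d) - norm.cdf(X[0] - d))  # omega_{X_1}(d_{1,k})
print(kstest(samples, beta(k, M - k).cdf))                   # large p-value expected
```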

Lemma 2. Assume that $p(x)$ is the multivariate Gaussian distribution (2). Then, for $0 < r \le 1$,

$$\omega_x(r) \ge c(n)^{-1}\, p(x)\, r^n. \tag{16}$$

Proof. By a slight modification to Equation (5): on the half of $B(0,1)$ where $x^T y \le 0$, the integrand is at least $e^{-r^2/2} \ge e^{-1/2}$, which gives the claim.

The moments do not get too large if $X_1$ stays close enough to the origin; this is the content of Lemma 3. Next we show that the $\alpha$-moments are at most of order $(p(x) M)^{-\alpha/n}$ if the quantity inside the parentheses does not get too small. The result is an application of Lemmas 1-2. Without loss of generality, we prove the claim after some threshold $M_0$, which is natural as the limit $M \to \infty$ is taken later in any case. As a somewhat subtle detail, we will generally adopt this way of expressing our statements in those cases where proving the claim for all $M > 0$ is not an obvious task.

Lemma 4. Suppose that $p(x)$ is the multivariate Gaussian distribution (2) and fix any $\delta > 0$. Then we find a threshold $M_0(n, k, \alpha, \delta)$ such that for all $M > M_0$ and all $x$ with $p(x) M \ge \delta \log^{n/2} M$, we have

$$\mathbb{E}\big[d_{1,k}^{\alpha} \mid X_1 = x\big] \le c(n, k, \alpha)\, \big(p(x) M\big)^{-\alpha/n}$$

for some constant $c(n, k, \alpha)$.
Proof. We decompose the conditional moment into the contributions from the events $d_{1,k} \le 1$ and $d_{1,k} > 1$ (Equation (15)). We consider first the former term. By Lemma 2 we have $\omega_x(r) \ge c_1^{-1} p(x) r^n$ for $0 < r \le 1$ and some constant $c_1(n)$, and using this we obtain, by Lemma 1 together with Equations (13) and (16), a bound of the form $c_2\, (p(x) M)^{-\alpha/n}$ for some constant $c_2(n, k, \alpha)$. This proves the claim for the first term in (15).

For the second term, we apply Hölder's inequality. The function $\omega_x(r)$ is strictly increasing in $r$, and Equation (16) implies that $\omega_x(1) \ge c_1^{-1} p(x)$. Using this fact, integration by parts and the inequalities $\binom{M}{k} \le M^k$ and $1 - \omega \le e^{-\omega}$, together with Lemma 1 and a change of variables, we arrive at the bound (19). The second line in (19) can be calculated in closed form, but for our purposes it is convenient to use an upper bound that simplifies the notation. Assuming $M > 2k+2$, we obtain (20) for some $c_3(n, k)$. By the assumption on $p(x)$, the resulting exponential factor is controlled after some threshold $M_0(n, \delta)$ and for all $M > M_0$. By Lemma 3 we then have (21) for some constant $c_4(n, k, \alpha)$ (assuming trivially $M > 1$). Equations (20) and (21) together with (18) now imply (22). The assumption $p(x) M \ge \delta \log^{n/2} M$ implies that the bound (22) approaches zero faster than $(p(x) M)^{-\alpha/n}$ in the limit $M \to \infty$.
We now formalize the argument of Section 4 which connects $\omega_x(r)$ to the function $f$:

Lemma 5. Suppose that $p(x)$ is the multivariate Gaussian distribution (2). Then

$$\omega_x(r) = \frac{p(x)}{\|x\|^n}\, f(\|x\| r) + R, \qquad f(t) = t^n \int_{B(0,1)} e^{t y^{(1)}}\, dy,$$

where $|R| \le \frac{p(x)\, r^2}{\|x\|^n}\, f(\|x\| r)$. The function $f$ is defined and continuous on $[0, \infty)$ and has range $[0, \infty)$. It is also strictly increasing, implying the existence of an inverse function $f^{-1}$.

Proof. The proof involves extracting the error term and bounding it. By rearranging terms and a change of variables (see also Equation (5)), we write in (23)

$$\omega_x(r) = p(x)\, r^n \int_{B(0,1)} e^{-r x^T y}\, dy - A, \qquad A = p(x)\, r^n \int_{B(0,1)} e^{-r x^T y}\big(1 - e^{-\frac{1}{2} r^2 \|y\|^2}\big)\, dy.$$

The main task is to bound $A$. This is achieved by the mean value theorem: for $\|y\| \le 1$ and $r > 0$,

$$1 - e^{-\frac{1}{2} r^2 \|y\|^2} = \tfrac{1}{2}\, r^2 \|y\|^2 e^{-\delta} \le r^2$$

for some $\delta \in (0, \infty)$. This inequality implies that

$$A \le p(x)\, r^{n+2} \int_{B(0,1)} e^{-r x^T y}\, dy = p(x)\, r^{n+2} \int_{B(0,1)} e^{r \|x\| y^{(1)}}\, dy = \frac{p(x)\, r^2}{\|x\|^n}\, f(\|x\| r).$$

In the last two equalities, the vectors have been conveniently rotated. The same rotation shows that in (23) we have

$$p(x)\, r^n \int_{B(0,1)} e^{-r x^T y}\, dy = \frac{p(x)}{\|x\|^n}\, f(\|x\| r).$$
For $t > 0$, we define

$$g(t) = \int_0^{\infty} f^{-1}(t \omega)^{\alpha}\, \frac{\omega^{k-1} e^{-\omega}}{(k-1)!}\, d\omega. \tag{25}$$

The integral always exists because $f^{-1}$ is a non-negative function. We show that $g(t)$ approaches zero at least as fast as $t^{\alpha/n}$ as $t \to 0$ and grows at most logarithmically as $t \to \infty$. The same holds for $f^{-1}(t)^{\alpha}$:

Lemma 6. The functions $f^{-1}(t)^{\alpha}$ and $g(t)$ are bounded by a constant multiple of $t^{\alpha/n}$ for $0 < t \le 1$ and by a constant multiple of $1 + \log^{\alpha} t$ for $t > 1$.

Proof. 1. Bounds on $f^{-1}$. Consider $t \in (0, 1)$. For any $z > 2 V_n^{-1/n} t^{1/n}$, we have $f(z) \ge \tfrac{1}{2} V_n z^n > 2^{n-1} t \ge t$. Then

$$f^{-1}(t) \le 2 V_n^{-1/n}\, t^{1/n}.$$

For $t > 1$, the exponential growth of $f$ yields a constant $A(n)$ such that $f^{-1}(t) \le 2 \log t + A + 1$. The outcome for $f^{-1}(t)^{\alpha}$ follows by recalling that $(a+b)^{\alpha} \le 2^{\alpha}(a^{\alpha} + b^{\alpha})$ for any $a, b > 0$.

The function g
Bounds for $g$ can be established, for example, by splitting the integral in (25) dyadically and examining the terms with $2^i t < 1$ and $2^i t \ge 1$ separately when $t \in (0, 1)$, whereas for $t > 1$ a straightforward application of the logarithmic upper bound for $f^{-1}$ gives the result.
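For $n = 2$ the bounds of Lemma 6 can be checked numerically. The reduction $f(t) = 2\pi t I_1(t)$, with $I_1$ the modified Bessel function, is our own simplification of the defining integral and lets us invert $f$ by bisection; the constant offset in the logarithmic bound below is illustrative.

```python
import numpy as np
from scipy.special import i1
from scipy.optimize import brentq

V2 = np.pi  # V_n for n = 2

def f(t):
    # f(t) = t^2 * integral over B(0,1) of exp(t y^(1)) dy = 2 pi t I_1(t)
    return 2 * np.pi * t * i1(t)

def f_inv(s):
    return brentq(lambda t: f(t) - s, 1e-12, 60.0)

for s in (1e-6, 1e-2, 1e3, 1e12):
    if s <= 1:
        bound = 2 * V2 ** (-0.5) * s ** 0.5   # power regime for 0 < t <= 1
    else:
        bound = 2 * np.log(s) + 3             # log regime (offset illustrative)
    print(s, f_inv(s), bound)                 # f_inv(s) stays below the bound
```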
Region $S_1$

Recall that region $S_1$ is defined by

$$S_1 = \Big\{x \in \mathbb{R}^n : \frac{p(x) M}{\log^{n/2} M} > \varepsilon^{-1}\Big\}.$$

It may happen that $S_1$ is an empty set; from now on we always assume that $M$ is large enough in comparison to $\varepsilon^{-1}$ and $n$ to ensure that $S_1$ is non-empty with a positive volume. A similar convention is adopted for the sets $S_2$ and $S_3$.
As stated in Section 4, $0 < \varepsilon < 1$ is a fixed constant until the end, where the limit $\varepsilon \to 0$ is taken after the limit $M \to \infty$. We define (assuming that $\alpha > n$) an integer cutoff $i^*$, where $[\,\cdot\,]$ refers to the integer part of the number inside the bracket. As our proof strategy, $S_1$ is divided into smaller subsets $S_{1,i}$, $0 \le i \le i^*$, defined in (27), which are easier to control with the tools we have available this far; each $S_{1,i}$ consists of the points $x$ with $\|x\|$ in an interval $[a_i, b_i)$. The remaining part is denoted by $S_{1,C}$. The following lemma bounds the nearest neighbor distance when $X_1 \in S_{1,i}$.

Lemma 7. Assume that $p(x)$ is the multivariate Gaussian distribution (2) and $\alpha > n$. Then there exists a threshold $M_0(n, k, \alpha, \varepsilon) > 0$ such that for all $M > M_0$ and $0 \le i \le i^*$, the contribution $\mathbb{E}[d_{1,k}^{\alpha}\, I(X_1 \in S_{1,i})]$ admits a bound with some constant $c(n, k, \alpha)$.
Proof. By Lemma 4, we have the bound (28) for some constant $c_1(n, k, \alpha)$ and threshold $M_0(n, k, \alpha, \varepsilon)$. We next compute the volume $\lambda(S_{1,i})$. The set $S_{1,i}$ consists of the points $x \in \mathbb{R}^n$ with $\|x\|$ in the interval $[a_i, b_i)$, and the volume of the set is $\lambda(S_{1,i}) = V_n (b_i^n - a_i^n)$. By a Taylor expansion, in the limit $M \to \infty$ with everything else fixed, this volume admits the estimate (29) for some constant $c_3(n, k, \alpha, \varepsilon)$. By substitution into (28), we obtain (30), where the bound holds for $0 \le i \le i^*$.
After removing the sets $S_{1,i}$, we are left with $S_{1,C}$. However, it does not pose problems.
Lemma 8. Assume that $p(x)$ is the multivariate Gaussian distribution (2) and $\alpha > n$. Then there exists a threshold $M_0(n, k, \alpha, \varepsilon)$ such that for any $M > M_0$, the contribution $\mathbb{E}[d_{1,k}^{\alpha}\, I(X_1 \in S_{1,C})]$ admits a bound with some constant $c(n, k, \alpha)$.
Proof. By Lemma 4 and the definition of $S_{1,C}$, we have the bound (31) for some constant $c_1(n, k, \alpha)$. It is a simple task to show that for all $x \in S_{1,C}$ we have $\|x\| \le \sqrt{3 \log M}$ once $M$ exceeds some threshold $M_0(n, k, \alpha, \varepsilon)$. This implies the estimate (32), and substituting Equation (31) together with this inequality finishes the argument.

Lemmas 7 and 8 imply that for $\alpha > n$ and $M > M_0$, the total contribution from $S_1$ is bounded with some constant $c(n, k, \alpha)$. We conclude:

Lemma 9. Assume that $p(x)$ is the multivariate Gaussian distribution (2) and $\alpha > n$. Then there exists a threshold $M_0(n, k, \alpha, \varepsilon)$ such that for any $M > M_0$, the contribution $\mathbb{E}[d_{1,k}^{\alpha}\, I(X_1 \in S_1)]$ admits the bound (33) with some constant $c(n, k, \alpha)$.

Region $S_2$
Region $S_2$ is defined by

$$S_2 = \Big\{x \in \mathbb{R}^n : \varepsilon \le \frac{p(x) M}{\log^{n/2} M} \le \varepsilon^{-1}\Big\}.$$

Again, $M$ is assumed to be large enough to ensure that $S_2$ has a positive volume. It is necessary to obtain an approximation for $P(X_1 \in S_2)$. This can be done rather straightforwardly:

Lemma 10. Assuming that $p(x)$ is the multivariate Gaussian distribution (2), the probability $P(X_1 \in S_2)$ admits an asymptotic expression with an error term $R$ controlled by a constant $c(n, \varepsilon)$.

Proof. By some algebraic manipulation, the probability can be decomposed into three terms $I_1$, $I_2$ and $I_3$, with the radii $a$ and $b$ given in Equations (34) and (35). During the proof it is easiest to employ Big-Oh notation; such error terms depend here on $n$ and $\varepsilon$.

The term $I_1$
By a Taylor expansion (see also Equation (29)), it can be shown that the expansion (36) holds and that, for any $\beta > 0$, the estimate (37) holds. Using (36) and (37) with $\beta = 1$, we obtain (38); the remainder terms depend on $n$, $\varepsilon$ and $M$. Using Equations (38) and (37) with $\beta = n-2$ in the expression for $I_1$ yields the desired estimate.

The term $I_2$
By the mean value theorem, we obtain a first bound with some constant $c_1(n, \varepsilon)$ and, consequently, a second one with some $c_2(n, \varepsilon)$. Together these control the term $I_2$.

The term $I_3$
By the expansion for $b - a$ appearing in Equation (36), we obtain an estimate with some constant $c_3(n, \varepsilon)$ and, finally, another with some constant $c_4(n, \varepsilon)$. Substituting (40)-(41) into (39) yields the estimate for $I_3$. The proof is finished since the terms $I_1$, $I_2$ and $I_3$ have all been addressed.
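For orientation, the order of magnitude in Lemma 10 can be recovered by the following change of variables; this is a heuristic sketch added here, based on the order-of-magnitude form of $S_2$ from Section 4, and the exact constants in the lemma may differ:

\begin{align*}
P(X_1 \in S_2) &= \int_a^b n V_n r^{n-1}\, (2\pi)^{-n/2} e^{-r^2/2}\, dr
  = \int_{\varepsilon}^{\varepsilon^{-1}} n V_n\, r(y)^{n-2}\, \frac{\log^{n/2} M}{M}\, dy\\
 &\approx n V_n\, (2 \log M)^{\frac{n-2}{2}}\, \frac{\log^{n/2} M}{M}\, \big(\varepsilon^{-1} - \varepsilon\big),
\end{align*}

where the substitution is $y = p(x) M \log^{-n/2} M$ (so that $dy = -r y\, dr$) and $r(y) \approx \sqrt{2 \log M}$ uniformly on $[a, b]$.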
In general, to establish the asymptotics it is useful to truncate $d_{1,k}$ to avoid overly large values. To this end, we choose some $L > 0$ (recall that at this point $\alpha$, $n$, $k$ and $\varepsilon$ stay fixed) and define the indicator

$$I_L = I\big(d_{1,k} \le L\, \varepsilon^{-1/n} \log^{-1/2} M\big).$$
The power of $\log M$ is chosen to ensure the correct order of magnitude, with large $L$ rendering the event $1 - I_L$ negligible. The following lemma verifies this fact; the bound is designed to hold after some threshold $M_0$, which depends on $L$ itself. However, beyond the threshold we obtain an upper bound which goes to zero exponentially with respect to $L$.
Lemma 11. Suppose that $p(x)$ is the multivariate Gaussian distribution (2). Then for any $L > 0$, there exist a threshold $M_0(n, k, \alpha, \varepsilon, L)$ and a positive constant $c(n, k, \alpha, \varepsilon)$ such that the bound (42) holds for all $M > M_0$.

Proof. The proof employs Hölder's inequality. By Lemma 4 and the definition of $S_2$, there exists $M_0(n, k, \alpha, \varepsilon)$ such that the bound (43) holds for some constant $c_1(n, k, \alpha)$ and all $M > M_0$. We want to bound the probability of the event $1 - I_L = 1$ in order to finish the proof. By Lemma 2 we have, for $0 < r < 1$ and $x \in S_2$, $\omega_x(r) \ge c_2^{-1} p(x) r^n$ for some constant $c_2(n)$. Then, because $\omega_x(r)$ is strictly increasing with respect to $r$, using Lemma 1 we obtain (44) when $c_2 L^n M^{-1} < 1$ (which can be imposed by taking a sufficiently large threshold $M_0$). The last integral can be bounded by integration by parts as in (19), which yields (45), assuming without loss of generality that $c_2 L^n \ge 1$. In light of (42), (43) and (45) we arrive at the conclusion. The term $L^{nk/2}$ can be dropped in the final conclusion, as it is negligible compared to the exponential decay with respect to $L$.
The variable $Y$ emerged in Equation (8). It was defined by

$$Y = \frac{p(X_1) M}{\log^{n/2} M}. \tag{46}$$

A major idea behind our proofs is the asymptotic uniformity of $Y$, as shown by the following lemma.

Lemma 12. Suppose that (2) holds and let $h(y)$ be a bounded measurable function. Then

$$\mathbb{E}\big[h(Y) \mid X_1 \in S_2\big] \to \frac{1}{\varepsilon^{-1} - \varepsilon} \int_{\varepsilon}^{\varepsilon^{-1}} h(y)\, dy$$

in the limit $M \to \infty$.
Proof. The function $s(r)$ mapping the radius $r = \|x\|$ to $y$ is strictly decreasing on the interval $[a, b]$, with $a$ and $b$ defined in Equations (34) and (35). It has an inverse $s^{-1}$ with first derivative denoted by $D s^{-1}$. Conditionally on $X_1 \in S_2$, the variable $\|X_1\|$ has the density (48) and $Y$ has the density (49) on $[\varepsilon, \varepsilon^{-1}]$. Because $y \in [\varepsilon, \varepsilon^{-1}]$, we have, in the limit $M \to \infty$ with everything else fixed, the estimates (49)-(50). By Equations (48)-(50) and Lemma 10 we obtain an approximation for the density of $Y$, and this approximation implies the claim.

Next we find the asymptotic behavior of the conditional expectation $\mathbb{E}[d_{1,k}^{\alpha} I_L \mid X_1 \in S_2]$, which together with the approximation for $P(X_1 \in S_2)$ takes care of region $S_2$. The key to the analysis is Lemma 12. The following lemma represents the nearest neighbor distance in terms of the small ball probability and the variable $Y$. We invoke the event $I_L$ to bound $d_{1,k}$; $L$ stays fixed in these considerations, the idea being to take the limit $L \to \infty$ after the limit $M \to \infty$.
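The asymptotic uniformity is easy to visualize by simulation. The sketch below is our own illustration; it takes $Y = p(X_1) M \log^{-n/2} M$, our reading of Equation (46), conditions Gaussian samples on $X_1 \in S_2$, and bins the resulting values of $Y$; the counts come out roughly flat over $[\varepsilon, \varepsilon^{-1}]$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, M, eps = 2, 10**4, 0.2

X = rng.standard_normal((400000, n))               # i.i.d. copies of X_1
p = (2 * np.pi) ** (-n / 2) * np.exp(-(X**2).sum(axis=1) / 2)
Y = p * M / np.log(M) ** (n / 2)
Y = Y[(Y >= eps) & (Y <= 1 / eps)]                 # condition on X_1 in S_2
print(len(Y))
print(np.histogram(Y, bins=8, range=(eps, 1 / eps))[0])  # roughly flat counts
```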

Lemma 13. Assume that $p(x)$ is the multivariate Gaussian distribution (2) and that $X_1 \in S_2$. Then the representation (51) of $d_{1,k} I_L$ holds, where $Y$ is defined in Equation (46) and the correction terms are bounded in (58), (61) and (62).

Proof. We first collect a few useful facts. If $x \in S_2$, then by Lemma 5 we have (52), or equivalently (53), for some constant $c_1(n, \varepsilon)$. The indicator function $I_L$ ensures that we only need to consider $0 < r \le L \varepsilon^{-1/n} \log^{-1/2} M$; then (54) gives (55). By a Taylor expansion, for any real number $\beta \in \mathbb{R}$ and $x \in S_2$, the estimate (56) holds for some constant $c_2(n, \varepsilon, \beta)$. Moreover, $f$ is an increasing continuous function, allowing a bound on $R_1$ via (57). Having made these preliminary observations, we are ready for the first step towards completing the proof. For $x \in S_2$, Equation (53) yields a representation of $d_{1,k} I_L$ (with $d_{1,k}$ substituted in place of $r$ and multiplied by $I_L$). The challenging part is to modify the argument of $f^{-1}$. We first tackle the easier task of replacing $\|X_1\|^{\alpha}$ with a function of $M$. To this end, we observe (58) with the bound (59). By Lemma 5 and Equations (52), (55) and (57), we find a constant $c_4(n, \varepsilon, L)$ such that the corresponding bound holds for $x \in S_2$ and $0 < r < L \varepsilon^{-1/n} \log^{-1/2} M$. Using the previous inequality and the fact that $f^{-1}$ is an increasing function, together with Equation (56), allows us to bound the correction term. We then move to the argument of $f^{-1}$. Again, it is useful to get rid of the norm $\|x\|^n$. This is achieved by modifying the argument appearing in (59) (due to the conditioning, we may use $x$ instead of $X_1$ in the expressions): we obtain (60)-(61), where by Equation (56) (to bound $\omega_x(d_{1,k})$, we use Equations (60) and (54)) the bound (62) holds for some constant $c_5(n, \varepsilon, L)$.
In summary, thus far we have shown the representation (63), where (58), (61) and (62) bound the three correction terms.
While the correction terms $R_2$ and $R_4$ are small, they appear inside the argument of $f^{-1}$. The best we can say about their effect is the bound (64), assuming without loss of generality that $|R_2 + R_4| \le c_4$. We therefore need to bound the derivative of the function $f^{-1}(t)^{\alpha}$ on bounded intervals. We observe that

$$D f(t) = n t^{n-1} \int_{B(0,1)} e^{t y^{(1)}}\, dy + t^n \int_{B(0,1)} y^{(1)} e^{t y^{(1)}}\, dy \ge \frac{n}{t}\, f(t),$$

because

$$\int_{B(0,1)} y^{(1)} e^{t y^{(1)}}\, dy = \int_0^t \int_{B(0,1)} \big(y^{(1)}\big)^2 e^{s y^{(1)}}\, dy\, ds \ge 0.$$
Using (66) in (65), together with the fact that $f^{-1}$ is differentiable with $D f^{-1}(t) = 1/D f(f^{-1}(t))$, we obtain the required derivative bound. Using the upper bound in (64) then shows the desired estimate for $x \in S_2$. The proof is finished by recalling the earlier observation (63).
In Lemma 13 we encounter the term $Y$, which has the asymptotic uniformity property proven in Lemma 12. Connecting the two results mainly involves removing the truncation $I_L$, but this takes some technical effort. The function $g$ was defined in Equation (25).
Lemma 14. Assume that $p(x)$ is the multivariate Gaussian distribution (2) and $\alpha > n$. Then, in the limit $M \to \infty$, the suitably normalized conditional expectation $\mathbb{E}[d_{1,k}^{\alpha} I_L \mid X_1 \in S_2]$ converges to an explicit integral involving the function $g$.

Proof. By Lemma 13, we know that the representation (63) holds. Using Equation (47) and Lemma 1 (recall that $Y$ depends only on $X_1$), we obtain (67) with $|R_1| \le c_1 M^{k-1}$ for some constant $c_1(k)$. Also, because $\|x\|$ behaves asymptotically as $\sqrt{2 \log M}$ and $p(x) > \varepsilon \log^{n/2} M / M$ on $S_2$, Equation (60) shows that the estimate (68) holds for some constant $c_2(n, \varepsilon, L)$. This implies that for $\omega < \omega_x(L \varepsilon^{-1/n} \log^{-1/2} M)$, the estimate (69) holds with an error controlled by some constant $c_3(n, k, \alpha, \varepsilon, L)$. By Equations (67)-(69) together with the fact that $f^{-1}$ is an increasing function, we obtain (70). Observe that the bounds for $R_3$ and $R_4$ hold for any $x \in S_2$. By a change of variables, the conditional expectation is rewritten as an integral against the density of Lemma 1. We would like to show that the truncation can be removed after taking first $\limsup_{M \to \infty}$ and then $L \to \infty$. To see that this is true, we observe that by Lemma 6, for some constant $c_4(n, k, \alpha, \varepsilon)$, the integrand admits an upper bound integrable on $[0, \infty)$ and independent of $x \in S_2$. Moreover, the remaining term decays by Equation (44). In summary, we have shown that the lim sup of the quantity of interest equals the desired limit, and similarly with lim inf instead of lim sup. The last limit exists by Lemma 12, which identifies the limiting conditional distribution of $Y$ in the limit $M \to \infty$. On the other hand, Lemma 11 shows that the contribution of the complementary event $1 - I_L$ vanishes in the limit $L \to \infty$.

Now we are able to put everything together to conclude region $S_2$:

Lemma 15. Assume that $p(x)$ is the multivariate Gaussian distribution (2) and $\alpha > n$. Then $\mathbb{E}[d_{1,k}^{\alpha}\, I(X_1 \in S_2)]$ admits an explicit asymptotic expression in the limit $M \to \infty$.

Proof. The claim follows from Lemmas 10 and 14 in the limit $M \to \infty$. To finish the proof, we would like to replace the integration limits by $0$ and $\infty$ when $\varepsilon \to 0$, which amounts to showing that $g(y^{-1})$ is an integrable function. Integrability can be established using Lemma 6, as is done in Section 9.

Region $S_3$

$S_3$ consists of points where the density $p$ takes small values:

$$S_3 = \Big\{x \in \mathbb{R}^n : \frac{p(x) M}{\log^{n/2} M} < \varepsilon\Big\}.$$

To bound the nearest neighbor distances on $S_3$ we need tools similar to those used for $S_2$, but only upper bounds are required, which provides some more flexibility. The sets $S_{3,i}$ are defined analogously to (27), and $S_{3,C} = S_3 \setminus \cup_{i=0}^{i^*} S_{3,i}$. Then we have:

Lemma 16. Assume that $p(x)$ is the multivariate Gaussian distribution (2). Then for some threshold $M_0(n, \varepsilon)$ we have, for $M > M_0$ and $0 \le i \le i^*$, the stated volume estimate for $\lambda(S_{3,i})$.

Proof. The set $S_{3,i}$ consists of the points $x \in \mathbb{R}^n$ with $\|x\| \in [a, b]$, with $a$ and $b$ given in (71). Using the mean value theorem for $a$ and $b$, we obtain, for $0 \le i \le i^*$, the estimate (72) after some threshold $M_0(n, \varepsilon)$. Also, we may take $b \le \sqrt{3 \log M}$ for $0 \le i \le i^*$, as the term $2 \log M$ inside the square root in (71) grows faster than the other terms. The claim follows.

Assessing the contributions from the sets $S_{3,i}$ is convenient using the function $f$ together with the small ball probability. The proof idea is essentially similar to that used for $S_2$ in Section 7, but because only an upper bound is needed, the proof is easier. This is done in Lemma 17, whose proof follows.
Proof. We decompose the conditional expectation into the terms $I_1$ and $I_2$, where the function $f$ was defined in Lemma 5. This implies the bound (75). By taking $M_0$ large enough, we may ensure that

$$\sqrt{\log M} \le \|x\| \le \sqrt{3 \log M} \tag{76}$$

for $x \in S_{3,i}$ and $0 \le i \le i^*$. Then, by Lemma 6 and Equations (75)-(76), we obtain a bound with some constant $c_1(n, \alpha)$. Using $\log(1+z) \le z$ for $z \ge 0$ and recalling that $0 < \varepsilon < 1$, the $\alpha$-moment of the conditional expectation of the last expression is bounded by $c_2 (\log^{\alpha} \varepsilon^{-1} + i^{\alpha} + 1)$ for some constant $c_2(n, k, \alpha)$, by Lemma 1 and Equation (13).

The term $I_2$
By Hölder's inequality, Lemma 3 and Equation (76), we obtain a bound in which $c_1(n)$ is some constant and, to be exact, $c_2 = 2 (2\pi)^{-n/2} c_1$. The factor $2$ in $c_2$ comes from the fact that $\log^{\alpha/2} M > 1$ for $M > 3$ (which can be assumed without loss of generality). It is now rather obvious that the sum over $i$ does not pose problems.
Lemma 18 finalizes the treatment of region $S_3$.

Proof of Theorem 2
Previously we have examined the regions $S_1$, $S_2$ and $S_3$, which were defined in terms of $\varepsilon$ and $M$. We decompose

$$\mathbb{E}\big[d_{1,k}^{\alpha}\big] = \sum_{j=1}^{3} \mathbb{E}\big[d_{1,k}^{\alpha}\, I(X_1 \in S_j)\big]$$

and apply the estimates of Sections 6-8 to each term.

By Lemma 6,

$$\int_0^{\infty} g\Big(\frac{1}{y}\Big)\, dy \le c \int_0^1 \big(1 + \log^{\alpha} y^{-1}\big)\, dy + c \int_1^{\infty} y^{-\alpha/n}\, dy$$

for some constant $c(n, k, \alpha)$. Both terms on the right-hand side are finite (the second one because $\alpha > n$). Together with Lemmas 9 and 15 and the bounds for region $S_3$, this completes the proof of Theorem 2.