On density estimation at a fixed point under local differential privacy

We consider non-parametric density estimation in the framework of local differential privacy, both pure and approximate. In contrast to centralized privacy scenarios with a trusted curator, in the local setup anonymization must already be guaranteed on the individual data owners' side and must therefore precede any data mining tasks. Thus, the published anonymized data should be compatible with as many statistical procedures as possible. We consider different mechanisms to establish pure and approximate differential privacy, respectively. We obtain minimax-type results over Sobolev classes indexed by a smoothness parameter s > 1/2 for the mean squared error at a fixed point. In particular, we show that appropriately defined kernel density estimators can attain the optimal rate of convergence if the bandwidth parameter is correctly specified. Notably, the optimal convergence rate in terms of the sample size n is n^{-(2s-1)/(2s+1)} under pure differential privacy, and thus deteriorated with respect to the rate n^{-(2s-1)/(2s)}, which holds both without privacy restrictions and under approximate differential privacy. Since the optimal choice of the bandwidth parameter depends on the smoothness s and is thus not accessible in practice, adaptive methods for bandwidth selection are necessary and must, in the local privacy framework, be performed based on the anonymized data only. We address this problem by means of variants of Lepski's method tailored to the privacy setups at hand and obtain general oracle inequalities for private kernel density estimators. In the Sobolev case, the resulting adaptive estimators attain the optimal rates of convergence at least up to logarithmic factors. As an aside, we discuss some critical issues related to the notion of approximate differential privacy.

MSC2020 subject classifications: Primary 62G05; secondary 68P25.


Introduction
In the modern information era, data are routinely collected in all areas of private and public life. Although the availability of massive data sets is essential to answer important scientific and societal questions, the individual data owners (who may be individuals, households, research institutions, companies, . . . ) might refuse to share their, possibly sensitive, raw data with others. Even more, in view of regularly reported data leaks, they may not even want to entrust their data to a central curator who stores the data and publishes anonymized summary statistics. Finding ourselves in such a dilemma, the question of whether and, if so, how data analytics can still be performed is of special importance. For the evaluation of this question, several aspects have to be taken into account.
Firstly, in the absence of a trusted curator, privacy of the data has to be achieved already locally at the individual data owners' level. The i-th data holder takes its datum, say X_i, as the input of a privacy mechanism and creates an output Z_i that is considered sufficiently anonymized, for instance, in the sense of any of the privacy definitions listed below. For the purpose of the present paper, a privacy mechanism is a Markov kernel Q_i between measurable spaces (X, 𝒳) and (Z, 𝒵) generating Z_i, given X_i = x, according to the distribution Q_i(·|X_i = x). This definition of local privacy is in contrast to the framework of centralized or global privacy, where the trusted curator can take the whole data set {X_1, . . . , X_n} to create an output Z. In this sense, the local privacy model can be seen as a proper submodel of the global one, because the trusted curator can also mimic any conceivable procedure from the local model.
Secondly, for the quantification of privacy, different solutions have been proposed so far (see [1], Section 2 for a comprehensive overview of existing privacy definitions):

• In this paper, we will exclusively work in the framework of α-differential privacy and its generalization, (α, β)-differential privacy, as defined in Definition 2.1 below. These two privacy definitions are also referred to as pure and approximate differential privacy, respectively. Originally, these concepts were suggested for the anonymization of microdata tables in a global privacy setup, more precisely in a framework where queries are answered by a server that has direct access to the sensitive data [12, 14, 13]. In the statistics community, working under privacy constraints has been popularized in the past decade, amongst others, through the articles [25, 17] (in the global setup) and [11] (in the local privacy setup). Another strict relaxation of pure differential privacy is random differential privacy as introduced in [16].

• An alternative quantification of privacy can be given as follows: Let ϕ be a convex function from [0, ∞) to R ∪ {+∞} with ϕ(1) = 0. Then, the associated ϕ-divergence between two distributions P, Q is

D_ϕ(P‖Q) = ∫ ϕ(p/q) q dμ,

where μ is a measure such that P, Q ≪ μ and p, q denote the corresponding Radon–Nikodym densities. The mechanism Q is called β-ϕ-divergence private if sup_{x, x' ∈ X} D_ϕ(Q(·|X = x)‖Q(·|X = x')) ≤ β.
The intersection of these two concepts is non-empty: For instance, taking ϕ(x) = |x − 1|/2, the ϕ-divergence D_ϕ(P‖Q) is the total variation distance, and the resulting notion of β-ϕ-divergence privacy is equivalent to (0, β)-differential privacy.

Thirdly, the published data Z_1, . . . , Z_n should ideally be multi-purpose in the sense that they can serve as input data for several types of analyses. Thus, when the unmasked data are, for instance, a sample from an unknown probability distribution, the anonymized data should contain as much information as possible about the whole distribution and not only about certain of its characteristics. One main motivation for this work is to introduce novel methodology in the framework of density estimation that also addresses this issue by proposing a locally approximately (β > 0) differentially private mechanism whose output can be used for various types of analyses.

Roadmap of the article
Throughout the article, we consider the paradigmatic example of non-parametric density estimation. For the sake of simplicity, we assume that each of n data holders D_i observes a size-one sample X_i from a (in this paper) univariate target density f, but refuses to share this observation. In Section 2, we introduce several mechanisms to anonymize the datum X_i. The first approach is based on adding appropriately scaled Laplace noise to a kernel density estimator at a single fixed point t ∈ R. The idea of the second approach is to publish the original datum X_i with a certain probability p and to publish another random value with probability 1 − p (thus, the distribution of the Z_i is a discrete mixture of the target density and another distribution).
In Section 3, we consider estimation of the unknown density function under approximate differential privacy from a minimax point of view. As the performance measure to evaluate arbitrary estimators, we consider the mean squared error at the fixed point t ∈ R. Via the Laplace perturbation approach, we attain the convergence rate n^{-(2s-1)/(2s+1)} in terms of n over Sobolev ellipsoids with smoothness index s under (α, 0)-differential privacy, which is slower than the optimal rate n^{-(2s-1)/(2s)} in the setup without privacy constraints. However, this slower rate can be shown to be optimal in the case of pure differential privacy, that is, when β = 0. In turn, the standard rate n^{-(2s-1)/(2s)} from the setup without privacy can be attained under (α, β)-differential privacy for β > 0 by means of the second, mixture approach to obtaining privacy, combined with suitable kernel density estimators. As a consequence, the second approach attains the optimal rate of convergence and, furthermore, does not hinge on a priori knowledge of a point t that would have to be chosen prior to the anonymization procedure. Hence, this approach enables the statistician to apply a wider spectrum of inference procedures. Investigating theoretical guarantees for such general procedures, however, is outside the scope of this work and deferred to future research.
As usual for kernel density estimators, the choice of the bandwidth parameter is crucial. In the considered minimax framework over Sobolev classes, the optimal order of the bandwidth that leads to a rate-optimal estimator depends on the smoothness index s, which is typically unknown. In Section 4, we apply a Lepski scheme tailored to the privacy framework to overcome this problem and obtain an adaptive choice of the bandwidth. In the case β > 0, there is no additional problem since a standard approach via Lepski's method can be applied with the data Z_1, . . . , Z_n, and one can conclude as in the case of known smoothness. However, in the case β = 0, the considered privacy mechanism already depends on the choice of the bandwidth that one actually wants to choose in an adaptive way. In order to perform the Lepski scheme here, every data owner has to publish the kernel density estimator not only for one single bandwidth but for a finite set of potential bandwidths. Such a multiple output still guarantees the desired privacy condition provided that the additive noise is multiplied by a factor proportional to the number of potential bandwidths, which is logarithmic in the number of data sources in our case. Note that this issue specifically arises in the local privacy setup, since in the global framework the trusted curator can apply the existing plethora of methods for bandwidth selection on the unmasked data, and then publish only the resulting estimator with the adaptively determined bandwidth in its anonymized form. We derive general oracle-type inequalities for the estimator resulting from the Lepski procedure adapted to the privacy framework. For the specific example of Sobolev ellipsoids, the rates of convergence are merely deteriorated by logarithmic factors with respect to the case of a priori known smoothness.

Definition of approximate differential privacy
Let (X, 𝒳) and (Z, 𝒵) be measurable spaces. A privacy mechanism is a Markov kernel Q : X × 𝒵 → [0, 1] with the interpretation that, given original data X = x, an anonymized output is randomly drawn from the probability measure Q(·|X = x). In the non-interactive setup that we are going to consider, we work under the following definition of approximate or (α, β)-differential privacy.
Definition 2.1. Let α ≥ 0 and β ∈ [0, 1]. A privacy mechanism Q is called (α, β)-differentially private if, for all x, x' ∈ X and all measurable sets A ∈ 𝒵, the inequality

Q(A|X = x) ≤ e^α Q(A|X = x') + β    (1)

holds true.
Let us emphasize that in Definition 2.1 the spaces (X, 𝒳) and (Z, 𝒵) do not necessarily need to coincide. In the literature, the case β = 0 is also referred to as α-differential privacy or pure differential privacy. Evidently, the privacy condition (1) becomes more restrictive for smaller values of the two parameters α and β. Although Definition 2.1 smoothly bridges the cases β = 0 and β > 0, the classical anonymization techniques used for β = 0 and β > 0 are essentially different: In the case β = 0, Laplace perturbation as well as randomization techniques, as considered in [11, 21], can be used. In the case β > 0, adding appropriately scaled Gaussian noise has been suggested in [17]. However, as proved in [18], appropriately scaled Laplace noise can also lead to approximately differentially private outputs (see Propositions 2.2 and 3.1 as well as Remark 3.2). In the sequel, we discuss how to achieve approximate differential privacy in the scenario of non-parametric density estimation at a fixed point.

Pure differential privacy by adding Laplace noise
Throughout, we exclusively consider the case where both the input and the output of the privacy mechanism are univariate and real-valued, that is, (X, 𝒳) = (Z, 𝒵) = (R, B(R)). First, we consider so-called Laplace perturbation, which is also used to derive an upper bound in Section 3. To introduce this mechanism, let Y_i = g(X_i) ∈ R be a quantity derived from X_i that should be masked. Define the sensitivity of g as

Δ(g) = sup_{x, x' ∈ X} |g(x) − g(x')|.

Recall that the univariate Laplace distribution with scale parameter b ≥ 0, denoted by L(b), is given by the probability density function p_b(x) = (2b)^{-1} exp(−|x|/b) (we also include the case b = 0; then the Laplace distribution is, by convention, the Dirac measure concentrated at 0). In particular, the variance of an L(b)-distributed random variable is 2b². The following proposition establishes approximate differential privacy by Laplace perturbation.
Proposition 2.2. Let ξ ∼ L(b), where the scale b is calibrated to Δ(g), α and β (with b = Δ(g)/α in the case β = 0, and b decreasing to 0 as β → 1). Then Z = g(X) + ξ provides an (α, β)-differentially private view of g(X) (and of X as well).
A benefit of Proposition 2.2, in contrast to the often proposed perturbation by Gaussian noise to establish approximate differential privacy, is that it allows us to deal with the cases β = 0 and β > 0 by the same approach. Moreover, letting the parameter β vary permits natural interpretations: If β = 0, the variance 2b² of the noise corresponds to the one that is usually encountered in the case of pure differential privacy. When β tends to one, the privacy constraint gets weaker and the variance of the centred noise tends to 0. In the extreme case β = 1 it is even allowed to publish g(X) directly. Let us note, however, that with the exception of the extreme case β = 1, we are not able to obtain optimal convergence rates for β ∈ (0, 1) via this approach following the calculations in the proof of Proposition 3.1 below. This is why we consider different strategies for the anonymization in the case β ∈ (0, 1) in Subsection 2.3.
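The pure case β = 0 of this Laplace perturbation can be checked numerically: with noise scale b = Δ(g)/α, the ratio of the output densities for any two admissible inputs is bounded by e^α, which is precisely condition (1) with β = 0. The following sketch is our own illustration (the parameter values are arbitrary choices), not code from the paper:

```python
import numpy as np

def laplace_mechanism(y, sensitivity, alpha, rng):
    """Privatize y by adding centred Laplace noise of scale sensitivity/alpha,
    which yields pure alpha-differential privacy."""
    return y + rng.laplace(loc=0.0, scale=sensitivity / alpha)

def laplace_density(z, loc, b):
    """Density of L(b) shifted to loc."""
    return np.exp(-np.abs(z - loc) / b) / (2 * b)

# Check the defining inequality of pure alpha-DP for the output densities:
# for two inputs y, y' with |y - y'| <= Delta, the density ratio stays <= e^alpha.
alpha, delta = 0.5, 1.0
b = delta / alpha
z = np.linspace(-20, 20, 10001)
ratio = laplace_density(z, 0.0, b) / laplace_density(z, delta, b)
assert np.all(ratio <= np.exp(alpha) * (1 + 1e-9))
```

The bound is attained in the tails, where the ratio equals exp(Δ/b) = e^α exactly; shrinking b below Δ/α would violate the pure privacy constraint.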
We now introduce kernel density estimators giving the main example that we have in mind for the function g in Proposition 2.2.

M. Kroll
Example 2.3. Let X_1, . . . , X_n be i.i.d. according to an unknown probability density function f : R → R. Let t ∈ R be fixed. Then the i-th data holder, who observes X_i ∈ R, can compute

K_h(X_i − t) := (1/h) K((X_i − t)/h)

for a bounded kernel function K, that is, a bounded function K : R → R which is integrable and satisfies ∫ K(u) du = 1. The quantity K_h(X_i − t) will play the role of g(X) in Proposition 2.2. By the triangle inequality, Δ(K_h(· − t)) ≤ 2‖K‖_∞/h, and one can choose the scale of the Laplace noise in Proposition 2.2 accordingly. Note that t ∈ R has to be fixed in advance, before the anonymization procedure.
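A minimal simulation of Example 2.3 combined with Laplace perturbation in the pure case β = 0 might look as follows. The Gaussian kernel, the standard normal target density and all parameter values are our own illustrative choices; the noise scale 2‖K‖_∞/(hα) matches the sensitivity bound above:

```python
import numpy as np

def private_kde_at_point(x, t, h, alpha, rng):
    """Each data holder publishes K_h(X_i - t) plus Laplace noise of scale
    2*||K||_inf/(h*alpha); the statistician averages the private views."""
    # Gaussian kernel: bounded, integrates to one, ||K||_inf = 1/sqrt(2*pi).
    k_inf = 1.0 / np.sqrt(2 * np.pi)
    kh = np.exp(-((x - t) / h) ** 2 / 2) / (np.sqrt(2 * np.pi) * h)
    b = 2 * k_inf / (h * alpha)                      # noise scale from Delta <= 2||K||_inf/h
    z = kh + rng.laplace(scale=b, size=x.shape)      # the published private views Z_i
    return z.mean()

rng = np.random.default_rng(0)
n, t, h, alpha = 200_000, 0.0, 0.3, 1.0
x = rng.standard_normal(n)                           # unknown density f = N(0, 1)
est = private_kde_at_point(x, t, h, alpha, rng)
true_val = 1.0 / np.sqrt(2 * np.pi)                  # f(0)
```

With these values the additional Laplace noise has standard deviation of order 1/(h√n) per average, so the private estimate is still close to f(0) ≈ 0.399 for large n.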

Approximate differential privacy by random replacement
Although Proposition 2.2 provides us with a mechanism to achieve (α, β)-differential privacy, we will see later on, in Proposition 3.1, that this mechanism leads to a convergence rate proportional to n^{-(2s-1)/(2s+1)}, which is not optimal for β ∈ (0, 1). In order to resolve this defect, we suggest another (quite simple) mechanism by which even the standard optimal rate n^{-(2s-1)/(2s)} is attainable. For this mechanism, the data holders have to agree on an arbitrary but fixed (α, 0)-differentially private mechanism which we denote by Q. Then, each of the n data holders D_i draws (independently of the others) a random number U_i ∈ [0, 1] according to the uniform distribution on the unit interval, and a random outcome Y_i ∼ Q(·|X = X_i). We have the following result.
Proposition 2.4. The mechanism where the i-th data holder publishes

Z_i = X_i 1{U_i ≤ β} + Y_i 1{U_i > β}    (2)

guarantees (α, β)-differential privacy.
For β ∈ [0, 1], the mechanism in Proposition 2.4 publishes the original datum with probability β, and the (α, 0)-private datum Y_i with probability 1 − β. In the special case of (0, β)-differential privacy, the mechanism Q(·|X = x) must indeed be independent of x (this choice of Q is then evidently also admissible for (α, β)-differential privacy with α > 0). Of course, for α = 0 and β = 0, the mechanism guarantees even perfect privacy but is completely useless for further analyses. For β > 0, the mixture structure of the density of Z_i allows one to estimate the density f with the usual rate of convergence that holds without privacy restrictions (see Proposition 3.3). A delicate aspect of the concept of (α, β)-differential privacy for β > 0 becomes evident from the admissible privacy procedure (2): it does not exclude algorithms that publish the original datum X_i with a strictly positive probability, which will surely not be acceptable in certain applications. Even worse, procedures where an observer not only observes the original datum but also knows that this is the case are not excluded by the very concept of approximate differential privacy. This important issue will be further discussed below.
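The random replacement mechanism of Proposition 2.4 is straightforward to implement. The following sketch (with an illustrative uniform sample on [0, 1] and a Laplace-based choice of Q, both our own choices) makes the mixture structure explicit:

```python
import numpy as np

def random_replacement(x, beta, sample_q, rng):
    """Publish X_i with probability beta, otherwise an (alpha, 0)-private
    value Y_i drawn from the agreed mechanism Q (Proposition 2.4)."""
    u = rng.uniform(size=x.shape)
    y = sample_q(x, rng)                 # draws Y_i ~ Q(.|X = X_i)
    kept = u <= beta                     # indices where the raw datum is revealed
    return np.where(kept, x, y), kept

rng = np.random.default_rng(1)
alpha, beta, n = 1.0, 0.25, 100_000
x = rng.uniform(size=n)                  # X_i supported on [0, 1]
# Q: Laplace perturbation with scale 1/alpha (sensitivity 1 on [0, 1]).
z, kept = random_replacement(
    x, beta, lambda v, r: v + r.laplace(scale=1 / alpha, size=v.shape), rng)
```

Note that roughly a fraction β of the published values equal the raw data exactly, which is precisely the delicate aspect discussed above.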

A composition lemma for approximate differential privacy
For kernel density estimation, bandwidth selection is usually a delicate issue, and so it is in our local privacy setup. Whereas in the centralized setup existing methods can be applied by the trusted curator on the unmasked data, this is not possible in our local setup when working with the Laplace mechanism from Example 2.3. Thus the data holders have to publish versions of the kernel density estimator for different bandwidths, and one has to adapt general strategies from the non-private framework to the one with approximate local differentially private data. In order to do this under our privacy constraint, it is necessary to understand how multiple outputs influence the defining condition of approximate differential privacy. The following lemma provides a result of this flavour and is known in the research literature on privacy for statistical databases. The setup is the following: Given the unmasked datum X, the data owner does not only want to publish Z_1 = Z_1(X) but also Z_2 = Z_2(X), i.e., the vector (Z_1, Z_2). The following result tells us how α and β for the single components have to be scaled in order to obtain (α, β)-differential privacy for multiple outputs.

Lemma 2.5. For j ∈ {1, 2}, let Z_j be an (α_j, β_j)-differentially private view of X. Then, the vector (Z_1, Z_2) is an (α_1 + α_2, β_1 + β_2)-differentially private view of X.
Of course, Lemma 2.5 can be applied successively. For instance, if we want to publish Z_{i,h} from Example 2.3 for different h in a finite set H, then α and β should be replaced with α′ = α/#H and β′ = β/#H, respectively, in order to obtain (α, β)-differential privacy for Z_i = (Z_{i,h})_{h∈H}.
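The budget splitting implied by Lemma 2.5 can be sketched as follows. The Gaussian kernel and the grid are our own illustrative choices, and we show the pure case β = 0, where the per-output noise scale is explicit and grows linearly with the number of published bandwidths:

```python
import numpy as np

def private_kde_grid(x_i, t, bandwidths, alpha, rng):
    """Publish K_h(x_i - t) for every h on the grid. Following Lemma 2.5, the
    total budget alpha is split as alpha' = alpha/#H, so every single output
    receives Laplace noise of scale 2*||K||_inf/(h*alpha')."""
    alpha_per_output = alpha / len(bandwidths)
    k_inf = 1.0 / np.sqrt(2 * np.pi)          # Gaussian kernel sup-norm
    views = {}
    for h in bandwidths:
        kh = np.exp(-((x_i - t) / h) ** 2 / 2) / (np.sqrt(2 * np.pi) * h)
        b = 2 * k_inf / (h * alpha_per_output)
        views[h] = kh + rng.laplace(scale=b)
    return views

rng = np.random.default_rng(2)
grid = [0.1, 0.2, 0.4, 0.8]                   # geometric grid as in Section 4
views = private_kde_grid(0.3, 0.0, grid, alpha=1.0, rng=rng)
```

Since #H_n is of logarithmic order in n for the geometric grids used later, this inflation of the noise only costs logarithmic factors in the rates.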

Private minimax estimation
Minimax theory provides a standard framework to study convergence properties of estimators in non-parametric statistics [24]. In this section, we apply this general toolbox to the specific case of density estimation under privacy constraints. For fixed t ∈ R and any estimator f̂(t) of the linear functional f(t) based on the private views Z = {Z_1, . . . , Z_n}, we study its mean squared error

E[(f̂(t) − f(t))²].

The guiding principle of minimax theory is to look for estimators that perform best in a worst-case scenario. However, due to the privacy framework, we not only have the freedom of choosing the estimator but also the privacy mechanism Q that generates the private outputs. Hence, following [11], classical minimax theory has to be adapted, and a natural quantity to consider is the private minimax risk

inf_{Q ∈ Q_{α,β}} inf_{f̂} sup_{f ∈ P} E[(f̂(t) − f(t))²],

where P is some function class containing probability densities and the infimum is taken over all local (α, β)-differentially private Markov kernels Q ∈ Q_{α,β} and all estimators f̂ based on the local approximate differentially private views Z of the corresponding original sample X_1, . . . , X_n. We specify the function class P by so-called Sobolev ellipsoids S(s, L), which we define for s > 1/2 and L > 0 by means of

S(s, L) = { f probability density : ∫ |F[f](ω)|² (1 + ω²)^s dω ≤ L² },

or, for integer s, equivalently (up to constants) via the condition ‖f^{(s)}‖²_{L²} ≤ L². In the first definition, F[f] denotes the Fourier transform of the density f; in the second one, f^{(s)} denotes the weak s-th derivative of f.

Upper bound for Laplace perturbation
We first derive an upper bound on the minimax risk by specializing both the privacy mechanism Q ∈ Q_{α,β} and the estimator of f(t). Concerning the privacy mechanism, we first consider the mechanism mapping X_i to private views Z_{i,h} of K_h(X_i − t) from Example 2.3 for one single h > 0. More precisely, we consider the Laplace mechanism given through

Z_{i,h} = K_h(X_i − t) + ξ_i,    (3)

where the ξ_i are i.i.d. Laplace random variables whose scale is chosen in accordance with Proposition 2.2 and the sensitivity bound from Example 2.3. Given Z_{1,h}, . . . , Z_{n,h} as in (3), a natural estimator of f(t) is given by

f̂_h(t) = (1/n) Σ_{i=1}^n Z_{i,h}.    (4)

The following proposition provides a uniform upper risk bound for this estimator over the Sobolev ellipsoids S(s, L) introduced above.
The proof of the results exploits the special choice of the kernel function as the so-called sinc-kernel defined via

K(x) = sin(x)/(πx).    (5)

Proposition 3.1. Consider the kernel density estimator f̂_h(t) for some fixed t ∈ R, where the kernel used in the anonymization procedure (3) is the sinc-kernel given in (5). Then, for any s > 1/2,

sup_{f ∈ S(s,L)} E[(f̂_h(t) − f(t))²] ≤ C ( h^{2s−1} + 1/(nh) + C_α^β/(nh²) )

for a constant C depending only on s, L and K. Since the noise added by the privacy mechanism is centred, the bias term in the proof of Proposition 3.1 remains unchanged in comparison to the standard setup without privacy constraints. However, the variance term changes due to the additional Laplace noise, and the classical variance term 1/(nh) is augmented by the additional term 1/(nh²), which is of higher order as h → 0. Consequently, the optimal bandwidth is no longer of order n^{-1/(2s)} as in the standard setup, but of the larger order n^{-1/(2s+1)}. However, consistency of f̂_h is already guaranteed if h → 0 and nh² → ∞ simultaneously (in the standard density estimation setup, one only needs nh → ∞ in addition to h → 0).

Remark 3.2. Proposition 3.5 below shows that the rate obtained in Proposition 3.1 is optimal for β = 0. However, in the following we will show that the upper bound on the rate of convergence in Proposition 3.1 is distinct from the optimal rate of convergence for β > 0. In this latter regime, one can even achieve the optimal rate of convergence from the non-private setup by appropriately chosen privacy mechanisms.
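The bias-variance trade-off behind these statements can be summarized in the following back-of-the-envelope computation (all constants suppressed; the orders of the bias and variance terms are taken from the discussion above):

```latex
\begin{align*}
\mathbb{E}\big[(\hat f_h(t) - f(t))^2\big]
  &\lesssim \underbrace{h^{2s-1}}_{\text{squared bias}}
   + \underbrace{\frac{1}{nh}}_{\text{sampling variance}}
   + \underbrace{\frac{1}{nh^2}}_{\text{privacy noise}}.
\intertext{As $h \to 0$, the privacy term dominates the sampling variance, so
balancing $h^{2s-1} \asymp (nh^2)^{-1}$ yields}
h^\ast &\asymp n^{-1/(2s+1)},
\qquad
\mathbb{E}\big[(\hat f_{h^\ast}(t) - f(t))^2\big] \lesssim n^{-(2s-1)/(2s+1)}.
\intertext{Without the privacy term, one instead balances
$h^{2s-1} \asymp (nh)^{-1}$, giving}
h^\ast &\asymp n^{-1/(2s)},
\qquad
\mathbb{E}\big[(\hat f_{h^\ast}(t) - f(t))^2\big] \lesssim n^{-(2s-1)/(2s)}.
\end{align*}
```

This makes transparent why the pure-privacy rate is strictly slower: the extra factor 1/h in the variance shifts the optimal bandwidth upwards.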

Upper bound for random replacement mechanisms
For our first specialization of the general approach in (2), we assume that the support of the X_i is bounded, say X_i ∈ [0, 1] without loss of generality. Then, Proposition 2.2 shows that

Y_i = X_i + (2/α) ξ_i with ξ_i ∼ L(1)

is an α-differentially private view of X_i and can be used as a building block in the definition (2) of an (α, β)-differentially private algorithm. Let us denote by g the density of the noise random variable (2/α)ξ. Then, the density ϕ of Z_i is the mixture

ϕ = βf + (1 − β)(f ∗ g).

In terms of the Fourier transform, this yields

F[ϕ] = F[f] · (β + (1 − β)F[g]),

which motivates us to consider the kernel K_n defined via the Fourier transform

F[K_n](ω) = F[K](ω) / (β + (1 − β)F[g](ω)).

We have the following result.

Proposition 3.3. Consider the kernel density estimator of f(t) based on Z_1, . . . , Z_n that uses the kernel K_n with the sinc-kernel K and a bandwidth of order n^{-1/(2s)}. Then, for all s > 1/2,

sup_{f ∈ S(s,L)} E[(f̂(t) − f(t))²] ≲ n^{-(2s-1)/(2s)}.
If X_i is not assumed to be bounded, the above Laplace mechanism is not applicable to obtain differential privacy. In this case, the only obvious mechanism to obtain (0, β)-differential privacy is to draw Y_i according to some arbitrary density g (on which the individual data holders have to agree) and then to publish

Z_i = X_i 1{U_i ≤ β} + Y_i 1{U_i > β}.    (9)

In this case, the density ϕ of Z_i is a mixture with components f and g, namely

ϕ = βf + (1 − β)g.    (10)

Then, the statistician can estimate the density ϕ via a usual kernel density estimator, say ϕ̂_h, and then estimate f via

f̂_h = (ϕ̂_h − (1 − β)g)/β.    (11)

This leads to the risk decomposition

E[(f̂_h(t) − f(t))²] = β^{-2} E[(ϕ̂_h(t) − ϕ(t))²].

If the density g is at least as smooth as the unknown density f, then ϕ inherits the smoothness s from f, and we obtain the following result (the easy proof of which is left to the reader).

Proposition 3.4. Consider the estimator f̂_h from (11), where we use the sinc-kernel in the definition of ϕ̂_h and assume that the density g in (10) and (11) belongs to S(s, R) for some large enough R. Then, for all s > 1/2, the estimator f̂_h with a bandwidth of order n^{-1/(2s)} satisfies

sup_{f ∈ S(s,L)} E[(f̂_h(t) − f(t))²] ≲ n^{-(2s-1)/(2s)}.
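For the unbounded case, the correction (11) is a one-liner once ϕ has been estimated. The following sketch is our own illustration (f = N(0, 1), g = N(0, 4) and a plain Gaussian-kernel estimator for ϕ̂_h are arbitrary choices), not code from the paper:

```python
import numpy as np

def mixture_corrected_estimate(z, t, h, beta, g_at_t):
    """Estimate f(t) from data Z_i with density phi = beta*f + (1-beta)*g:
    plain KDE for phi(t), then invert the mixture as in (11)."""
    phi_hat = np.mean(np.exp(-((z - t) / h) ** 2 / 2) / (np.sqrt(2 * np.pi) * h))
    return (phi_hat - (1 - beta) * g_at_t) / beta

rng = np.random.default_rng(3)
beta, n, t, h = 0.5, 200_000, 0.0, 0.3
x = rng.standard_normal(n)                       # target density f = N(0, 1)
y = 2.0 * rng.standard_normal(n)                 # agreed smooth density g = N(0, 4)
u = rng.uniform(size=n)
z = np.where(u <= beta, x, y)                    # published data with density phi
g_at_t = 1.0 / (2.0 * np.sqrt(2 * np.pi))        # g(0) for N(0, 4)
est = mixture_corrected_estimate(z, t, h, beta, g_at_t)
```

The factor β^{-1} in the inversion is exactly the source of the β^{-2} inflation in the risk decomposition above.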

Lower bound
The following result states a lower bound over Sobolev ellipsoids in the case of pure differential privacy (β = 0).
Proposition 3.5. For s > 1/2 and L > 0, it holds that

inf_{Q ∈ Q_{α,0}} inf_{f̂} sup_{f ∈ S(s,L)} E[(f̂(t) − f(t))²] ≥ C(α) n^{-(2s-1)/(2s+1)},

where C(α) > 0 depends on the privacy parameter, and the infimum is taken over all estimators based on private views Z_1, . . . , Z_n and privacy mechanisms providing (α, 0)-differential privacy.
Remark 3.6. The lower bound of Proposition 3.5 still holds true when one allows a slight amount of interaction between the data holders, namely when the distribution of every Z_i is determined by X_i and the previously masked values Z_1, . . . , Z_{i−1}. The proof remains the same because the data processing inequality (14) from [11] still holds true in this more general setup. Proposition 3.5 shows that, regarding the privacy parameter α as an a priori fixed constant, the estimators f̂_h(t) from Proposition 3.1 attain the optimal rate n^{-(2s-1)/(2s+1)} in terms of n under pure local differential privacy.
Recall that without privacy restrictions the optimal rate over Sobolev ellipsoids is n^{-(2s-1)/(2s)} (as mentioned in [3], this rate can, other than by a reduction scheme as used in our proof, be easily obtained via the theory developed in [10]; see also [22]).
Of course, lower bounds on the rate of convergence in the scenario without privacy still hold true in the setup of differential privacy (since the privacy restriction can be interpreted as restricting the set of admissible estimators). Thus, for approximate differential privacy the rates of convergence derived in Propositions 3.3 and 3.4 are optimal.
In this work, we consider the parameter α (and also β in the case of approximate differential privacy) as fixed and are interested in the behaviour of the rate as a function of n only; however, remarks concerning α analogous to those made in [4] apply here as well (as in that paper, α and β could also be allowed to vary with n). The optimal behaviour of the rates in terms of the privacy parameters α and β, especially when β > 0, remains an open issue which is outside the scope of this paper.
Instead, we give in the following a heuristic which links (α, β)-differential privacy to missing data problems. We have designed the privacy procedures considered in Propositions 3.3 and 3.4 such that either the original value is published with zero probability (Proposition 3.3), or it is at least not evident whether the original value has been published (provided that g in Proposition 3.4 is appropriately chosen, for instance assuming that g has the same support as the original data). However, this assumption (reasonable from a privacy point of view) is not enforced by the notion of (α, β)-differential privacy alone: For example, the local mechanism that publishes

Z_i = X_i 1{U_i ≤ β} + ∅ · 1{U_i > β},    (13)

where ∅ denotes the empty set, ensures (α, β)-differential privacy, the distribution of the random variable Y_i in (9) corresponding here to the trivial distribution on the one-point set {∅}. This mechanism, though, will certainly not be regarded as a legitimate privacy mechanism in most applications, since it not only reveals the true value with a positive probability but also tells the observer of the 'privatized' data whether this is the case. This problem might become even more severe in multivariate setups, where exact knowledge of one (maybe not per se sensitive) value associated with a certain individual might help to identify this individual and, in addition, reveal values associated with this person that are considered sensitive.
Concerning the (optimal) rate of convergence, the missing data problem (13) is asymptotically equivalent to a standard nonparametric experiment where the number of observations is βn instead of n, which would lead to a rate of order (βn)^{-(2s-1)/(2s)} over Sobolev spaces. Of course, this rate is better than the one we have obtained above in Proposition 3.3, which is in turn better than the one from Proposition 3.4 (the latter fact being intuitive since, in the setup of Proposition 3.4, the random variable Y_i no longer contains any information on f, whereas it still does in Proposition 3.3).
Finally, we would like to mention that the mixture structure in Equation (10) is well known under the name of Huber's contamination model in the theory of robust statistics. In this contamination model, however, the nuisance component g is itself unknown, which leads to slower rates when the supremum in the minimax formulation is also taken over a set of potential contamination distributions g [8]. In the framework of differential privacy, the data holders can agree on a suitable choice of g, which even allows them to maintain the standard rate of convergence that holds for unmasked observations.

Adaptation to unknown smoothness
The estimators of the previous section are not completely satisfying since the optimal choice h_n of the bandwidth, as usual in non-parametric statistics, depends on a priori knowledge of the smoothness of the unknown function f. Such knowledge is usually not available in practice. At least, using the approach suggested in Subsection 2.3 (with its specializations considered in Propositions 3.3 and 3.4), we relieved ourselves of the drawback of the Laplace method that one can privatize only one functional of the form f(t) for one single t, which has to be fixed even before the anonymization. Note that this drawback is, for instance, also present in the mechanisms suggested in [21]. From this point of view, (α, β)-differential privacy with strictly positive β via one of these approaches should be preferred.
The purpose of this section is to address the remaining issue of adapting to the unknown smoothness of f. Note that the problem of adaptation has, to the best of the author's knowledge, only been addressed in the recent work [4] so far, where the authors use wavelet estimators for density estimation on a compact interval. The approach in that paper is thus conceptually different from the one presented in the sequel. In order to tackle this problem, we use a variant of Lepski's method (see [20] for a general account in the Gaussian white noise model, and [7] for an application to a tomography problem whose concise presentation has inspired ours).
Recall that in the case of global privacy (which is not considered here) the trusted data curator can choose the bandwidth in an adaptive way using all the data X_1, . . . , X_n and, as a consequence, can build on the existing plethora of methods and theoretical results for this standard case; hence bandwidth selection does not pose any additional difficulty for centralized privacy, since only the final output is anonymized. Nearly the same holds true for our approaches to achieving (α, β)-differential privacy presented in Subsection 2.3. Here, slight modifications of the standard Lepski method yield completely data-driven estimators that attain the optimal rates of convergence up to logarithmic factors. For these cases, we will state results but omit the proofs, since they are obtained by adapting the one for the Laplace perturbation case (or standard results from the literature). However, we study in detail the case of pure differential privacy, which is the one that differs most from the standard framework.

Adaptive estimation for Laplace perturbation
In order to apply Lepski's method in the case of Laplace perturbation, the observations (3) must be available for different values of the bandwidth parameter h, say h ∈ H_n. This can be realized using Lemma 2.5, provided that the privacy parameters α and β are appropriately scaled. Thus, we can assume that the Z_{i,h}(t) in (3) are accessible for any i ∈ {1, . . . , n} and h ∈ H_n if we replace α and β by α′ = α/#H_n and β′ = β/#H_n, respectively. For any h ∈ H_n and t ∈ R, we can then consider the estimator f̂_h(t) defined in (4). In our case, we define the set of potential bandwidths by the geometric grid

H_n = { h̄_n a^{−k} : k ∈ N_0, h̄_n a^{−k} ≥ h_n },    (14)

where a > 1 is a fixed constant, h̄_n is such that a log(h̄_n √n)/√n ≤ h̄_n ≤ 1, and h_n satisfies h_n = (log(h̄_n √n) ∨ 1)/√n. For h ∈ H_n and some M > 0, define

v(h)² = M C_α^β / (nh²),

where C_α^β is defined as in Section 3. The proof of Proposition 3.1 shows that the stochastic error of f̂_h(t) is of order v(h) for the bandwidths under consideration. Put λ(h) = max(1, (κ log(h̄_n/h))^{1/2}) with κ a sufficiently large constant (an explicit value can be determined from the proof of Theorem 4.3), and define

h*_n = max{ h ∈ H_n : |f_η(t) − f(t)| ≤ v(η)λ(η) for all η ∈ H_n, η ≤ h },    (15)

where f_η := E f̂_η. If the set in the definition of h*_n is empty, we set h*_n = h_n by convention. However, in the proof of Proposition 4.1 we will show that this set is non-empty for n large enough. The bandwidth h*_n is an oracle in the sense that it is not accessible to the statistician, since it depends on the unknown density f. The definition of h*_n provides some kind of ideal criterion: The bandwidth h is increased along the grid H_n as long as the bias term |f_η(t) − f(t)| is bounded by the 'rate' v(h)λ(h), a procedure that aims at mimicking the classical bias-variance trade-off. In order to state a risk bound for the pseudo-estimator f̂_{h*_n}, we further impose the condition

|f_h(t) − f(t)| → 0 as h → 0.    (16)

Proposition 4.1. Consider the pseudo-estimator f̂_{h*_n} defined via (4) and (15), where α and β are replaced with α′ and β′, respectively. Assume that (16) holds and choose h̄_n = 1. Then, for n sufficiently large,

E[(f̂_{h*_n}(t) − f(t))²] ≲ v(h*_n)² λ(h*_n)².

Remark 4.2. Assumption (16) is satisfied in many cases.
For instance, if ∫ |K(u)| du < ∞, then (16) is a special case of Bochner's lemma (see [23], Lemma 1.1). However, the sinc-kernel is not absolutely integrable, and thus Bochner's lemma cannot be applied. In this case, one can alternatively assume that f belongs at least to some Sobolev space S(s, L) for some s > 1/2. Then, the analysis of the bias term as in the proof of Proposition 3.1 guarantees the validity of (16).
The pseudo-estimator f_{h*_n} is a stopover on our road to an adaptive estimator. We now construct a genuine estimator of f that aims at mimicking this oracle. For this, we first define the two-bandwidth quantity v(h, η). Then, calculations similar to those in the proof of Proposition 3.1 show that v²(h, η) bounds the variance of the difference f_h(t) − f_η(t). We then define an adaptive choice of the bandwidth parameter by

ĥ_n = max{h ∈ H_n : |f_h(t) − f_η(t)| ≤ v(h, η)λ(η) for all η ∈ H_n, η ≤ h}.   (17)

This choice of the bandwidth is well-defined since the maximum is taken over a non-empty set. The definition of ĥ_n is characteristic of Lepski's method [19], and the motivation of the procedure is neatly described in [7], p. 67: one chooses the largest bandwidth h such that the difference between the two estimators f_h and f_η is not too large (in the sense of (17)) for all η ≤ h. Evidently, the motivation of this procedure is to mimic the trade-off between squared bias and variance in a purely data-driven manner. Note also that (17) provides, just as the oracle version (15), a local choice of the bandwidth in the sense that ĥ_n depends on t. Such a local criterion might result in better adaptation to spatial inhomogeneity of the target density than global selection rules.
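The data-driven rule (17) reduces to a few lines of code. The sketch below (Python, names ours) assumes precomputed pointwise estimates f_h(t) on the grid; the threshold argument is a hypothetical stand-in for v(h, η)λ(η).

```python
def lepski_select(estimates, threshold):
    """Toy version of the bandwidth choice (17): pick the largest h on the
    grid such that |f_h(t) - f_eta(t)| <= threshold(h, eta) for all
    eta <= h.  `estimates` maps bandwidth -> pointwise estimate f_h(t);
    `threshold(h, eta)` stands in for v(h, eta) * lambda(eta)."""
    grid = sorted(estimates)
    selected = grid[0]  # the smallest bandwidth is always admissible
    for h in grid:
        if all(abs(estimates[h] - estimates[eta]) <= threshold(h, eta)
               for eta in grid if eta <= h):
            selected = h
    return selected

# Toy data: the estimates stabilize for small h but drift for large h,
# mimicking a bias that kicks in once h is too large.
toy = {0.1: 1.00, 0.2: 1.05, 0.4: 1.50}
h_hat = lepski_select(toy, threshold=lambda h, eta: 0.2)  # -> 0.2
```

Here h = 0.4 is rejected because |f_{0.4}(t) − f_{0.1}(t)| = 0.5 exceeds the threshold, so the rule returns 0.2, the largest admissible bandwidth.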
As a consequence, taking h̄_n = 1, we obtain the following corollary, which specializes Theorem 4.3 to the sinc-kernel and h̄_n = 1. Note that a logarithmic loss for adaptation is commonly accepted and is even known to be indispensable for pointwise estimation in the non-private framework [2].

Corollary 4.4.
Consider the estimator f_{ĥ_n} defined via (4) and (17), where the Z_{i,h}(t) for h ∈ H_n are defined via (3) for the sinc-kernel with α and β replaced with α′ and β′, respectively. Then,

Adaptive estimation for approximate differential privacy
We now state the adaptation results for approximate differential privacy, which are obtained via Lepski's method in analogy to the ones for pure differential privacy. For this, we have to redefine some of the quantities from the previous subsection. We take H_n as defined in (14), with the exception that h̄_n and h_n now satisfy log(h̄_n n)/n ≤ h̄_n ≤ 1 and h_n = (log(h̄_n n) ∨ 1)/n, respectively. Defining λ(h) = max(1, (κ log(h̄_n/h))^{1/2}) as above, we put the corresponding quantities v(h) and v(h, η) for the approximate privacy setup. With this redefinition, we obtain the following result.
Theorem 4.5.
Consider the estimator f_{ĥ_n} defined via (8) and (18), with Z_i defined in (2) and Y_i as in (6). Then, uniformly for all f,

As a consequence, taking h̄_n = 1, we obtain

Corollary 4.6.
Consider the estimator f_{ĥ_n} defined via (8) and (18), where Z_i(t) is defined via (2) with Y_i as in (6). Then,

For a priori unbounded X_i, we can consider the estimation procedure from Proposition 3.4. In the definitions preceding Theorem 4.5, we replace the deconvolution-type kernel K_n by a general kernel in all of the quantities.

Theorem 4.7.
Consider the estimator f_{ĥ_n} defined via (11) and (18), with Z_i defined in (2) and Y_i ∼ g for some distribution g. Then, uniformly for all f,

As a consequence, taking h̄_n = 1, we obtain

Corollary 4.8.
Consider the estimator f_{ĥ_n} defined via (11) and (18), where Z_i(t) is defined via (2) with Y_i ∼ g for some infinitely smooth distribution (that is, g belongs to S(s, L) for some L and all s > 0; for instance, one can take g as the density of a standard normal distribution). Then,

Discussion
We have investigated the optimal rates of convergence for pointwise estimation of a probability density over Sobolev classes under both pure and approximate local differential privacy. We have found two regimes of convergence rates: for approximate differential privacy the rate of convergence is of the same order n^{−(2s−1)/(2s)} as in the case of non-private observations, whereas the rate deteriorates to n^{−(2s−1)/(2s+1)} under pure differential privacy. We have suggested approaches to adaptive kernel density estimation via Lepski's method in the framework of local differential privacy for both setups. Although we have studied its theoretical properties only in the prototypical example of univariate density estimation, our methodology should be transferable to the multivariate case. We also conjecture that it might be possible to extend our results to the case of general linear functionals (different from pointwise evaluation of the density function at a fixed point), as investigated in [15] via Lepski's method in an inverse problem setup. Moreover, the optimal power of the logarithmic factor in the adaptive rate of convergence deserves further investigation. We have also pointed out the potentially misleading conception of (α, β)-differential privacy in the case of strictly positive β, since it does not rule out mechanisms where the original data are published and the observer is even aware of this fact. This point certainly deserves conceptual work in the future, since approximate differential privacy is attracting considerable interest in the computer science and official statistics literature.
Note that this point of criticism is not only valid for local privacy but also in the (even more popular) setup of global privacy (see, for instance, [6] for a rigorous definition of privacy in this case): the mechanism that publishes a whole original database with probability β and the symbol ∅ with probability 1 − β guarantees (α, β)-differential privacy but is certainly not admissible in practice when dealing with truly sensitive data. Hence, a notion that on the one hand relaxes the restrictive property of strict (α, 0)-differential privacy but on the other hand rules out such unacceptable procedures is desirable. Finally, note that pointwise rates of convergence have, in the non-private setup, also been studied for wavelet estimators [5]. Transferring such results to privacy setups (using, for instance, the anonymization techniques suggested in [4]) and investigating their properties provides another direction for future research.
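To underline how little is needed to realize the criticized mechanism, here is a minimal sketch (Python; the function name is ours).

```python
import random

def leaky_release(x, beta, rng=random):
    """Publish the raw datum x with probability beta, otherwise the
    uninformative symbol None.  For any event A, the output distributions
    under inputs x and x' differ by at most beta (the leak event), so the
    mechanism formally satisfies (alpha, beta)-differential privacy for
    every alpha >= 0 -- while leaking the data outright on a
    beta-fraction of calls."""
    return x if rng.random() < beta else None

# Degenerate endpoints: beta = 0 never leaks, beta = 1 always leaks.
assert leaky_release("sensitive record", 0.0) is None
assert leaky_release("sensitive record", 1.0) == "sensitive record"
```

For any intermediate β the mechanism publishes the original datum on roughly a β-fraction of invocations, which is exactly the behaviour the criticism targets.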

A.1. Proof of Proposition 2.2
Let A ∈ B(R) be arbitrary. It has to be shown that the condition on approximate differential privacy holds for any x, x′ ∈ X. By the triangle inequality, it suffices that 1 ≤ exp(α − Δ(g)/b) + β, which is equivalent to b ≥ Δ(g)/(α − log(1 − β)).
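A hedged sketch of the resulting mechanism (Python, names ours): the scale b = Δ(g)/(α − log(1 − β)) from the proposition reduces to the familiar Δ(g)/α for β = 0, while β > 0 permits slightly smaller noise, since log(1 − β) < 0.

```python
import math
import random

def laplace_scale(sensitivity, alpha, beta=0.0):
    """Smallest admissible noise scale from Proposition 2.2:
    b = Delta(g) / (alpha - log(1 - beta)).  For beta = 0 this is the
    classical Laplace-mechanism scale Delta(g)/alpha."""
    return sensitivity / (alpha - math.log(1.0 - beta))

def laplace_mechanism(value, sensitivity, alpha, beta=0.0, rng=random):
    """Release value + b * xi with xi ~ L(1); a standard Laplace variate
    is generated as the difference of two independent Exp(1) variates."""
    b = laplace_scale(sensitivity, alpha, beta)
    e1 = -math.log(1.0 - rng.random())
    e2 = -math.log(1.0 - rng.random())
    return value + b * (e1 - e2)
```

For instance, laplace_scale(1.0, 1.0) returns 1.0, while laplace_scale(1.0, 1.0, 0.1) ≈ 0.905, reflecting the reduced noise level under approximate privacy.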

A.2. Proof of Proposition 2.4
Let A ∈ B(R) be arbitrary. Then, the condition on approximate differential privacy reads as follows for all x, x′ ∈ X. This is certainly satisfied if the corresponding condition without the additive term β holds, which in turn is true for any (α, 0)-differentially private mechanism Q.

A.3. Proof of Lemma 2.5
Let A ∈ Z_1 ⊗ Z_2 be a measurable set. Denote A_{z_1} = {z_2 ∈ Z_2 : (z_1, z_2) ∈ A}, which is measurable. By Cavalieri's principle and the independence assumption,

Then P^{Z_1|X=x}(Ω^c) ≤ β_1, since otherwise there would be a contradiction to approximate differential privacy. Hence,

which shows the claimed assertion.

B.1. Proof of Proposition 3.1
The bias-variance decomposition for the estimator f_h(t) is E[(f_h(t) − f(t))²] = (E[f_h(t)] − f(t))² + Var(f_h(t)). We begin with the analysis of the bias. First recall that, due to the centredness of the error added by the privacy mechanism, the bias coincides with that of the non-private kernel density estimator. Thus, using that F[K_sinc](·) = 1_{[−π,π]}(·), we obtain

M. Kroll
Let us now consider the variance. Denoting by ξ ∼ L(1) a standard Laplace random variable, we have

The statement of the proposition now follows by combining the obtained bounds for the squared bias and the variance.
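The variance computation rests on the standard facts E[ξ] = 0 and Var(ξ) = 2 for ξ ∼ L(1); these are easy to confirm numerically (a Monte-Carlo sketch with a hypothetical helper name).

```python
import math
import random

def laplace_moments(n_samples=200_000, seed=1):
    """Empirical mean and variance of xi ~ L(1), generated as the
    difference of two independent Exp(1) variates; the targets are
    E[xi] = 0 and Var(xi) = 2, the values used in the variance bound."""
    rng = random.Random(seed)
    xs = [-math.log(1.0 - rng.random()) + math.log(1.0 - rng.random())
          for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    var = sum((x - mean) ** 2 for x in xs) / n_samples
    return mean, var
```

With 200,000 samples the empirical mean and variance match the theoretical values 0 and 2 up to Monte-Carlo error of order 10^{-2}.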

B.2. Proof of Proposition 3.3
By the very definition of the kernel K_n, we have

This is the same expression as in the proof of Proposition 3.1, and we obtain the same bound for the squared bias as in that proposition (note that this bound does not depend on β). In order to study the variance, note that

Using this estimate, we obtain the following bound for the variance term:

The proposition now follows by combining the bounds for bias and variance.

B.3. Proof of Proposition 3.5
Let Q ∈ Q_α be arbitrary as in the statement of the proposition. Define ψ_n > 0 via ψ²_n = n^{−(2s−1)/(2s+1)}. Let f_{0,n}, f_{1,n} be two functions in S(s, L) (to be specified later on) such that (f_{0,n}(t) − f_{1,n}(t))² ≥ ψ²_n. Using a general reduction argument (see [24], Section 2.2), it can be shown that it suffices to bound from below the quantity inf_τ max_{θ∈{0,1}} P_θ(τ ≠ θ), where the infimum is taken over all {0, 1}-valued test functions τ based on the observations Z_1, . . . , Z_n, and P_θ denotes the distribution of Z_1, . . . , Z_n if the true density of X_1, . . . , X_n is f_{θ,n}. In view of [24], Theorem 2.2, Statement (iii), the claimed assertion follows if we can choose the functions f_{0,n} and f_{1,n} such that (1) f_{0,n}, f_{1,n} ∈ S(s, L), (2) (f_{0,n}(t) − f_{1,n}(t))² ≥ ψ²_n, and (3) KL(P_0, P_1) ≤ C < ∞ for some C independent of n.
To construct such f_{0,n}, f_{1,n} we use ideas from Section 6 of [3], and we refer to that paper also for some of the computations. First, take a strictly positive probability density f on R that is infinitely often continuously differentiable. Setting ‖f^{(s)}‖²_2 = (2π)^{−1} ∫_R |F[f](ω)|² |ω|^{2s} dω, we can further assume that ‖f^{(s)}‖_2 ≤ L. Then, for δ ∈ (0, 1/2), define the function f_{0,n} by

In order to define the second hypothesis f_{1,n}, we consider the auxiliary function K_s as introduced on p. 26 of [3] (its construction in that paper is borrowed from [22]); in particular, note that K_s is compactly supported. We now check conditions (1)–(3) from above.
Verification of (1): The proof follows step by step along the lines of the one in [3], and we omit the details. We only record the fact that γ_{n,s} = cLh, which is the desired bound.

Verification of (3): By (14) in [11], we have KL(P_0, P_1) ≤ 4n(exp(α) − 1)² TV²(P^{X_1}_0, P^{X_1}_1).
On the one hand,

Moreover, for a^{-1}h < h*_n,

by the very definition of h*_n. Thus, altogether, and by the monotonicity of v(·) and λ(·), for η < h ≤ h*_n,

E[(f_{a^{-1}h}(t) − f(t))⁴] ≤ C λ⁴(η) v⁴(a^{-1}h).

We decompose ζ_i = ζ_i^{(1)} + ζ_i^{(2)} for i = 1, . . . , n and consider the probability in terms of ζ_i^{(1)} first. By Bernstein's inequality (see Lemma D.1) with b = 4‖K‖_∞/η,

For any h ∈ H_n and n large enough, it holds

Let us now consider the probability in terms of ζ_i^{(2)}. We decompose

Consider only the first probability on the right-hand side, the bound for the second one following analogously. By Bernstein's inequality (see Lemma D.1; take the version with control on the moments, applied with t = v(h, η)λ(η)/4, v² = C²_{α′β′}/h², and b = C_{α′β′}/h),