Geometric sharp large deviations for random projections of $\ell_p^n$ spheres and balls

Accurate estimation of tail probabilities of projections of high-dimensional probability measures is of relevance in high-dimensional statistics and asymptotic geometric analysis. Whereas large deviation principles identify the asymptotic exponential decay rate of probabilities, sharp large deviation estimates also provide the "prefactor" in front of the exponentially decaying term. For fixed $p \in (1,\infty)$, consider independent sequences $(X^{(n,p)})_{n \in \mathbb{N}}$ and $(\Theta^n)_{n \in \mathbb{N}}$ of random vectors with $\Theta^n$ distributed according to the normalized cone measure on the unit $\ell_2^n$ sphere, and $X^{(n,p)}$ distributed according to the normalized cone measure on the unit $\ell_p^n$ sphere. For almost every realization $(\theta^n)_{n\in\mathbb{N}}$ of $(\Theta^n)_{n\in\mathbb{N}}$, (quenched) sharp large deviation estimates are established for suitably normalized (scalar) projections of $X^{(n,p)}$ onto $\theta^n$, that are asymptotically exact (as the dimension $n$ tends to infinity). Furthermore, the case when $(X^{(n,p)})_{n \in \mathbb{N}}$ is replaced with $(\mathscr{X}^{(n,p)})_{n \in \mathbb{N}}$, where $\mathscr{X}^{(n,p)}$ is distributed according to the uniform (or normalized volume) measure on the unit $\ell_p^n$ ball, is also considered. In both cases, in contrast to the (quenched) large deviation rate function, the prefactor exhibits a dependence on the projection directions $(\theta^n)_{n \in\mathbb{N}}$ that encodes additional geometric information that enables one to distinguish between projections of balls and spheres. Moreover, comparison with numerical estimates obtained by direct computation and importance sampling shows that the obtained analytical expressions for tail probabilities provide good approximations even for moderate values of $n$.

1. Introduction

1.1. Motivation and context. The study of high-dimensional norms, the convex bodies that describe their level sets, and other high-dimensional geometric structures are central themes in geometric functional analysis [28] and the burgeoning field of asymptotic geometric analysis [3]. Several results in these fields have shown that the presence of high dimensions often imposes a certain regularity that has a probabilistic flavor. A significant result of this type is the central limit theorem (CLT) for convex sets [25] which, roughly speaking, says that if $X^n$ is a high-dimensional random vector uniformly distributed on an isotropic convex body (namely, a compact convex set with non-empty interior whose normalized volume measure has zero mean and identity covariance matrix), its one-dimensional scalar projections $\langle X^n, \theta^n \rangle$ along most directions $\theta^n$ on the unit $(n-1)$-dimensional sphere $\mathbb{S}^{n-1}$ in $\mathbb{R}^n$ have Gaussian fluctuations. In fact, this result holds for the larger class of isotropic logconcave measures, as well as more general high-dimensional measures that satisfy a certain concentration estimate called the thin shell condition (see, e.g., [27,37,40]). Of particular interest is the geometry of $\ell_p^n$ spaces, which has been classically studied using laws of large numbers, CLTs and concentration results [7,17,35,36]. These constitute beautiful universality results that suggest that random projections of the uniform measure on a convex body behave in some aspects like sums of independent random variables. On the other hand, they also imply the somewhat negative conclusion that typical fluctuations of lower-dimensional random projections do not yield much information about high-dimensional measures.
It is therefore natural to ask whether such random projections also satisfy other properties exhibited by sums of independent random variables, in particular those that capture non-universal features that would yield useful information about the corresponding high-dimensional measures.
With this objective, large deviation principles (LDPs) were established for suitably normalized one-dimensional random projections of $\ell_p^n$ balls in [15,16]. These works established both quenched LDPs, conditioned on the sequence of projection directions, as well as annealed LDPs, which average over the randomness of the projection directions. Subsequently, quenched LDPs for multi-dimensional projections were obtained in [22], and annealed large deviation results for norms of $\ell_p^n$ balls and their multi-dimensional random projections were established in [1,19,20,23], with [19] also considering moderate deviations (see also [32] for a recent survey). Going beyond the setting of $\ell_p^n$ balls (and measures with a similar representation), annealed LDPs were obtained in [22,23] for norms of multi-dimensional projections of more general sequences of high-dimensional random vectors $(X^n)_{n\in\mathbb{N}}$ that satisfy a so-called asymptotic thin shell condition. All these LDPs are indeed non-universal, in that the associated speeds (or exponential decay rates) and rate functions both encode properties of the high-dimensional measures. However, although LDPs (in contrast to concentration results or large deviation upper bounds) identify the precise asymptotic exponential decay rate and allow for the identification of conditional limit laws [24], they have the drawback that in general they only provide approximate estimates of the probabilities, characterizing only the limit of the logarithms of the deviation probabilities as the dimension $n$ goes to infinity. Thus, existing LDPs for random projections cannot be applied directly to provide accurate estimates of tail probabilities or to develop efficient algorithms that distinguish between two given high-dimensional measures, tasks that are of importance in statistics, data analysis and computer science [11].
1.2. Discussion of results. Our broad goal is to establish sharp (quenched) large deviation results for high-dimensional measures that not only capture the precise asymptotic exponential decay rate of tail probabilities of random projections, but also their "prefactors" (that is, the terms in front of the exponential), so as to provide more accurate quantitative estimates in finite dimensions, much in the spirit of the local theory of Banach spaces. In addition, we aim to identify the additional geometric information that sharp large deviation estimates provide over LDPs. In this article, we focus on one-dimensional projections of $\ell_p^n$ spheres and balls and obtain estimates of deviation probabilities that are asymptotically exact as the dimension goes to infinity.
It is worthwhile to mention that for the Euclidean norm of a random vector distributed on an isotropic convex body, sharp large deviation upper bounds were obtained in several works (see, for example, [13,17,25,31] and references therein). While these estimates have the very nice feature that they are universal (in that they apply to all isotropic convex bodies or, more generally, logconcave measures), that very feature also makes them not tight for many specific sub-classes of convex bodies. As a consequence, our proof techniques are different from those used in the latter works, and may be of independent interest. In addition, we develop and analyze importance sampling algorithms to compute geometric quantities such as the volume fraction of small $\ell_p^n$ spherical caps in a certain direction, which would be infeasible to compute with reasonable accuracy using standard Monte Carlo estimation, since the quantities are vanishingly small. We expect that such computational approaches based on large deviations may be useful more generally in the study of high-dimensional geometric structures. Indeed, the first version of this article has already spurred further work in this direction. For example, Kaufmann [21] studied annealed (i.e., averaged over the randomness of $\Theta$) sharp large deviation estimates for $q$-norms of random vectors uniformly distributed on $\ell_p^n$ balls, and the paper [34] establishes quenched large deviation estimates for multi-dimensional projections of $\ell_p^n$ balls and their norms. We now describe some of the challenges in obtaining such sharp estimates and comment on our proof technique. Our results can be viewed as a geometric generalization of classical sharp large deviation estimates in the spirit of Bahadur and Ranga Rao [4], which we now briefly recall. Given a sequence of independent and identically distributed (i.i.d.) random variables $(X_i)_{i\in\mathbb{N}}$, for each $n \in \mathbb{N}$, let $S_n$ denote the corresponding empirical mean:
$$S_n := \frac{1}{n} \sum_{i=1}^n X_i = \frac{1}{\sqrt{n}}\, \langle X^n, I_n \rangle, \qquad (1.1)$$
where $X^n := (X_1, \ldots, X_n)$ and $I_n := \frac{1}{\sqrt{n}}(1, 1, \cdots, 1) \in \mathbb{S}^{n-1}$. Under suitable assumptions on the (marginal) distribution of $X_1$, it was shown in [4] that
$$\mathbb{P}(S_n \ge a) = \frac{e^{-n I(a)}}{\sigma_a \tau_a \sqrt{2\pi n}}\, (1 + o(1)), \qquad (1.2)$$
where $I$ is the Legendre transform of $\Lambda$, the logarithmic moment generating function of $X_1$, $\tau_a > 0$ and $\sigma_a > 0$ are suitable constants specified below, and $o(1)$ indicates a term $\varepsilon_n$ that satisfies $\varepsilon_n \to 0$ as $n \to \infty$. Key ingredients of the proof in [4] include first identifying a "tilted" measure under which the rare event on the left-hand side of (1.2) becomes typical and, second, establishing a quantitative CLT for the sequence $(S_n)_{n\in\mathbb{N}}$ under the tilted measure. Specifically, this tilted measure is another product measure, of the form $\otimes^n \mathbb{P}_a$, where $\mathbb{P}_a$ is a measure absolutely continuous with respect to $\mathbb{P}$, with Radon-Nikodym derivative given by
$$\frac{d\mathbb{P}_a}{d\mathbb{P}}(x) = e^{\tau_a x - \Lambda(\tau_a)},$$
where $\tau_a$ is the unique positive constant such that $X_1$ has mean $a$ under the marginal $\mathbb{P}_a$ of the tilted measure. The constant $\sigma_a^2$ in (1.2) is the variance of $X_1$ under $\mathbb{P}_a$. The second step, establishing a quantitative CLT, is in this case standard given the product form of the tilted measure, and appeals to well-known Edgeworth expansions that also involve the third moment of $S_n$ under the tilted measure $\otimes^n \mathbb{P}_a$.
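The estimate (1.2) can be sanity-checked numerically in the simplest case of i.i.d. standard Gaussian summands, where $\Lambda(\tau) = \tau^2/2$, $I(a) = a^2/2$, $\tau_a = a$ and $\sigma_a = 1$ (an illustrative sketch, not part of the analysis in this paper):

```python
import math

def brr_estimate(n, a):
    # Bahadur-Ranga Rao approximation (1.2) for i.i.d. N(0,1) summands:
    # I(a) = a^2/2, tau_a = a, sigma_a = 1, so the prefactor is 1/(a*sqrt(2*pi*n)).
    return math.exp(-n * a * a / 2.0) / (a * math.sqrt(2.0 * math.pi * n))

def exact_tail(n, a):
    # Exact tail: S_n ~ N(0, 1/n), so P(S_n >= a) = 0.5 * erfc(a * sqrt(n/2)).
    return 0.5 * math.erfc(a * math.sqrt(n / 2.0))
```

For $a = 0.5$, the ratio of the exact tail to the estimate is already within a few percent of 1 at $n = 200$, and the agreement improves as $n$ grows, consistent with the $(1 + o(1))$ correction in (1.2).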
Fix $p \in (1, \infty)$, let the projection direction $\Theta^n$ be distributed according to the normalized surface measure on $\mathbb{S}^{n-1}$, and let $X^{(n,p)}$ be a random vector, independent of $\Theta^n$, that is distributed according to the cone measure on the unit $\ell_p^n$ sphere. In this article we obtain estimates of tail probabilities of the scaled random projection
$$W^{(n,p)} := \frac{n^{1/p}}{n^{1/2}}\, \langle X^{(n,p)}, \Theta^n \rangle = \frac{1}{n} \sum_{i=1}^n n^{1/p} X_i^{(n,p)}\, \sqrt{n}\, \Theta_i^n, \qquad (1.3)$$
conditioned on $\Theta = (\Theta^n)_{n\in\mathbb{N}} = \theta = (\theta^n)_{n\in\mathbb{N}}$, for a.e. realization $\theta$ of $\Theta$. Using terminology that originates in statistical physics, because we condition on the realization $\theta$ of $\Theta$ and obtain results for almost every realization, we refer to these as "quenched" deviation estimates. While (quenched) sharp large deviations of sums of weighted i.i.d. random variables with i.i.d. weights have been considered in more recent work [9], comparing the expressions for $W^{(n,p)}$ and $S_n$ in (1.3) and (1.1), respectively, we see that $W^{(n,p)}$ is a randomly weighted sum of random variables that are not independent, with random weights that are also not independent. Thus, the analysis in this case is significantly more challenging and requires several new ingredients. First, we exploit a known probabilistic representation for the cone measure on $\ell_p^n$ spheres [35] to rewrite the probability of the tail event $\{W^{(n,p)} \ge a\}$ as the probability that a certain two-dimensional random vector lies in a certain domain in $\mathbb{R}^2$ (see Section 2.4), and then establish sharp large deviation estimates for the latter. This transformation turns out to be useful even though sharp large deviations in multiple dimensions are more involved, and none of the existing results (see, e.g., [2,5,18] and references therein) apply to this setting. We use Fourier analysis and a change of measure argument to obtain an asymptotic expansion for the quenched two-dimensional density (see Proposition 5.4 and Section 7), and then integrate this density over the appropriate domain.
To identify the appropriate change of measure or "tilted" measure, we first show (in Lemma 2.2) that the quenched large deviation rate function obtained in [16] is strictly convex and has a unique minimizer. Along the way, we also establish several results of possible independent interest, including quantitative central limit theorems under the change of measure (see Lemma 4.4) and multi-dimensional generalized Laplace asymptotics (see Proposition 5.6).
In addition, we also obtain corresponding results for $\ell_p^n$ balls, where $X^{(n,p)}$ is replaced with $\mathscr{X}^{(n,p)}$, a random vector independent of $\Theta^n$ that is distributed according to the normalized volume measure on the unit $\ell_p^n$ ball. Obtaining sharp large deviation estimates for random projections of $\ell_p^n$ balls is substantially more complex than in the $\ell_p^n$ sphere setting, because the probability of interest is now expressed as an integral over a three-dimensional domain whose boundary is non-smooth at the minimizing point of the Laplace-type functional (see Section 2.4). This leads to additional difficulties in the computation of the associated Laplace-type asymptotic integral (see Proposition 5.6). As elaborated in Remarks 2.8 and 2.12, our analytical sharp large deviation estimates do indeed capture additional geometric information beyond the large deviation rate function, and in fact we show that there is a clear difference between sharp tail probabilities for $\ell_p^n$ balls and spheres, even though they share the same large deviation rate function. Analogous sharp large deviation asymptotics can also be obtained in the case $p = \infty$ or, in fact, for more general product measures; the analysis in this case is much easier (see, e.g., [26, Section 4.2]).
In order to provide evidence of the accuracy of our sharp analytical estimates of the deviation probabilities for finite n, we compare them with numerical approximations. Specifically, we use the tilted measure identified in the sharp large deviations analysis to propose an importance sampling scheme that numerically approximates the deviation probabilities. We then compare the estimates obtained from importance sampling with analytical sharp large deviation estimates for a range of n.

1.3. Outline of the rest of the paper. After a summary of common notation and terminology in Sections 1.4 and 2.1, precise statements of the main results are presented in Sections 2.2 and 2.3. An importance sampling algorithm for calculating tail probabilities, and comparisons of the resulting simulations with the obtained analytical formulas, are presented in Section 3. The main results rely on an asymptotic independence result for the weights induced by the projection direction, which is obtained in Section 4, as well as a reformulation of the rare event of interest as the event that a certain random vector lies in a two-dimensional (or three-dimensional) domain, which is described in Section 2.4. Section 2.4 also contains an outline of the proofs of the main results, with the complete proofs of the refined quenched tail estimates given in Sections 5.5 and 5.6 for projections of $\ell_p^n$ spheres, and in Section 6 for projections of $\ell_p^n$ balls. Both proofs proceed by first performing asymptotic expansions for the joint densities of the multi-dimensional random vectors, as formulated in Section 5.2. These expansions are derived from a general result on multi-dimensional generalized Laplace approximations obtained in Section 5.3 (see Propositions 5.6 and 5.7 therein) and estimates obtained in Sections 5.1 and 5.4, which justify the applicability of these approximations in the present context. Proofs of several technical results used in the analysis are deferred to Appendices A-F.

1.4. Notation and definitions. We use the notation $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{C}$ to denote the sets of positive integers, real numbers and complex numbers, respectively. For a complex number $z \in \mathbb{C}$, we denote by $\mathrm{Re}\{z\}$ the real part of $z$. For a set $A$, we denote its complement by $A^c$. Also, given an $m \times d$ matrix $M$, let $M^T$ denote its transpose and, when $m = d$, let $\det M$ denote its determinant.
Given an extended real-valued function $f: \mathbb{R}^d \to [0,\infty]$, its effective domain is defined as $\{x \in \mathbb{R}^d : f(x) < \infty\}$. For a twice differentiable function $f: \mathbb{R}^d \to \mathbb{R}$ (i.e., for which each partial derivative $\partial_i \partial_j f$ exists for all $i,j \in \{1,\ldots,d\}$), let $\mathrm{Hess}\, f(x)$ denote the $d \times d$ Hessian matrix of $f$ at $x$. For $q \in \mathbb{N}$, define the function space $L^q(\mathbb{R}^d)$ to be the space of measurable functions $f: \mathbb{R}^d \to \mathbb{R}$ such that $\int_{\mathbb{R}^d} |f(x)|^q\, dx < \infty$. For $p \in (1,\infty)$ and $n \in \mathbb{N}$, let $\|\cdot\|_{n,p}$ denote the $p$-norm in $\mathbb{R}^n$, that is, for $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, $\|x\|_{n,p} := (|x_1|^p + \cdots + |x_n|^p)^{1/p}$, and let
$$B_p^n := \{x \in \mathbb{R}^n : \|x\|_{n,p} \le 1\} \quad \text{and} \quad S_p^{n-1} := \{x \in \mathbb{R}^n : \|x\|_{n,p} = 1\} \qquad (1.4)$$
denote the corresponding unit $\ell_p^n$ ball and sphere, respectively. The (normalized) cone measure $\mu_{n,p}$ on $S_p^{n-1}$ is defined by
$$\mu_{n,p}(A) := \frac{\mathrm{vol}([0,1]A)}{\mathrm{vol}(B_p^n)}, \qquad (1.5)$$
where $[0,1]A := \{xa \in \mathbb{R}^n : x \in [0,1], a \in A\}$, and vol denotes Lebesgue measure. Note that when $p \in \{1, 2, \infty\}$, the (renormalized) cone measure coincides with the (renormalized) surface measure and, when $p = 2$, is equal to the unique rotation invariant measure on $\mathbb{S}^{n-1}$ with total mass 1. For the special case $p = 2$, we use just $\|\cdot\|$ to denote $\|\cdot\|_{n,2}$, the Euclidean norm on $\mathbb{R}^n$, $\mathbb{S}^{n-1}$ to denote $S_2^{n-1}$, and $\sigma_n$ to denote $\mu_{n,2}$. We end this section with the definition of a large deviation principle (LDP); we refer to [10] for general background on large deviations theory. For $d \in \mathbb{N}$, let $\mathcal{P}(\mathbb{R}^d)$ denote the space of probability measures on $\mathbb{R}^d$, equipped with the topology of weak convergence, where recall that for $\eta, \eta_n \in \mathcal{P}(\mathbb{R}^d)$, $n \in \mathbb{N}$, $\eta_n$ is said to converge weakly to $\eta$ as $n \to \infty$, denoted $\eta_n \Rightarrow \eta$, if $\int_{\mathbb{R}^d} f\, d\eta_n \to \int_{\mathbb{R}^d} f\, d\eta$ as $n \to \infty$ for every bounded and continuous function $f$ on $\mathbb{R}^d$. Definition 1.1 (Large deviation principle). The sequence of probability measures $(\eta_n)_{n\in\mathbb{N}} \subset \mathcal{P}(\mathbb{R}^d)$ is said to satisfy a large deviation principle (in $\mathbb{R}^d$) with (speed $n$ and) rate function $I : \mathbb{R}^d \to [0,\infty]$ if $I$ is lower semicontinuous and for any measurable set $A$,
$$-\inf_{x \in A^o} I(x) \le \liminf_{n\to\infty} \frac{1}{n} \log \eta_n(A) \le \limsup_{n\to\infty} \frac{1}{n} \log \eta_n(A) \le -\inf_{x \in \mathrm{cl}(A)} I(x),$$
where $A^o$ and $\mathrm{cl}(A)$ denote the interior and closure of $A$, respectively. Moreover, we say that $I$ is a good rate function if it has compact level sets. A sequence of random variables $(V_n)_{n\in\mathbb{N}}$, with each $V_n$ defined on some probability space $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$, is said to satisfy an LDP if the corresponding sequence of laws $(\mathbb{P}_n \circ V_n^{-1})_{n\in\mathbb{N}}$ satisfies an LDP.
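Definition 1.1 can be illustrated with the textbook example of Cramér's theorem for i.i.d. standard Gaussians (an illustrative aside, not a result from this paper):

```latex
% Cramér's theorem (textbook example): let \eta_n be the law of the empirical
% mean S_n of i.i.d. standard Gaussian random variables. Then (\eta_n)_{n \in \mathbb{N}}
% satisfies an LDP in \mathbb{R} with speed n and good rate function
\[
  I(x) \;=\; \sup_{\tau \in \mathbb{R}} \bigl( \tau x - \Lambda(\tau) \bigr)
       \;=\; \sup_{\tau \in \mathbb{R}} \Bigl( \tau x - \tfrac{\tau^2}{2} \Bigr)
       \;=\; \frac{x^2}{2},
\]
% so that, e.g., for A = [a,\infty) with a > 0, the upper and lower bounds in
% Definition 1.1 coincide and (1/n) \log \eta_n([a,\infty)) \to -a^2/2.
```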

2. Statement of main results
Fix $p \in (1, \infty)$. Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which are defined three independent sequences $X = (X^{(n,p)})_{n\in\mathbb{N}}$, $\mathscr{X} = (\mathscr{X}^{(n,p)})_{n\in\mathbb{N}}$ and $\Theta = (\Theta^n)_{n\in\mathbb{N}}$. Each $X^{(n,p)}$ is distributed according to the cone measure $\mu_{n,p}$ on the unit $\ell_p^n$ sphere, as defined in (1.5), and each $\mathscr{X}^{(n,p)}$ is distributed according to the normalized volume measure on the unit $\ell_p^n$ ball $B_p^n$ defined in (1.4). The random element $\Theta$ takes values in the sequence space $\mathbb{S} := \otimes_{n\in\mathbb{N}} \mathbb{S}^{n-1}$, with $\Theta^n \in \mathbb{S}^{n-1}$ denoting the $n$-th element of that sequence, and is independent of $X$ (and $\mathscr{X}$) with distribution $\sigma$, where $\sigma$ is any probability measure on $\mathbb{S}$ whose image under the mapping $\theta \in \mathbb{S} \mapsto \theta^n \in \mathbb{S}^{n-1}$ coincides with $\sigma_n$, the unique rotation invariant measure on $\mathbb{S}^{n-1}$. The dependence between the random vectors $\Theta^n$ for different $n \in \mathbb{N}$ can be arbitrary. For $\theta \in \mathbb{S}$, denote by $\mathbb{P}_\theta$ the probability measure $\mathbb{P}$ conditioned on $\Theta = \theta$, and let $\mathbb{E}$ and $\mathbb{E}_\theta$ denote expectation with respect to $\mathbb{P}$ and $\mathbb{P}_\theta$, respectively. For $n \in \mathbb{N}$, let $W^{(n,p)}$ be the normalized scalar projection of $X^{(n,p)}$ along $\Theta^n$, defined as
$$W^{(n,p)} := n^{1/p - 1/2}\, \langle X^{(n,p)}, \Theta^n \rangle, \qquad (2.1)$$
and similarly let $\mathscr{W}^{(n,p)}$ be the normalized scalar projection of $\mathscr{X}^{(n,p)}$, defined as
$$\mathscr{W}^{(n,p)} := n^{1/p - 1/2}\, \langle \mathscr{X}^{(n,p)}, \Theta^n \rangle. \qquad (2.2)$$
First, in Section 2.1, we introduce notation that is required to state the quenched sharp large deviation estimates. In Section 2.2 we recall the quenched LDP for $\ell_p^n$ spheres and balls established in [16] and obtain an important simplification of the quenched LDP rate function obtained therein, which in particular shows that it is convex and has a unique minimum. The latter property will be crucial for our analysis. We then present our sharp large deviation results for projections of $\ell_p^n$ spheres. Corresponding results for $\ell_p^n$ balls are presented in Section 2.3.
Finally, in Section 2.4 we provide a brief outline of both proofs, and present a more detailed comparison of our results with classical Bahadur-Ranga Rao bounds.
2.1. Preliminary notation. Fix $p \in (1,\infty)$. Let $\gamma_p \in \mathcal{P}(\mathbb{R})$ be the $p$-Gaussian distribution with density
$$f_p(y) := \frac{1}{2 p^{1/p} \Gamma(1 + 1/p)}\, e^{-|y|^p/p}, \qquad y \in \mathbb{R}, \qquad (2.3)$$
where $\Gamma$ is the Gamma function. For $t_1, t_2 \in \mathbb{R}$, define the extended real-valued functions
$$\Lambda_p(t_1, t_2) := \log \int_{\mathbb{R}} e^{t_1 y + t_2 |y|^p}\, \gamma_p(dy) \qquad (2.4)$$
and
$$\Psi_p(t_1, t_2) := \mathbb{E}\left[ \Lambda_p(t_1 Z, t_2) \right], \qquad Z \sim N(0,1), \qquad (2.5)$$
and observe that they both have effective domain $\mathcal{D}_p := \mathbb{R} \times (-\infty, 1/p)$. Also, let $\Psi_p^*$ be the Legendre transform of $\Psi_p$:
$$\Psi_p^*(x) := \sup_{t \in \mathbb{R}^2} \left( \langle t, x \rangle - \Psi_p(t) \right), \qquad x \in \mathbb{R}^2, \qquad (2.6)$$
and let $\mathcal{J}_p \subset \mathbb{R}^2$ be the effective domain of $\Psi_p^*$:
$$\mathcal{J}_p := \{ x \in \mathbb{R}^2 : \Psi_p^*(x) < \infty \}. \qquad (2.7)$$
Since, by [16, Lemma 5.8], the function $\Lambda_p$ defined in (2.4) is strictly convex on its effective domain $\mathcal{D}_p$, $\Psi_p$ is also strictly convex on $\mathcal{D}_p$. By [16, Lemma 5.9], $\Psi_p$ is essentially smooth, lower semicontinuous and hence closed. Therefore, by [33, Theorem 26.5], $\nabla \Psi_p$ is one-to-one and onto from the domain of $\Psi_p$ to $\mathcal{J}_p$. Thus, for each $x = (x_1, x_2) \in \mathcal{J}_p$ there exists a unique $\lambda_x$ such that $\lambda_x \in \mathcal{D}_p$ and
$$\nabla \Psi_p(\lambda_x) = x. \qquad (2.8)$$
This in turn implies that $\lambda_x$ uniquely achieves the supremum in (2.6), and hence that
$$\Psi_p^*(x) = \langle \lambda_x, x \rangle - \Psi_p(\lambda_x). \qquad (2.9)$$
Remark 2.1. Since $\Psi_p$ is a strictly convex infinitely differentiable function on $\mathcal{D}_p$, the inverse function theorem and (2.8) imply that the mapping $\mathcal{J}_p \ni x \mapsto \lambda_x \in \mathcal{D}_p$ is also infinitely differentiable.
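For orientation, the functions above can be computed in closed form in the Gaussian case $p = 2$ (a routine computation, assuming $\Lambda_p$ in (2.4) is the logarithmic moment generating function of $(Y, |Y|^p)$ under $\gamma_p$):

```latex
% For p = 2, gamma_2 is the standard Gaussian and, completing the square,
% for t_2 < 1/2,
\[
  \Lambda_2(t_1, t_2)
  = \log \int_{\mathbb{R}} e^{t_1 y + t_2 y^2}\,
      \frac{e^{-y^2/2}}{\sqrt{2\pi}}\, dy
  = \frac{t_1^2}{2(1 - 2 t_2)} - \frac{1}{2} \log(1 - 2 t_2),
\]
% which is finite precisely when t_2 < 1/2 = 1/p, consistent with the
% effective domain D_p = R x (-infinity, 1/p).
```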

2.2. Results on projections of $\ell_p^n$ spheres. We first state quenched LDPs for the sequences $(W^{(n,p)})_{n\in\mathbb{N}}$ from (2.1) and $(\mathscr{W}^{(n,p)})_{n\in\mathbb{N}}$ from (2.2). It follows from [16, Theorem 2.5] that for $\sigma$-a.e. $\theta$, under $\mathbb{P}_\theta$, the sequence $(W^{(n,p)})_{n\in\mathbb{N}}$ satisfies an LDP with (speed $n$ and) a quasiconvex good rate function $I_p$ given by the variational formula (2.10), where recall that a quasiconvex function is a function whose level sets are convex. Furthermore, it follows from [16, Lemmas 3.1 and 3.4] that $(\mathscr{W}^{(n,p)})_{n\in\mathbb{N}}$ also satisfies an LDP with the same speed and rate function. Note that the rate function $I_p$ is insensitive to the projection directions, in the sense that it is the same for $\sigma$-a.e. $\theta$. We show in the following lemma that the infimum in (2.10) is attained uniquely at $(t, 1)$, yielding a simpler form for the rate function, and use that to deduce that it is strictly convex and has a unique minimizer. The latter is a crucial property both for obtaining sharp large deviation estimates and for developing importance sampling algorithms.

Lemma 2.2. For $p \in (1,\infty)$ and $a > 0$ such that $\Psi_p^*(a, 1) < \infty$, the infimum in (2.10) at $t = a$ is attained uniquely at $(a, 1)$, so that $I_p(a) = \Psi_p^*(a, 1)$.

Theorem 2.3. Fix $p \in (1,\infty)$. For $\sigma$-a.e. $\theta$, under $\mathbb{P}_\theta$, the sequences $(W^{(n,p)})_{n\in\mathbb{N}}$ and $(\mathscr{W}^{(n,p)})_{n\in\mathbb{N}}$ both satisfy LDPs with the same strictly convex, symmetric, good rate function $I_p$ given by
$$I_p(t) = \Psi_p^*(t, 1), \qquad t \in \mathbb{R}. \qquad (2.11)$$

We now introduce notation to state the sharp large deviation estimate for $W^{(n,p)}$. Recall the definitions of $\Psi_p$, $\Psi_p^*$, $\mathcal{J}_p$ and $\lambda_x$ from Section 2.1 and, for $x \in \mathcal{J}_p$, define $H_x = H_{p,x}$ by
$$H_{p,x} := (\mathrm{Hess}\, \Psi_p)(\lambda_x), \qquad (2.12)$$
where we suppress the dependence on $p$ from $\lambda_x$ and $H_x$. Also, fix $a > 0$ such that $I_p(a) < \infty$. With some abuse of notation, we write $\lambda_a = \lambda_{a^*}$ and $H_a = H_{a^*}$, where $a^* = (a, 1)$. Note that then $\lambda_a = (\lambda_{a,1}, \lambda_{a,2}) \in \mathbb{R}^2$ is the unique maximizer in (2.11), that is,
$$I_p(a) = \Psi_p^*(a, 1) = \lambda_{a,1}\, a + \lambda_{a,2} - \Psi_p(\lambda_a), \qquad (2.13)$$
and
$$H_a := (\mathrm{Hess}\, \Psi_p)(\lambda_a). \qquad (2.14)$$
Next, define the positive constants $\xi_a = \xi_{p,a}$ and $\kappa_a = \kappa_{p,a}$ via the relations (2.15) and (2.16).

Remark 2.4. Although it is not a priori obvious that the right-hand side of (2.16) is positive, this will become apparent from the proof of Theorem 2.5.
We are now ready to state the quenched sharp large deviation estimate for scaled projections of $\ell_p^n$ spheres. Recall that for $\theta \in \mathbb{S}$, we denote by $\mathbb{P}_\theta$ the probability measure $\mathbb{P}$ conditioned on $\Theta = \theta$.

Theorem 2.5. Fix $p \in (1,\infty)$ and $a > 0$ such that $I_p(a) < \infty$. Then the following statements hold with the matrix $H_a$ as defined in (2.14) and constants $\xi_a = \xi_{p,a}$ and $\kappa_a = \kappa_{p,a}$ defined as in (2.15) and (2.16), respectively:
(i) For $n \in \mathbb{N}$, there exist mappings $R_a^n = R_{p,a}^n : \mathbb{S}^{n-1} \to \mathbb{R}$ and $c_a^n = c_{p,a}^n : \mathbb{S}^{n-1} \to \mathbb{R}^2$, defined explicitly in (5.9) and (5.8) as a centered integrated log moment generating function and its gradient, such that for $\sigma$-a.e. $\theta$,
$$\mathbb{P}_\theta\left( W^{(n,p)} \ge a \right) = \frac{C_a^n(\theta^n)\, e^{\sqrt{n}\, R_a^n(\theta^n)}}{\kappa_a \xi_a \sqrt{2\pi n}}\, e^{-n I_p(a)}\, (1 + o(1)). \qquad (2.18)$$
(ii) Moreover, there exist sequences of random variables $(r_n = r_{p,a}^n)_{n\in\mathbb{N}}$, $(s_n = s_{p,a}^n)_{n\in\mathbb{N}}$ and $(t_{n,i} = t_{p,a,i}^n)_{n\in\mathbb{N}}$, $i = 1, 2$, (defined on some common probability space) such that for each $n \in \mathbb{N}$, (2.19) and (2.20) hold and, as $n \to \infty$, $(r_n, s_n, t_{n,1}, t_{n,2}) \Rightarrow (R, S, T_1, T_2)$, where $Z$ is a standard Gaussian random variable and $(A, D, E, G)$ are jointly Gaussian with mean 0 and covariance matrix $\Sigma_a = \Sigma_{p,a}$ that takes an explicit form.

An outline of the proof of Theorem 2.5 is given in Section 2.4, with full details provided in Sections 5.5 and 5.6. See also (7.12) and (7.13) for an interpretation of $c_a^n$ and $H_a$ as the scaled mean vector and limiting covariance matrix, under a quenched tilted measure, of a two-dimensional vector that arises in a convenient representation for $W^{(n,p)}$ described in Section 2.4.
Remark 2.6. We will refer to the term $C_a^n(\theta^n)\, e^{\sqrt{n} R_a^n(\theta^n)} / (\kappa_a \xi_a \sqrt{2\pi n})$ in (2.18) as the "prefactor", since it provides a multiplicative correction to the exponentially decaying term $e^{-n I_p(a)}$, which is identified by the LDP. In addition, it follows from (2.19)-(2.20) that (in distribution) $R_a^n(\Theta^n)$ and $C_a^n(\Theta^n)$ both converge to zero as $n \to \infty$; see also Lemma 5.8 for more refined estimates. Further insight into the form of the prefactor can be found in Remarks 2.8 and 2.15.
As mentioned above, the most significant term in the prefactor that depends on $\theta$ is $e^{\sqrt{n} R_{p,a}^n(\theta^n)}$. The following proposition describes the additional geometric information contained in this term beyond what is available in the rate function $I_p$, which is $\sigma$-almost surely insensitive to the projection sequence $\Theta$.

Proposition 2.7. Fix $p \in (1,\infty)$ and $a > 0$ such that $I_p(a) < \infty$, and let $R_{p,a}^n$ be the mapping in Theorem 2.5 that is defined explicitly in (5.8). Then:
(1) for $p = 2$, $R_{p,a}^n(\theta^n)$ is constant, regardless of the direction $\theta^n \in \mathbb{S}^{n-1}$;
(2) for $p > 2$, the maximum of $R_{p,a}^n(\theta^n)$ over $\theta^n \in \mathbb{S}^{n-1}$ is attained at $(\pm 1, \pm 1, \ldots, \pm 1)/\sqrt{n}$, while the minimum is attained at $\pm e_j$ for $j = 1, \ldots, n$;
(3) for $p < 2$, the minimum of $R_{p,a}^n(\theta^n)$ over $\theta^n \in \mathbb{S}^{n-1}$ is attained at $(\pm 1, \pm 1, \ldots, \pm 1)/\sqrt{n}$, while the maximum is attained at $\pm e_j$ for $j = 1, \ldots, n$,
where $(e_j)_{j=1,\ldots,n}$ denote the standard basis vectors in $\mathbb{R}^n$.

Remark 2.8. Proposition 2.7, in conjunction with Theorem 2.5, shows how the sharp large deviation estimates reflect the difference in the geometry of $\ell_p^n$ spheres for $p \in (1,2)$ and $p \in (2,\infty)$ with respect to the relative distribution of mass along different rays. This motivates obtaining sharp large deviation estimates for projections of more general high-dimensional objects, to uncover new geometric information about these objects.
As a corollary, combining the two parts of Theorem 2.5, we obtain an alternative expression for the distribution of the conditioned tail probability:

Corollary 2.9. Fix $p \in (1,\infty)$ and $a > 0$ such that $I_p(a) < \infty$. For $n \in \mathbb{N}$, recall the definitions of $(r_n)_{n\in\mathbb{N}}$, $(s_n)_{n\in\mathbb{N}}$ and $(t_{n,i})_{n\in\mathbb{N}}$ in Theorem 2.5(ii), and that of $H_a$ from (2.14). Then the representation (2.22) holds and, moreover, as $n \to \infty$, the joint convergence (2.23) holds, where $(R, S, T_1, T_2)$ is as defined in Theorem 2.5(ii).
Proof. By (2.18), (2.19) and (2.20), the tail probability can be written in the form (2.22), since $\exp(o(1)) = 1 + o(1)$. Also, from the relation (2.22), the mapping $(r_n, s_n, t_{n,1}, t_{n,2}) \mapsto (M_n, r_n)$ is continuous. Therefore, we may apply the continuous mapping theorem to the last display and invoke Theorem 2.5(ii) to obtain the joint convergence stated in (2.23).
Remark 2.10. In Theorem 2.5 and Corollary 2.9 we only consider values $p \in (1,\infty)$ because for $p \in (0,1)$, $\ell_p^n$ balls are no longer convex and even the existence of an LDP has not been established. Moreover, as shown in [16, Theorem 2.6], when $p = 1$ the quenched LDP of the projection exists only when the projection directions satisfy a certain limiting condition on $\max_{1\le i\le n} |\theta_i^n|$ with limiting constant $c \in (0,\infty)$; in that case the LDP holds with speed $n/\sqrt{\log n}$, and the rate function is no longer universal but depends on the limiting constant $c$. On the other hand, we omit the case $p = \infty$, or the more general case of product measures, because it is in fact simpler to analyze than the case $p \in (1,\infty)$; details can be found in [26, Section 4.2].

2.3. Results on projections of $\ell_p^n$ balls. Next, we state the corresponding sharp large deviation results for balls. For $p \in (1,\infty)$ and $a > 0$, recalling that $\lambda_{a,1}$ is the first coordinate of the maximizer $\lambda_a$ in the expression for $\Psi_p^*(a, 1)$ in (2.9) and that $H_a$ is as defined in (2.14), define the positive constant $\gamma_a = \gamma_{p,a}$ via the relation (2.24).

Theorem 2.11. Fix $p \in (1,\infty)$ and $a > 0$ such that $I_p(a) < \infty$. Then, for $n \in \mathbb{N}$, the tail probability $\mathbb{P}_\theta(\mathscr{W}^{(n,p)} \ge a)$ satisfies the estimate (2.25), which takes a form analogous to (2.18) with $\kappa_a \xi_a$ replaced by $\gamma_a$, where $\gamma_a = \gamma_{p,a}$ is the constant defined in (2.24), and $R_a^n$ and $C_a^n$ are the functions defined in Theorem 2.5.

Remark 2.12.
(i) Note that the tail probability in (2.25) is a geometric quantity, equal to the normalized volume (i.e., volume fraction) of the $p$-spherical cap (at level $a$) of the $\ell_p^n$ ball in the direction $\theta^n$.
(ii) Recall that it follows from the results of [16] (recapitulated here as Theorem 2.3) that, at the level of LDPs, $\ell_p^n$ spheres and balls cannot be distinguished, because the large deviation speeds and rate functions for random projections of $\ell_p^n$ balls and spheres coincide. In contrast, we see from (2.18) and (2.25) that although the two prefactors have a similar form, their actual values differ since in general $\gamma_a \neq \kappa_a \xi_a$. Thus, the sharp large deviation estimates obtained here are sufficiently refined to distinguish these two objects, whereas the LDP rate function does not do so.
(iii) As in Remark 2.8, due to the appearance of $R_a^n$ in (2.25), the sharp large deviation estimate provides more insight into the distinction between the geometry of $\ell_p^n$ balls with $p \in (1,2)$ and those with $p \in (2,\infty)$.

Similar to Corollary 2.9, we have the following immediate corollary for balls:

Corollary 2.13. For $n \in \mathbb{N}$, recall the definitions of $(M_n)_{n\in\mathbb{N}}$ and $(r_n)_{n\in\mathbb{N}}$ in Corollary 2.9, and let $\gamma_a$ be as in (2.24). Then, for $n \in \mathbb{N}$, the analogue of (2.22) holds with $\kappa_a \xi_a$ replaced by $\gamma_a$, and the convergence (2.23) continues to hold.

2.4. Reformulation of the problem and outline of the proof. Fix $p \in (1,\infty)$. As mentioned in the introduction, one of the reasons the estimate (2.18) is challenging to establish is that $W^{(n,p)}$ and $\mathscr{W}^{(n,p)}$ are randomly weighted sums of random variables that are not independent and, furthermore, the random weights are themselves also not independent. In this section we provide a brief outline of our proof and additional insight into the form of the sharp large deviation estimates, contrasting them with existing results and explaining the role of the various constants. The first step of the proof is to reformulate the probability of the rare event in terms of a certain multi-dimensional random vector ($\bar{S}^{(n,p)}$ in the case of spheres and $\bar{\mathscr{S}}^{(n,p)}$ in the case of balls), using a well-known probabilistic representation for the random vector $X^{(n,p)}$ that we now recall. Assume without loss of generality that the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is large enough to also support an i.i.d. sequence $(Y_i^{(p)})_{i\in\mathbb{N}}$ of generalized $p$-Gaussian random variables, where each $Y_i^{(p)}$ has density $f_p$ defined in (2.3). Setting $Y^{(n,p)} := (Y_1^{(p)}, \ldots, Y_n^{(p)})$, it then follows from [35, Lemma 1] (see also a statement of this property at the bottom of p. 548 in [8]) that
$$X^{(n,p)} \stackrel{(d)}{=} \frac{Y^{(n,p)}}{\|Y^{(n,p)}\|_{n,p}}, \qquad (2.26)$$
where recall that $\|x\|_{n,p}$ denotes the $p$-norm of $x$ in $\mathbb{R}^n$. Define the $\mathbb{R}^2$-valued random vector $\bar{S}^{(n,p)}$ as in (2.27). In view of (2.1), the independence of $X^{(n,p)}$ and $\Theta$, and (2.26), for $a > 0$ and $\theta \in \mathbb{S}$, we may rewrite the tail probability on the left-hand side of (2.18) as in (2.28), namely as the probability that $\bar{S}^{(n,p)}$ lies in a certain two-dimensional domain $\bar{D}_{p,a}$ defined in (2.29). On the other hand, again from [35, Lemma 1], we also have an equivalent representation for $\mathscr{X}^{(n,p)}$:
$$\mathscr{X}^{(n,p)} \stackrel{(d)}{=} U^{1/n}\, \frac{Y^{(n,p)}}{\|Y^{(n,p)}\|_{n,p}}, \qquad (2.30)$$
where $U$ is a uniform random variable on $(0,1)$, independent of the sequence $(Y_i^{(p)})_{i\in\mathbb{N}}$. Define the $\mathbb{R}^3$-valued random vector $\bar{\mathscr{S}}^{(n,p)}$ as in (2.31). From the equivalent representation (2.30), for $a > 0$ and $\theta \in \mathbb{S}$, we may rewrite the tail probability of $\mathscr{W}^{(n,p)}$ as in (2.32), where $\bar{\mathscr{D}}_{p,a}$ is the three-dimensional domain given in (2.33).

Remark 2.14.
Throughout the paper, we will typically use an overline to denote quantities related to these multi-dimensional reformulations, and script fonts for quantities related to $\ell_p^n$ balls.
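The representations (2.26) and (2.30) also yield a direct sampler for the cone and volume measures; the following sketch (illustrative code, not from the paper) uses the standard fact that if $G \sim \mathrm{Gamma}(1/p, 1)$ and $S$ is an independent random sign, then $S (pG)^{1/p}$ has the density $f_p$:

```python
import random

def sample_p_gaussian(p):
    # S * (p*G)^(1/p) with G ~ Gamma(1/p, 1) has density f_p proportional
    # to exp(-|y|^p / p).
    g = random.gammavariate(1.0 / p, 1.0)
    s = 1.0 if random.random() < 0.5 else -1.0
    return s * (p * g) ** (1.0 / p)

def sample_sphere(n, p):
    # Cone measure on the unit l_p^n sphere: X = Y / ||Y||_p, as in (2.26).
    y = [sample_p_gaussian(p) for _ in range(n)]
    norm = sum(abs(v) ** p for v in y) ** (1.0 / p)
    return [v / norm for v in y]

def sample_ball(n, p):
    # Uniform measure on the unit l_p^n ball: multiply by U^(1/n), as in (2.30).
    u = random.random() ** (1.0 / n)
    return [u * v for v in sample_sphere(n, p)]
```

Samples produced this way lie exactly on the unit $\ell_p^n$ sphere (respectively, inside the unit ball), up to floating-point error.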
While several results on sharp large deviations in multiple dimensions have been obtained (see, e.g., [2,18], as well as [5] for a comprehensive list of references), none of these cover the cases of interest in (2.28) and (2.32). In particular, the work [2] considers empirical means of i.i.d. random vectors whereas, under $\mathbb{P}_\theta$, $\bar{S}^{(n,p)}$ is the empirical mean of non-identically distributed random vectors. Moreover, the results of [18] also do not apply, since the condition imposed in [18, Assumption (A.2)] is not satisfied here due to the additional $\sqrt{n}$ factor in the exponent of (2.18) compared with [18, Equation (3)]. Instead, our proof proceeds by first exploiting quantitative asymptotic independence results for the weights $(\Theta_i^n)_{i=1,\ldots,n}$ obtained in Section 4, and combining them with new asymptotic estimates for certain Laplace-type integrals stated in Section 5.
Remark 2.15. Comparing the estimate in (2.18) with the sharp large deviation estimate for the projection of an i.i.d. sum onto the vector I^n = (1, 1, …, 1)/√n given in (1.2), we see that ξ_a in (2.18) plays a role similar to σ̄_a τ_a in (1.2). On the other hand, the additional constant κ_a in (2.18) arises due to the geometry of the domain D̄_{p,a} defined in (2.29) and the fact that we obtain this estimate by reformulating it in terms of a two-dimensional problem. From a technical point of view, the additional θ^n-dependent terms R^n_a(θ^n) and C^n_a(θ^n) arise because we are considering (quenched) sharp large deviations of a vector S̄^{(n,p)} whose independent summands are not identically distributed under P_θ, on account of the different weights arising from the coordinates of θ^n. From their exact definitions given in (5.8) and (5.9), it is easy to see that both these terms would vanish if we considered θ^n ∈ S^{n−1} with identical weights, such as θ^n = I^n = (1, 1, …, 1)/√n.

An Importance Sampling Algorithm
To numerically compute the tail probability by direct Monte Carlo, for any θ^n ∈ S^{n−1} one would have to generate independent samples of X^{(n,p)} from the cone measure μ_{n,p} defined in (1.5), and use the empirical mean as an estimate of the expectation. However, since the probability is very small, this is inefficient or computationally infeasible even for moderate values of n. In this section, we propose an alternative importance sampling (IS) algorithm to compute the tail probability numerically more efficiently, for a range of values of n, and compare the result with the analytical estimate obtained in Theorem 2.5. For a > 0, fix p ∈ (1, ∞) and recall the constant λ_a defined in (2.13). Also, recall the definition of the density f_p in (2.3), and define the tilted densities f^n_{p,j} as in (3.1), where we suppress from the notation the explicit dependence of f^n_{p,j} on θ^n. Also define the quantity in (3.2). In view of (3.1) and (3.2), it then follows that the identity (3.3) holds. The IS algorithm estimates the tail probability on the left-hand side of (3.3) by first sampling a direction θ^n according to σ_n and then sampling i.i.d. copies of the vector Ỹ^{(n,p)} := (Ỹ^p_1, …, Ỹ^p_n), independently of the θ^n sample, to approximate the expectation on the right-hand side of (3.3) by a standard Monte Carlo estimate.
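The tilting idea behind the IS algorithm can be isolated in a toy setting. The sketch below is our own simplification (not the algorithm of Section 3, which tilts the coordinate densities f^n_{p,j}): it estimates a Gaussian-sum tail P((1/n)Σ X_i ≥ a) by sampling from an exponentially tilted law and reweighting by the likelihood ratio; the tilt t = a places the rare event at the center of the sampling distribution, which is the same variance-reduction mechanism.

```python
import numpy as np
from math import erfc, sqrt

def tail_exact(a, n):
    # P(mean of n standard normals >= a) = P(Z >= a*sqrt(n)).
    return 0.5 * erfc(a * sqrt(n) / sqrt(2.0))

def is_estimate(a, n, num_samples, seed=0):
    rng = np.random.default_rng(seed)
    t = a  # tilt parameter: the tilted mean sits on the rare-event boundary
    x = rng.normal(loc=t, scale=1.0, size=(num_samples, n))
    s = x.sum(axis=1)
    # Likelihood ratio dP/dP_t = exp(-t*S + n*Lambda(t)), with Lambda(t) = t^2/2.
    w = np.exp(-t * s + n * t * t / 2.0)
    return float(np.mean((s / n >= a) * w))
```

With a = 0.5 and n = 50 the target probability is about 2·10⁻⁴; naive Monte Carlo would need millions of samples for comparable accuracy, while the tilted estimator is accurate with roughly 10⁵.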
The results are displayed in Figures 1-2 and Tables 1-2. In each case, the IS estimate is computed as above, the LDP estimate is e^{−nI_p(a)} (i.e., with 1 as a prefactor), and the sharp large deviation (SLD) estimate is the prefactor (see Remark 2.6) times e^{−nI_p(a)}. We consider p = 3 with only 100 samples, since we do not have closed-form expressions for various functions needed in the IS simulation, which requires greater computational effort per sample. In Table 1 we also calculate the confidence interval of the IS estimate and tabulate the relative distance between the SLD and IS estimates, computed as (SLD − IS) × 100/IS. First, we see from Figure 1 that the LDP estimate is not a good enough approximation, whereas the sharp large deviation (SLD) estimate does a much better job. For large a, namely a = 0.7, we see in Figure 1(B) and Table 1 that the SLD and IS estimates match quite well even for small n (namely, even n = 20). However, this is not the case for small a, namely for a = 0.1. In this case, as evident from Figure 1(A) and Table 2, the SLD estimate appears to achieve the same accuracy only for much larger n, which likely reflects the dependence on a of the o(1) term in (2.18). Finally, we also ran simulations for different realizations θ of the direction sequence Θ. We see from Figure 2 that different projection direction sequences result in fluctuations around the quantity e^{−nI_p(a)}/(κ_a ξ_a √(2πn)), which is the basic sharp large deviation estimate obtained by ignoring the θ^n-dependent terms in the prefactor in (2.18). As shown in Theorem 2.5(ii), these fluctuations converge in distribution to functionals of a multi-dimensional Gaussian vector with an explicit covariance matrix.

Asymptotic Independence Results for the Weights
Let P_p(ℝ) denote the space of probability measures on ℝ with finite p-th moment, and equip P_p(ℝ) with the p-Wasserstein distance, defined to be
W_p(ν, ν′) := inf_{π ∈ Π(ν,ν′)} ( ∫_{ℝ²} |x − y|^p π(dx, dy) )^{1/p},
where Π(ν, ν′) denotes the set of couplings of ν and ν′, or equivalently, the set of probability measures on ℝ² whose first and second marginals coincide with ν and ν′, respectively. We now define a function with polynomial growth in the natural way.
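In one dimension the infimum over couplings is attained by matching quantiles, so W_p between two equal-size samples can be approximated by sorting. A minimal sketch (the helper name is ours, not from the paper):

```python
import numpy as np

def wasserstein_p(x, y, p=2.0):
    # Empirical p-Wasserstein distance between equal-size 1-d samples:
    # in one dimension, the optimal coupling pairs order statistics.
    xs, ys = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```

For example, for two Gaussian samples with the same variance, W_2 equals the distance between the means.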
Definition 4.1. Given m ∈ ℕ, we say that a function f : ℝ → ℝ has polynomial growth of degree m if there exist T ∈ ℝ and C ∈ (0, ∞) such that |f(x)| ≤ C|x|^m for all |x| ≥ T. We say a function f : ℝ → ℝ has polynomial growth if it has polynomial growth of degree m for some m ∈ ℕ.
Next, we recall some useful properties of the p-Wasserstein distance, summarized in Lemma 4.2 below.
(2) For any continuous φ : ℝ → ℝ that has polynomial growth of degree p, if W_p(ν_n, ν) → 0 then ν_n(φ) → ν(φ). For each n ∈ ℕ and θ ∈ S, let L^n_θ denote the empirical measure of the coordinates of the scaled projection direction √n θ^n, as in (4.2). The following strong law of large numbers for (L^n_θ)_{n∈ℕ} was established in [16, Lemma 5.11]. Recall that γ₂ denotes the standard normal distribution. Lemma 4.3 ([16]). For p ∈ (1, ∞), for σ-a.e. θ ∈ S, W_p(L^n_θ, γ₂) → 0 as n → ∞. We now establish a central limit theorem refinement of Lemma 4.3. Given an i.i.d. array (Z^n = (Z^n_j, j = 1, …, n))_{n∈ℕ} of standard normal random variables, for any twice continuously differentiable function φ, define ŝ_n(φ) as in (4.3) and set r̂_n(φ) as in (4.4). For any probability measure π ∈ P(ℝ), define π(F) := ∫_ℝ F(x) π(dx) for any Borel measurable function F : ℝ → ℝ.
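The convergence in Lemma 4.3 can be checked numerically: the coordinates of √n θ^n, with θ^n uniform on S^{n−1}, have empirical moments close to those of a standard Gaussian (E[Z²] = 1 and E[Z⁴] = 3). A quick sketch, with parameters of our choosing:

```python
import numpy as np

def scaled_direction_moments(n, seed=1):
    rng = np.random.default_rng(seed)
    g = rng.normal(size=n)
    theta = g / np.linalg.norm(g)   # uniform on the unit sphere S^{n-1}
    x = np.sqrt(n) * theta          # coordinates of sqrt(n) * theta^n
    # The second moment is exactly 1 by construction (||theta||_2 = 1);
    # the fourth moment should approach E[Z^4] = 3 as n grows.
    return float(np.mean(x**2)), float(np.mean(x**4))
```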
Lemma 4.4. Given a thrice continuously differentiable function F : ℝ → ℝ and two twice continuously differentiable functions G₁, G₂ : ℝ → ℝ such that F, G₁ and G₂ have polynomial growth in the sense of Definition 4.1, we have the following expansion, in which ŝ_n and r̂_n are as defined in (4.3) and (4.4), the error terms vanish as n → ∞, and Z is a standard normal random variable.
This result is similar in spirit to [20, Theorem 1.1], which establishes a central limit theorem for the sequence of q-norms of √n Θ^n, n ∈ ℕ. Lemma 4.4 above provides fluctuation estimates for suitable joint functionals of √n Θ^n, to which we first apply a Taylor expansion. The proof of Lemma 4.4 is deferred to Appendix B.

Proof of the sharp large deviation estimate for spheres
Throughout this section, fix p ∈ (1, ∞) and, for n ∈ ℕ, recall from Section 2.4 the definition of the two-dimensional random vector S̄^n := S̄^{(n,p)} = (1/n) Σ_{j=1}^n (√n θ^n_j Y_j, |Y_j|^p), where (Y_j)_{j∈ℕ} is an i.i.d. sequence of random variables with common density f_p as in (2.3); for θ ∈ S, let h̄^n_θ denote the (joint) density of S̄^n under P_θ. In this section we will typically suppress the dependence of h̄^n_θ, S̄^n, Y_j and other quantities on p. In view of (2.28), we then have (5.1), where D̄_a = D̄_{p,a} is the domain defined in (2.29).
Remark 5.1. Note thath n θ depends on θ only through θ n . For notational simplicity throughout we will adopt the convention that for quantities that depend on both n and θ n , we will use a superscript n to denote the former dependence and a subscript θ instead of θ n to denote the dependence on θ n .
The key ingredients required to estimate the tail probability in (5.1) are an asymptotic expansion for the joint density h̄^n_θ, carried out in Proposition 5.4 of Section 5.2; a multi-dimensional generalized Laplace approximation, stated in Proposition 5.7 of Section 5.3; and a certain estimate that justifies the application of this Laplace approximation, stated in Lemma 5.8 of Section 5.4. The proof of Proposition 5.4 is somewhat involved and hence deferred to Section 7. These results are first used in Sections 5.5 and 5.6 to prove Theorem 2.5. We begin with a preliminary result in Section 5.1.

5.1. Estimates on the joint logarithmic moment generating function. We obtain an estimate on the growth of the log moment generating function Λ_p of (Y_j, |Y_j|^p), defined in (2.4), which will be useful in the subsequent discussion. The following expression was established in [16, Lemma 5.7]: (5.2), where M_{γ_p} is the moment generating function of Y_j. In order to understand the growth in t₁ of the derivatives of Λ_p, it suffices to understand the derivatives of log M_{γ_p}.
Lemma 5.2. For each j ∈ ℕ ∪ {0}, the j-th derivative of log M_{γ_p} exists and has at most polynomial growth, in the sense of Definition 4.1. Therefore, for j, k ∈ ℕ ∪ {0} and any t₂ < 1/p, the function t₁ ↦ ∂₁^j ∂₂^k Λ_p(t₁, t₂) has at most polynomial growth.
The proof of Lemma 5.2 involves conceptually straightforward (though detailed) estimates, and is thus deferred to Appendix C.

5.2. An asymptotic expansion for the joint density. The main result of this section is Proposition 5.4, which provides an asymptotic expansion for the joint density h̄^n_θ of the two-dimensional random vector S̄^n under P_θ. To state the result, for n ∈ ℕ, define Ψ^n_{p,θ} as in (5.7), where L^n_θ is the empirical measure of the coordinates of √n θ^n, as defined in (4.2).
For each t = (t₁, t₂) ∈ D_p, the map u ↦ Λ_p(u t₁, t₂) is continuous and has polynomial growth by Lemma 5.2. Hence, for every t = (t₁, t₂) ∈ D_p and σ-a.e. θ, the convergence of L^n_θ to γ₂ established in Lemma 4.3 shows that, as n → ∞, Ψ^n_{p,θ}(t) converges to ∫_ℝ Λ_p(u t₁, t₂) γ₂(du) = Ψ_p(t), where the last equality holds by the definition of Ψ_p given in (2.5).
Next, recall the definition of J_p from (2.7) and, for x ∈ J_p, the definition of λ_x from (2.7)-(2.8). Then for θ ∈ S, define c^n_x(θ^n), H^n_x(θ^n) and R^n_x(θ^n) as in (5.8)-(5.9), where we drop the explicit dependence on p from c^n_x, H^n_x and R^n_x, and note that the right-hand sides depend on θ only through θ^n (see Remark 5.1).
For a > 0, with the same abuse of notation used for H_a in Section 2.2, we let c^n_a and R^n_a denote the functions c^n_{a*} and R^n_{a*}, respectively, where a* = (a, 1). We show in Section 7.2 that c^n_x(θ^n) and H^n_x(θ^n) are the mean vector and covariance matrix, respectively, of (1/√n) Σ_{j=1}^n (V̄^n_j − x), with V̄^n_j as in (5.5), under a certain quenched tilted measure; see (7.12) and (7.13). Proposition 5.4. Fix p ∈ (1, ∞), n ∈ ℕ, and recall the definitions of Ψ_p, Ψ*_p, J_p and Ψ^n_{p,θ} given in (2.5), (2.6), (2.7) and (5.7), respectively; for x ∈ J_p, recall the definitions of H_x, c^n_x(·) and R^n_x(·) from (2.12), (5.9) and (5.8), respectively. Then for σ-a.e. θ, the expansion (5.10)-(5.11) holds.

5.3. A multi-dimensional generalized Laplace approximation. The formula (5.1) and the expression for h̄^n_θ in (5.10)-(5.11) show that the tail probability can be expressed as a Laplace-type integral over the domain D̄_a defined in (2.29). However, to estimate this integral, we cannot directly apply conventional Laplace approximations such as those in [6, Chapter 8] or [42, Chapter V], due to the additional dependence on n in ḡ^n_θ. Instead, in Propositions 5.6 and 5.7, we first establish a generalization of multi-dimensional Laplace approximations that can accommodate such n-dependent terms, which may be of independent interest. Definition 5.5. Given m, d ∈ ℕ, α ∈ (0, 1) and a bounded domain D ⊂ ℝ^{m+d}, we say that the sequence h^n : ℝ^{m+d} → ℝ, n ∈ ℕ, admits an (f, x*, α, g^n)-representation if for each n ∈ ℕ, h^n(x) = g^n(x) e^{−n f(x)} for x ∈ D, where (1) f is a nonnegative function that is twice continuously differentiable in D and achieves its minimum on cl(D), the closure of D, at a unique point x*, and (2) there exists C ∈ (0, ∞) such that for each n ∈ ℕ sufficiently large, g^n(x) = exp(r^n(x)) is continuously differentiable with |r^n(x)| ≤ C n^α ‖x‖² for all x in a neighborhood of x*.
We start by establishing a Laplace asymptotics result, which extends the one-dimensional result in [30, Chapter 9.2].
Proposition 5.6. Let m, d ∈ ℕ, and let D ⊂ ℝ^m × ℝ^d_+ be a bounded domain containing the origin. Suppose the sequence h^n : ℝ^{m+d} → ℝ, n ∈ ℕ, admits an (f, x*, α, g^n)-representation on D with x* = (0, 0, …, 0). Then we have the asymptotic expansion stated below. The proof is deferred to Appendix E. We now obtain an alternative representation for this integral. To state the result, we need to introduce the definition of Weingarten maps. Let D be a hypersurface in ℝ^d. Denote the tangent space at a point x ∈ D by T_x(D) and the normal vector field at x by N_x. Then the Weingarten map at x is defined to be the linear map L_x : T_x(D) → T_x(D) given by the (negative) directional derivative of the normal field, L_x(v) = −∇_v N_x. Also, for a map L, let L^{−1} denote its inverse, and recall that det(A) denotes the determinant of a matrix A. (See also [2, Section 4] for more information on Weingarten maps.)
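Propositions 5.6-5.7 concern Laplace integrals whose minimizer sits on the boundary, where the leading prefactor is of order 1/n rather than 1/√n. The one-dimensional boundary case can be checked directly: for f on [0, 1] with f(0) = 0 and f′(0) > 0, one has ∫₀¹ e^{−nf(x)} dx ~ 1/(n f′(0)). A numeric sketch with the arbitrary choice f(x) = x + x² (ours, not the paper's):

```python
import numpy as np

def laplace_boundary_leading(fprime0, n):
    # Leading-order boundary Laplace approximation: I_n ~ 1 / (n * f'(0)).
    return 1.0 / (n * fprime0)

def integral_numeric(n, grid=400001):
    x = np.linspace(0.0, 1.0, grid)
    y = np.exp(-n * (x + x**2))      # f(x) = x + x^2, minimized at x* = 0
    dx = x[1] - x[0]
    return float((np.sum(y) - 0.5 * (y[0] + y[-1])) * dx)  # trapezoid rule
```

For n = 200 the numeric integral and the 1/(n f′(0)) approximation agree to about one percent, consistent with an O(1/n) relative correction.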
Proposition 5.7. For m, d ∈ ℕ, let D ⊂ ℝ^{m+d} be a bounded domain whose boundary is a differentiable (d − 1)-dimensional hypersurface, and let h^n : ℝ^{m+d} → ℝ, n ∈ ℕ, be a sequence of functions that admits an (f, x*, α, g^n)-representation on D in the sense of Definition 5.5. Then the stated expansion holds, where for i = 1, 2, L_i is the Weingarten map at x* ∈ ∂D of the surface C_i. Proof. The proof will make use of arguments from [6] as well as a result from [2]. Since ∂D is a differentiable (d − 1)-dimensional hypersurface, there exists a one-to-one continuously differentiable transformation Γ : N → Ñ ⊂ ℝ × ℝ^{d−1}_+ such that Γ maps x* to the origin. Setting J_Γ(x) to be the Jacobian matrix of Γ at x, we can change variables accordingly. By the assumption in Definition 5.5, there exist α ∈ (0, 1) and C ∈ (0, ∞) such that g^n(x) = exp(r^n(x)) with |r^n(x)| ≤ C n^α ‖x‖² on a neighborhood of D. By the differentiability of Γ, we also have |r^n(Γ^{−1}(x))| ≤ C n^α ‖x‖². Hence, Proposition 5.6, with m, d, g^n, f and D therein replaced by d − 1, 1, |det J_Γ(x)| g^n(Γ^{−1}(x)), f ∘ Γ^{−1} and Γ(D), respectively, implies that there exists a constant C′ = C′(Γ, D, f) ∈ (0, ∞), not depending on g^n, such that the corresponding expansion holds up to a factor (1 + o(1)). In order to identify the constant C′, we note that the same formula also holds when g^n ≡ 1.
with L₁, L₂ as in the proposition. (Note that there is an erroneous additional factor of √(2πn) in the denominator of the expression in [2, Equation (4.6)], which we have corrected.) Comparing (5.15), (5.16) and (5.14), we can identify C′. The proposition then follows on substituting the resulting expression for C′ into (5.13).

5.4. Continuity estimates for terms in the prefactor. In order to apply Proposition 5.7 to the expression for h̄^n_θ given in (5.10)-(5.11), we need to verify that h̄^n_θ satisfies Definition 5.5. The following lemma will be useful in verifying property (2) of Definition 5.5.
Remark 5.10. In view of (5.23), we will use in our proof the following equivalence: a statement about Z^{(n)}/‖Z^{(n)}‖ holds almost surely if and only if the same statement with Z^{(n)}/‖Z^{(n)}‖ replaced by Θ^n holds P-almost surely.

5.5. Proof of Theorem 2.5(i).
We are now ready to prove the main estimate (2.18). Fix p ∈ (1, ∞) and a > 0 such that I_p(a) < ∞, and recall the definition of the domain D̄_a = D̄_{p,a} given in (2.29). Since I_p is convex and symmetric, I_p(a) is increasing for a ∈ ℝ_+. Thus, (2.10) and Lemma 2.2 imply that the infimum of Ψ*_p over the closure cl(D̄_a) of D̄_a is attained at a* := (a, 1). Moreover, due to (2.11), the assumption I_p(a) < ∞ implies Ψ*_p(a, 1) < ∞, and hence a* = (a, 1) ∈ J_p, defined in (2.7). Further, by (2.29), a* is a point on the smooth part of the boundary ∂D̄_a of D̄_a. Let U ⊂ ℝ²_> := {(x, y) ∈ ℝ² : x > 0, y > 0} be an open neighborhood of a* to be chosen below, and note that the boundary of U ∩ D̄_a is also smooth at a*. Then, for θ ∈ S, we can split the probability of interest from (5.1) into two parts: P_θ(S̄^n ∈ D̄_a) = P_θ(S̄^n ∈ D̄_a ∩ U) + P_θ(S̄^n ∈ D̄_a ∩ U^c). (5.32) The proof proceeds in two steps. In the key first step, we estimate the first term on the right-hand side of (5.32) by integrating the estimate of the density h̄^n_θ of S̄^n obtained in Proposition 5.4 over the domain D̄_a ∩ U, and then analyze the asymptotics of the resulting Laplace-type integral as n → ∞, using Propositions 5.4 and 5.7 and Lemma 5.8. The second step involves using the LDP for (S̄^n)_{n∈ℕ} to show that the second term on the right-hand side of (5.32) is negligible.
Proof. To verify property (1) of Definition 5.5, first note that Ψ*_p is nonnegative, since it is a rate function by Theorem 2.3. Next, note that by (2.5), (5.2) and Lemma 5.2, Ψ_p is twice (in fact, infinitely) differentiable on D_p = ℝ × {t₂ : t₂ < 1/p}. Hence, by the duality of the Legendre transform [43, Section III.D], it follows that Ψ*_p is twice differentiable in D̄_a and achieves its minimum uniquely at a* = (a, 1) ∈ ∂(D̄_a ∩ U). Thus, property (1) of Definition 5.5 holds.
We next turn to the verification of property (2) of Definition 5.5. From (5.11), it follows that ḡ^n_θ(x) = exp(r^n_θ(x)). Lemma 5.8 and the smoothness of x ↦ log H_x^{−1/2}, which follows from Remark 2.1 and (2.12), imply that, for a sufficiently small neighborhood of x* and any α ∈ (1/2, 1), there exist C ∈ (1, ∞) and a finite random variable N such that, for σ-a.e. θ, r^n_θ satisfies property (2) of Definition 5.5, and the claim follows.
Given the claim, Proposition 5.7 applied with d = 2, D = D̄_a ∩ U and h^n(x) = ḡ^n_θ(x) e^{−nΨ*_p(x)} shows that the corresponding expansion holds for σ-a.e. θ, where L_{a,1} and L_{a,2} are the Weingarten maps of the curves C₁ := {x ∈ ℝ² : Ψ*_p(x) = Ψ*_p(a, 1)} and C₂ := {x ∈ ℝ² : x₁ = a x₂^{1/p}}, evaluated at a* = (a, 1). To simplify further, first note that by the duality of the Legendre transform and the definition of λ_{a,j} in (2.13), we have ∂_j Ψ*_p(a*) = λ_{a,j} for j = 1, 2. Next, observe that [2, Example 4.3] shows that in ℝ² the Weingarten map reduces to multiplication by the inverse of the radius of the osculating circle, which equals the absolute value of the curvature. Recall that for a curve in ℝ² defined by the equation T(x, y) = 0 for a sufficiently smooth map T : ℝ² → ℝ, the curvature at a point x* on the curve is given by the formula
κ = |T_xx T_y² − 2 T_xy T_x T_y + T_yy T_x²| / (T_x² + T_y²)^{3/2},
with all partial derivatives evaluated at x*. Thus, to calculate the curvature of C₁ at a*, we use this formula with T(x, y) = Ψ*_p(x, y) − Ψ*_p(a, 1) and x* = a*, and substitute the relations ∂_j Ψ*_p(a*) = λ_{a,j}, j = 1, 2, and the definition of H_a mentioned above. On the other hand, the curvature of the graph of a function y = T(x) at the point (x, T(x)), for sufficiently smooth T : ℝ → ℝ, is given by |T″(x)|/(1 + (T′(x))²)^{3/2}. Recalling the definition of D̄_a from (2.29), we apply this with T(x) = (x/a)^p to compute the curvature of C₂ = ∂D̄_a at a*. Substituting these calculations back into the expressions (5.33) and (5.38), and recalling the definitions of ḡ^n_θ from (5.11), C^n_a(θ^n) from (2.19), and ξ_a and κ_a from (2.15) and (2.16), we conclude that for σ-a.e. θ, the asserted estimate for the first term in (5.32) holds. Step 2. We now turn to the second term in (5.32). Note that by the continuity of Ψ*_p, there exists η > 0 such that the infimum of Ψ*_p over D̄_a ∩ U^c exceeds Ψ*_p(a*) + η; indeed, by the refinement in Lemma 2.2 of the (quenched) large deviation principle for S̄^n established in [16, Proposition 5.3], Ψ*_p achieves its unique minimum in D̄_a at a* = (a, 1). Thus, for σ-a.e. θ, the second term is negligible relative to the first.
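The two curvature formulas used above can be cross-checked numerically for the boundary curve C₂, written as the graph x₂ = (x₁/a)^p. The values a = 2, p = 3 below are arbitrary; at x₁ = a the graph formula gives κ = (p(p − 1)/a²)/(1 + p²/a²)^{3/2}.

```python
import numpy as np

def curvature_graph(T, x, h=1e-5):
    # Curvature of the graph y = T(x): |T''| / (1 + (T')^2)^(3/2),
    # with derivatives computed by central finite differences.
    d1 = (T(x + h) - T(x - h)) / (2.0 * h)
    d2 = (T(x + h) - 2.0 * T(x) + T(x - h)) / h**2
    return abs(d2) / (1.0 + d1**2) ** 1.5

a, p = 2.0, 3.0
T = lambda x: (x / a) ** p   # boundary curve of the domain, written as a graph
kappa_closed_form = (p * (p - 1) / a**2) / (1.0 + (p / a) ** 2) ** 1.5
```

The finite-difference value agrees with the closed form to roughly the rounding error of the second-difference quotient.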

5.6. Proof of Theorem 2.5(ii). We start by obtaining expansions for R^n_a(Θ^n) and c^n_a(Θ^n). First, note that the functions ℓ_a, ℓ_{a,1} and ℓ_{a,2} defined in (2.17), together with their derivatives up to second order (for ℓ_{a,1} and ℓ_{a,2}) and third order (for ℓ_a), are continuous and have at most polynomial growth by Lemma 5.2. Therefore, setting r_n := r̂_n(ℓ_a), s_n := ŝ_n(ℓ_a), t_{n,1} := r̂_n(ℓ_{a,1}) and t_{n,2} := r̂_n(ℓ_{a,2}), where r̂_n and ŝ_n are defined in (4.4) and (4.3), respectively, we can apply (5.9), (5.8) and Lemma 4.4 to obtain the desired expansions. Moreover, Lemma 4.4 also yields the convergence (r_n, s_n, t_{n,1}, t_{n,2}) ⇒ (A, D, E, G), where (A, D, E, G) is jointly Gaussian with mean 0 and covariance matrix (2.21).
6. Proof of the sharp large deviation estimate for balls 6.1. Preliminary notation. Fix p ∈ (1, ∞) and a > 0 such that I_p(a) < ∞. The definitions in Section 2.4, specifically (2.32), yield the expression (6.1) for the tail probability of projections of ℓ^n_p balls, where for θ ∈ S, 𝒽^n_θ(x₁, x₂, y) is the density under P_θ of the random vector 𝒮^{(n,p)} = (S̄^{(n,p)}, U^{1/n}) defined in (2.31), and 𝒟̄_a := 𝒟̄_{p,a} ⊂ ℝ³ is the domain defined in (2.33). By the independence of U and Y^{(n,p)}, for x ∈ ℝ² and y ∈ (0, 1], 𝒽^n_θ(x₁, x₂, y) is the product of h̄^n_θ(x₁, x₂), the density of S̄^{(n,p)} under P_θ evaluated at (x₁, x₂), and the density of U^{1/n} at y, which equals n y^{n−1} = (n/y) e^{n log y}. Hence, by Proposition 5.4, we have the uniform estimate (6.2) for 𝒽^n_θ, for σ-a.e. θ, with ḡ^n_θ defined in (5.11) and F as in (6.4). Thus, as in Section 5.5, the integral (6.1) of interest is once again a Laplace-type integral, and so one expects the significant contribution to come from the values of the integrand in a neighborhood of the point where the minimum of F over 𝒟̄_a is achieved. Now, for any x ∈ ℝ², the minimum of F(x, y) over y ∈ (0, 1] is clearly attained at y = 1, and by Lemma 2.2 the minimum of F(x, 1) over the region {x ∈ ℝ² : x₂ > 0, x₁ = a x₂^{1/p}} is attained at x = (a, 1). Together with the strict convexity of Ψ*_p established in Theorem 2.3 and the fact that its minimum is attained at a unique point, this shows that for a > 0, the minimizing point is given by arg min F(x₁, x₂, y) = (a, 1, 1). (6.5) However, in this case the boundary of the domain 𝒟̄_a is not smooth at the minimizing point (a, 1, 1), and so instead of Proposition 5.7 we apply Proposition 5.6 to prove Theorem 2.11.
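The radial factor U^{1/n} above has density n y^{n−1} on (0, 1], hence E[U^{1/n}] = n/(n + 1) and its mass concentrates near y = 1 as n grows, consistent with the third coordinate of the minimizer in (6.5) being 1. A quick numeric check, with parameters of our choosing:

```python
import numpy as np

def mean_radial_factor(n, num_samples=50000, seed=2):
    # Sample U^(1/n) for U uniform on (0,1); its mean is n / (n + 1).
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=num_samples)
    return float(np.mean(u ** (1.0 / n)))
```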

6.2. Proof of the sharp quenched estimate for ℓ^n_p balls. We now prove Theorem 2.11. Proof of Theorem 2.11. Fix p ∈ (1, ∞) and a > 0 such that I_p(a) < ∞. For θ ∈ S, recall that the density of 𝒮^n can be expressed as in (6.1) and (6.2), and recall the assertion in (6.5) that the minimum of the function F in (6.4) on 𝒟̄_a is attained at (a, 1, 1). Thus, for any open neighborhood U of (a, 1, 1) whose closure does not intersect the plane y = 0, we split the probability into two parts. Fix θ ∈ S. Then P_θ(𝒮^n ∈ 𝒟̄_a) = P_θ(𝒮^n ∈ 𝒟̄_a ∩ U) + P_θ(𝒮^n ∈ 𝒟̄_a ∩ U^c). (6.6) For the first term in (6.6), we have the estimate (6.7) from (6.1) and (6.2), where g^n_θ and F are given in (6.3) and (6.4).
The bulk of the proof is devoted to the asymptotics of the Laplace-type integral in (6.7). In order to apply Proposition 5.6, we first perform a change of variables to transform the domain of integration. Let T : 𝒟̄_a → ℝ³ be the mapping that takes (x₁, x₂, y) to (X, Y, Z) as specified below. Note that the transformation T is invertible in a neighborhood of (a, 1, 1), the Jacobian of T at (a, 1, 1) is 1, the image of 𝒟̄_a under T is as displayed, and T maps the minimizer (a, 1, 1) of F to (0, 0, 0). Hence, under the transformation T, setting Ũ := T(U), we can rewrite (6.7) accordingly. Combining (6.4) with the duality relations (5.36)-(5.37) implies the following identities.
The expression on the right-hand side can be simplified further, using first the relations T^{−1}(0, 0, 0) = (a, 1, 1), F(a, 1, 1) = Ψ*_p(a*) and g^n_θ(a, 1, 1) = ḡ^n_θ(a*), which follow from (6.3) and (6.4), together with the expressions for the partial derivatives of F calculated above. Substituting for ḡ^n_θ and Ψ*_p using the relations (5.11), (2.19) and (2.11), we then obtain (6.10), where γ̄_a = (det H̄_a) γ_a, which coincides with the definition given in (2.24). For the second term in (6.6), as in the proof for ℓ^n_p spheres in (5.42), one can invoke the quenched large deviation principle for 𝒮^n established in [16, Proposition 5.3], along with the fact that the rate function has a unique minimum (as proved in Lemma 2.2), to show that it is negligible with respect to (6.10). When combined with (2.32), (6.6) and (6.10), this yields (2.25).

The joint density estimate
This section is devoted to the proof of the density estimate stated in Proposition 5.4. As usual, throughout fix p ∈ (1, ∞). In Section 7.1 an identity for the joint density is established in terms of an integral. This integral is then shown in Section 7.2 to admit an alternative representation as an expectation with respect to a tilted measure. The latter representation is used in Section 7.3 to obtain certain asymptotic estimates. These results are finally combined in Section 7.4 to prove Proposition 5.4.

7.1. An integral representation for the joint density. Lemma 7.1 (Representation for the density of S̄^n under P_θ). Fix n ∈ ℕ and θ ∈ S, and recall the definitions of Ψ_p, J_p, λ_x, Φ_p and Ψ^n_{p,θ} in (2.5), (2.7), (2.9), (5.6) and (5.7), respectively, and recall that h̄^n_θ is the density, under P_θ, of S̄^n defined in (2.27). Then for all sufficiently large n and all x ∈ J_p, the identity (7.1)-(7.2) holds.
Proof. Let D_p be as in (5.3), fix x ∈ J_p, and omit the subscript x from λ_x ∈ D_p ⊂ ℝ² and the superscript p from many quantities for notational simplicity. Recall the definition of V̄^n_j in (5.5) and, for θ ∈ S, let l̄^n_θ be the density of the sum Σ_{j=1}^n V̄^n_j under P_θ. The moment generating function of this sum can be written explicitly in terms of Φ_p, where Y₁, …, Y_n are i.i.d. with density f_p defined in (2.3); its finiteness follows because λ ∈ D_p, and thus λ₂ < 1/p. Then the Fourier transform of the integrable function y ↦ e^{⟨λ,y⟩} l̄^n_θ(y) is given, for t ∈ ℝ², by ∫_{ℝ²} e^{⟨λ+it, y⟩} l̄^n_θ(y) dy = E_θ[exp(⟨λ + it, Σ_{j=1}^n V̄^n_j⟩)]. We now make the following claim. Claim. There exists s̄ > 1 such that for any λ ∈ D_p, t ∈ ℝ² and j, k ∈ {1, …, n} with j ≠ k, the bound (7.4) holds. We defer the proof of the claim, first showing how the lemma follows from it. Let s̄ > 1 be as in the claim. Since the moment generating function Φ_p is bounded, the claim holds for any s > s̄. Now, pick any integer n > 2s̄. Then Hölder's inequality and the claim imply that the right-hand side of (7.3) lies in L¹(ℝ²). Hence, the second assertion of the lemma holds for any such n. We may then apply the inverse Fourier transform formula to conclude that (7.5) holds for all sufficiently large n. Next, recall that for any x ∈ J_p, λ = λ_x is chosen so that (2.9) is satisfied. Also, by (2.27) and (5.5), we have S̄^n = (1/n) Σ_{j=1}^n V̄^n_j.
Hence, using (7.5), (2.9) and (5.7), we see that the density h̄^n_θ of S̄^n under P_θ is given by the corresponding integral formula for x ∈ J_p. Since the right-hand side coincides with the expression for h̄^n_θ given in (7.1) and (7.2), this proves the first part of the lemma, given the claim.
To complete the proof of the lemma, it only remains to prove the claim. Proof of the claim. Fix n ∈ ℕ and j, k ∈ {1, …, n} with j ≠ k, and set θ₁ := θ^n_j and θ₂ := θ^n_k. Let ῡ := ῡ^{n,j,k}_θ denote the density of V̄^n_j + V̄^n_k under P_θ. We assert that to prove the claim it suffices to show that the function ℝ² ∋ z ↦ e^{⟨λ,z⟩} ῡ(z) lies in L^{1+r}(ℝ²) for some r ∈ (0, ∞). Indeed, then by the Hausdorff-Young inequality [14, Theorem 8.21], the Fourier transform of z ↦ e^{⟨λ,z⟩} ῡ(z) lies in L^{s̄}, where s̄ is the conjugate exponent of 1 + r. By (5.5) and (5.6), this is equivalent to saying that (7.4) holds with s̄ = 1 + 1/r > 1.
For |z₁| < M(z₂), we define y₊ and y₋ to be the two solutions of T(y) = z. Thus, T is locally invertible on its range and hence, by the change of variables formula and the differentiability of T, we may write the density ῡ in terms of the Jacobian |∂(y₁, y₂)/∂(z₁, z₂)| of the transformation T at (y₁, y₂), which is given by an explicit formula involving the sign function sgn(·). For r > 0, the above discussion shows that the L^{1+r}-norm of z ↦ e^{⟨λ,z⟩} ῡ(z) is bounded by an integral of terms of the form exp(λ₁√n(θ₁y₁ + θ₂y₂) + (λ₂ − 1/p)(|y₁|^p + |y₂|^p))^{1+r} |∂(y₁, y₂)/∂(z₁, z₂)|, where the inequality follows from (a + b)^{1+r} ≤ 2^r(a^{1+r} + b^{1+r}) for a, b ∈ ℝ₊, and the last equality uses the definition of T. Next, let N ⊂ ℝ² be a neighborhood of the origin and split the integral over N and N^c. Since p ∈ (1, ∞) and x ∈ J_p implies pλ₂ = pλ_{x,2} < 1, the function exp(λ₁√n(θ₁y₁ + θ₂y₂) + (λ₂ − 1/p)(|y₁|^p + |y₂|^p)) lies in L^r(ℝ²) for every r > 0. Moreover, since p > 1, by (7.6) there exists r₁ > 0 small enough that the Jacobian J_T lies in L^{r₁}(N). On the other hand, there exists 0 < r₂ < ∞ large enough that the Jacobian J_T lies in L^{r₂}(N^c). Thus, by Hölder's inequality, there exists r > 0 such that the last display is finite. This completes the proof of the claim, and therefore of the lemma.

7.2. Representation of the integrand in terms of a tilted measure. We next obtain a representation for the integrand of the integral I^n_θ in (7.2) using a change of measure. Once again, from Section 2.4, recall the i.i.d. sequence of random variables (Y_j)_{j∈ℕ} defined on (Ω, F, P) that have density f_p and are independent of Θ = (Θ^n)_{n∈ℕ}. Fix a > 0 such that I_p(a) < ∞, and recall the definition of λ = λ_a from (2.9). Fix n ∈ ℕ, and consider a "tilted" measure P^n = P^{n,a} on (Ω, F) such that the (marginal) distribution of Θ^n remains unchanged but, conditioned on Θ = θ ∈ S, the variables {Y^n_j, j = 1, …, n} are still independent, though no longer identically distributed, with Y^n_j having density f^n_j = f^{n,a}_{θ,j} given by (7.7), with Λ_p as defined in (2.4); as before, we omit the explicit dependence of f^n_j and other quantities on p and a. For θ ∈ S, denote by P^n_θ and E^n_θ the probability and the expectation with respect to P^n, conditioned on θ, and likewise let Var^n_θ(·) and Cov^n_θ(·,·) denote the conditional variance and conditional covariance, respectively, under P^n_θ. Recall from (2.4) and (5.6) that Λ_p(t) = log Φ_p(t) for t ∈ ℝ². Then, by (5.5), (5.6) and (7.7), the identity (7.8) holds for j = 1, …, n and β = (β₁, β₂) ∈ ℝ², and hence (7.9) follows. Writing V̄^n_j = (V̄^n_{j,1}, V̄^n_{j,2}), by (7.8) we also have, for k, l = 1, 2, the covariance identities (7.10). Lemma 7.2. For x ∈ J_p and θ ∈ S, recall the definitions of V̄^n_j, Φ_p, c^n_x, H^n_x and V̄^n_x given in (5.5), (5.6), (5.9) and (7.11). Then the identities (7.12) and (7.13) hold, expressing c^n_x(θ^n) and ⟨H^n_x(θ^n) t, t⟩ as a conditional mean and a conditional variance, respectively. Moreover, for t = (t₁, t₂) ∈ ℝ², the characteristic function μ̄^n_{x,θ}(t) satisfies (7.14). Furthermore, for σ-a.e. θ, as n → ∞, H^n_x(θ^n) converges to the quantity H_x defined in (2.12).
Proof. We fix θ ∈ S and x in the domain J_p of Ψ*_p defined in (2.7), and omit the subscript x from λ_x for notational simplicity. By (7.9), (7.11), the definition of Ψ^n_{p,θ} in (5.7) and (2.8), the first identity follows; when combined with (5.9), this proves (7.12). Similarly, by the independence of V̄^n_j, j = 1, …, n, under P^n_θ, (7.10), the definition of Ψ^n_{p,θ} in (5.7) and the definition of H^n_x in (5.9), the identity (7.13) follows. Also, by the definitions of μ̄^n_{x,θ} and V̄^n_x in (7.14) and (7.11), respectively, the independence of V̄^n_j, j = 1, …, n, under P^n_θ and the relation (7.8), the identity (7.14) follows for t ∈ ℝ². It only remains to establish the convergence stated in the last assertion of the lemma. By (5.9) and (5.7), for each i, j = 1, 2 there exist α, β ∈ ℕ such that the entry (H^n_x(θ^n))_{ij} can be written as an integral against L^n_θ. Since the moment generating function Φ_p is infinitely differentiable, the mapping u ↦ φ(u) := u^α ∂₁^α ∂₂^β log Φ_p(uλ₁, λ₂) is continuous. Moreover, φ has polynomial growth by Lemma 5.2. Since Lemma 4.3 implies that W_p(L^n_θ, γ₂) → 0 as n → ∞, it follows from Lemma 4.2(2) that, as n tends to infinity, L^n_θ(φ) converges to γ₂(φ); the last equality then follows from the definition of H_x in (2.12).
We show boundedness of just the first term; boundedness of the second can be shown analogously.
Using the following relation between cumulants and central moments, a straightforward calculation yields (7.17). Now, by (7.13), Var^n_θ(V̄^n_{x,1}) = (H^n_x(θ^n))₁₁, and so, by the last assertion of Lemma 7.2, for σ-a.e. θ, Var^n_θ(V̄^n_{x,1}) converges to (H_x)₁₁ as n → ∞. Also, since the function u ↦ ∂₁⁴(log Φ_p(uλ_{x,1}, λ_{x,2})) is continuous and has polynomial growth (the latter by Lemma 5.2), Lemma 4.3 and Lemma 4.2(2) together show that for σ-a.e. θ, the second term on the right-hand side of (7.17) also has a finite limit as n → ∞. Therefore, for σ-a.e. θ, the sum of the two terms is uniformly bounded in n.
Next, we deal with the second inequality. By (7.11) and Jensen's inequality, we obtain the corresponding bound. We show the boundedness of the first term above, and the second follows similarly. For m ∈ {1, 2}, by the independence of (V̄^n_{j,1})_{j=1,…,n}, the relevant moment is bounded above by (7.15). This proves (7.16).
Lemma 7.4. Fix x ∈ J_p and recall the definitions of H_x, Φ_p, c^n_x, H^n_x, V̄^n_x and μ̄^n_{x,θ} given in (2.12), (5.6), (5.9), (7.11) and (7.14), respectively. Then for σ-a.e. θ and every neighborhood U ⊂ ℝ² of the origin, there exist a neighborhood U′ of x and a constant C ∈ (0, 1) such that for all sufficiently large n, sup_{t∈U^c} |μ̄^n_{y,θ}(t)|^{1/n} < C for all y ∈ U′. Proof. We omit the subscript x of λ_x for notational simplicity. For θ ∈ S and t ∈ ℝ², the relation (7.8) yields a product bound on |μ̄^n_{x,θ}(t)|. Noting from (5.6) that the relevant factor is, up to normalization, the Fourier transform of an integrable tilted version of the joint density of (Y₁, |Y₁|^p), we can apply the Riemann-Lebesgue lemma [14, Theorem 8.22] to conclude that it vanishes at infinity. Thus, under the assumption that θ^n_j ≠ 0, the corresponding factor has modulus strictly smaller than 1 away from the origin. Hence, for any neighborhood U ⊂ ℝ² of the origin and any 0 < K < ∞, there exists 0 < r < 1 such that for all t ∈ U^c, if K^{−1} ≤ √n|θ^n_j| ≤ K and θ^n_j ≠ 0, then the corresponding factor is at most r. Combining this with (7.14) yields an upper bound on |μ̄^n_{x,θ}(t)| in terms of the fraction of indices j with K^{−1} ≤ √n|θ^n_j| ≤ K, whose limit, as n → ∞, is bounded below by c_K := γ₂([K^{−1}, K]) > 0 by Lemma 4.3. Thus, for σ-a.e. θ, we have a uniform bound 0 < C < 1 such that for all sufficiently large n, sup_{t∈U^c} |μ̄^n_{x,θ}(t)|^{1/n} < C. Since Φ_p is uniformly continuous in λ_x, and λ_x is an infinitely differentiable function of x by the inverse function theorem applied to (2.9), we may choose a neighborhood U′ of x such that for y ∈ U′, sup_{t∈U^c} |μ̄^n_{y,θ}(t)|^{1/n} < C; that is, for σ-a.e. θ and all sufficiently large n (possibly depending on θ), (7.18) holds. Next, note that by (7.14) and (7.12), for t ∈ ℝ², an expansion of log μ̄^n_{x,θ}(t) around the origin is available; for θ ∈ S, by (7.13) and [12, Lemma 3.3.7], we have the expansion displayed above. For ε > 0, by (7.16) of Lemma 7.3, we may choose a neighborhood U ⊂ ℝ² of the origin with small enough radius so that the right-hand side of the last display is bounded by ε‖t‖² for t ∈ U.
On the other hand, by the convergence of $H^n_x(\theta^n)$ to $H_x$ established in Lemma 7.2, for $\sigma$-a.e. $\theta$, there exists $\varepsilon > 0$ such that $H^n_x(\theta^n) - \varepsilon I$ is positive definite for all sufficiently large $n$ (possibly depending on $\theta$), and the corresponding bound holds for $t \in U$. Note that the right-hand side of the last display converges to the integrable function $\exp(-\tfrac{1}{2}\langle (H_x - \varepsilon I)t, t\rangle)$ as $n$ tends to infinity. Similar to the proof of (7.18), the uniformity of the bound in (7.19) follows from the definitions in (5.9) and (5.7) and the aforementioned uniform continuity of $\Phi_p$ in $x$.
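The integrable dominating function above justifies passing to the limit inside the integral; the resulting Gaussian integral, which is what ultimately produces the prefactor in Laplace-type arguments of this kind, is the standard formula (stated here for a generic symmetric positive definite matrix $A$):

```latex
% Gaussian integral over \mathbb{R}^d for symmetric positive definite A:
\int_{\mathbb{R}^d} \exp\Big(-\tfrac{1}{2}\langle A t, t\rangle\Big)\, dt
  \;=\; \frac{(2\pi)^{d/2}}{\sqrt{\det A}},
% applied here with d = 2 and A = H_x - \varepsilon I.
```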
Combining (5.11), (7.22), (7.23), (7.26) and the estimate of the integral over $U^c$ in (7.24), we conclude that the asymptotic expansion for the density $h^n_\theta(x)$ given in (5.10) holds uniformly for $x$ in any compact subset of $J_p$.
Appendix A. Infimum of the rate function

In this section, we analyze the infimum of the rate function.
(A.8)

Now, in view of (A.1), to compute $I_p(t)$ we have to first take the derivative of $\Psi^*_p(\tau t, \tau^p)$ with respect to $\tau$ and set it to 0. Note that in the following, $s_1, s_2$ are functions of $\tau$ and $t$ satisfying (A.6) and (A.7). Using (2.5) and (A.1), we first rewrite $\Psi_p(s_1, s_2)$ and then differentiate to obtain
$$\frac{d}{d\tau}\,\Psi^*_p(\tau t, \tau^p) = \frac{\partial s_1}{\partial \tau}\,\tau t + s_1 t + \frac{\partial s_2}{\partial \tau}\,\tau^p + p s_2 \tau^{p-1} - \frac{\partial \Psi_p}{\partial s_1}\frac{\partial s_1}{\partial \tau} - \frac{\partial \Psi_p}{\partial s_2}\frac{\partial s_2}{\partial \tau}.$$
Setting the derivative computed above to 0, we conclude that the minimum over $\tau > 0$ in (A.1) is attained at the stated value. In what follows, we use $\|Z^n\| = \|Z^n\|_{n,2}$ to denote the Euclidean norm of the vector $Z^n := (Z^n_1, \dots, Z^n_n)$.
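A sketch of the simplification of the derivative computed above, assuming (as the notation suggests) that (A.6) and (A.7) are the first-order conditions $\partial \Psi_p/\partial s_1 = \tau t$ and $\partial \Psi_p/\partial s_2 = \tau^p$ characterizing the Legendre transform:

```latex
% Grouping the terms involving \partial s_i/\partial\tau, they cancel by
% the first-order conditions (envelope theorem):
\frac{d}{d\tau}\Psi^*_p(\tau t, \tau^p)
  = s_1 t + p s_2 \tau^{p-1}
  + \frac{\partial s_1}{\partial \tau}\Big(\tau t - \frac{\partial \Psi_p}{\partial s_1}\Big)
  + \frac{\partial s_2}{\partial \tau}\Big(\tau^p - \frac{\partial \Psi_p}{\partial s_2}\Big)
  = s_1 t + p s_2 \tau^{p-1},
% so setting the derivative to 0 amounts to s_1 t + p s_2 \tau^{p-1} = 0.
```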
Since $F$ is a thrice continuously differentiable function, we may apply Taylor's theorem, for $x \in \mathbb{R}$ and $h > 0$, to obtain the expansion for some $\bar x \in (x, x + h)$. With the expansion above, we obtain the representation in which $\hat r_n(\cdot)$ and $\hat s_n(\cdot)$ are defined in (4.4) and (4.3), respectively, and $\bar Z^n_j \in \mathbb{R}$ lies between $Z^n_j$ and $\sqrt{n} Z^n_j/\|Z^n\|$. In the following, the notation $o(1)$ means having order $o(1)$ in probability $P$. We first show that the last term in (B.2) is of order $o(1/n)$ in probability. By assumption, $|F'''|$ has polynomial growth, so there exist $q > 0$ and $C < \infty$ such that the corresponding growth bound holds, and therefore an associated estimate holds for each $n \in \mathbb{N}$. Since $\bar Z^n_j$ lies between $Z^n_j$ and $\sqrt{n} Z^n_j/\|Z^n\|$, and $\sqrt{n}/\|Z^n\|$ converges to 1 almost surely, for each $0 < \tilde C < \infty$ there exists $N = N(\omega)$ such that a.s. for all $n > N$ the corresponding bound holds. Combining the last two inequalities above, we obtain, for some constant $C < \infty$ and all $n > N$, the required estimate. From the Gaussian concentration inequality (see [38, Theorem 3.1.1]), there exists a universal constant $c$ such that the stated tail bound holds for $\delta > 0$. Given $\varepsilon > 0$, we thus obtain the resulting estimate. On the other hand, since $(Z^n_j)_{j=1,\dots,n}$ are independent, by the strong law of large numbers for triangular arrays, the stated limit holds almost surely as $n$ tends to infinity. Similarly, the strong law of large numbers also ensures the corresponding convergence as $n$ tends to infinity. We may then rewrite (B.2) accordingly. Due to the assumption that $F$, $G_1$ and $G_2$ all have polynomial growth, the variances of $F(Z)$, $F(Z)Z$, $F(Z)Z^2$, $G_1(Z)$, $G_1(Z)Z$, $G_2(Z)$ and $G_2(Z)Z$ are all finite.
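The version of Taylor's theorem used here is the third-order expansion with Lagrange remainder:

```latex
% Taylor's theorem with Lagrange remainder, for F \in C^3(\mathbb{R}),
% x \in \mathbb{R} and h > 0:
F(x+h) \;=\; F(x) + F'(x)\,h + \tfrac{1}{2}F''(x)\,h^2
          + \tfrac{1}{6}F'''(\bar x)\,h^3,
% for some intermediate point \bar x \in (x, x+h).
```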
Define sequences $(A_n)$, $(B_n)$, $(C_n)$, $(D_n)$, $(E_n)$, $(F_n)$, $(G_n)$ and $(H_n)$ as follows. By the Skorokhod representation theorem, we can find $(\tilde A_n, \tilde B_n, \tilde C_n, \tilde D_n, \tilde E_n, \tilde F_n, \tilde G_n, \tilde H_n)$ and $\tilde M := (\tilde A, \tilde B, \tilde C, \tilde D, \tilde E, \tilde F, \tilde G, \tilde H)$, all defined on some common probability space, such that $(\tilde A_n, \dots, \tilde H_n)$ has the same distribution as $(A_n, \dots, H_n)$ for each $n$ and converges almost surely to $\tilde M$. Now, we substitute $(\tilde A_n, \tilde B_n, \tilde C_n, \tilde D_n, \tilde E_n, \tilde F_n, \tilde G_n, \tilde H_n)$ into (B.6), and we first take care of $\hat r_n$, which can be expressed via the mapping $H_1 : \mathbb{R}^2 \to \mathbb{R}$. Since $\tilde B_n/\sqrt{n}$ and $\tilde D_n/\sqrt{n}$ converge to 0 almost surely by (B.8), we consider the Taylor expansion of $H_1$ at $(0, 0)$. Combining the last three displays, we obtain the stated expansion. By the a.s. convergence $(\tilde A_n, \tilde D_n) \to (\tilde A, \tilde D)$, we identify the limit as $n$ tends to infinity. Applying Slutsky's lemma and the almost sure convergence above, we obtain the desired conclusion as $n \to \infty$.
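The form of the Skorokhod representation theorem used here is the standard one for weakly convergent sequences in a separable metric space:

```latex
% Skorokhod representation: if X_n \Rightarrow X in a separable metric
% space S, there exist S-valued random elements \tilde X_n, \tilde X on a
% common probability space such that
\tilde X_n \stackrel{d}{=} X_n \ (n \in \mathbb{N}), \qquad
\tilde X \stackrel{d}{=} X, \qquad
\tilde X_n \to \tilde X \ \text{almost surely.}
```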
Similarly, for $\hat s_n$ we have an analogous representation, where $H_2 : \mathbb{R}^2 \to \mathbb{R}$ is the corresponding mapping. Note that $\tilde C_n/\sqrt{n}$ and $\tilde D_n/\sqrt{n}$ converge to 0 almost surely by (B.8). We now apply the Taylor expansion to $H_2$ at $(0, 0)$ and obtain the analogous estimate. With the above expansion for $H_2$, we write the corresponding limit as $n$ tends to infinity, which holds since $\tilde D_n \to \tilde D$ almost surely. This completes the analysis of the expansion for $F$. Fix $i \in \{1, 2\}$; we next consider the expansion for $G_i$. Following the same method, we can write the analogous decomposition. Again by assumption, $G_i$ has polynomial growth, and thus the last term is of order $o(1)$. Hence, we may rewrite the terms above accordingly. The second assertion of the lemma is a consequence of (B.9), (B.10), the analog of (B.9) with $F$ replaced by $G_i$, and the joint convergence $(\tilde A_n, \tilde D_n, \tilde E_n) \Rightarrow (\tilde A, \tilde D, \tilde E)$ and $(\tilde A_n, \tilde D_n, \tilde G_n) \Rightarrow (\tilde A, \tilde D, \tilde G)$.

Appendix F. A uniform deviation estimate
We now establish Lemma 5.9. Key ingredients of the proof include the Gaussian concentration inequality and certain deviation estimates that are uniform with respect to a class of functions, much in the spirit of uniform Glivenko-Cantelli or Donsker classes.
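The Gaussian concentration inequality referred to here (cf. [38, Theorem 3.1.1]) can be stated as follows: Lipschitz functions of a standard Gaussian vector have subgaussian deviations from their mean.

```latex
% Gaussian concentration: if Z \sim N(0, I_n) and f : \mathbb{R}^n \to \mathbb{R}
% is L-Lipschitz, then there is a universal constant c > 0 such that
\mathbb{P}\big(|f(Z) - \mathbb{E}[f(Z)]| \ge \delta\big)
  \;\le\; 2 \exp\!\Big(-\frac{c\,\delta^2}{L^2}\Big), \qquad \delta > 0.
```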
Combining the last two displays, we see that the supremum over $t \in D$, $\|t\| < \varepsilon$, satisfies the stated bound. Since $(Z_j)_{j \in \mathbb{N}}$ are independent, by the strong law of large numbers, $P$-almost surely, as $n \to \infty$, $\frac{1}{n}\sum_{j=1}^n (1 + (C\varepsilon)^q |Z_j|^q)\,|Z_j| \to \mathbb{E}[(1 + (C\varepsilon)^q |Z|^q)\,|Z|]$ and $\|Z^{(n)}\|/\sqrt{n} \to 1$. (F.10) We now treat each of these terms individually.
Step 2A. For the first term $I^n_{21}$ in (F.16), we start by proving that there exist $C_3 < \infty$ and a random integer $N_3$ such that $P$-almost surely,

$I^n_{21} \le C_3 + 2\varepsilon$ for $n \ge N_3$. (F.19)
Proof of Step 2A bound. The proof of (F.19) starts with the following claim.
Claim. For $x \in \mathbb{R}$ and $t \in D$ with $\|t\| < \varepsilon$, the map $t \mapsto T_n(x, t)$ is Lipschitz continuous with constant $C(1 + \varepsilon^q r_n^q) r_n^2$.

Proof of the claim. Define $x_n := x$ if $|x| < r_n$, and $x_n := \mathrm{sgn}(x)\, r_n$ if $|x| \ge r_n$.
The last two displays and (F.4) imply that $|\partial_{t_2} T_n(x, t)| \le C(1 + \varepsilon^q r_n^q) r_n$ for $x \in \mathbb{R}$ and $\|t\| \le \varepsilon$. (F.21) Thus, the claim follows from (F.20) and (F.21). We now continue with the proof of Step 2A. For $n \in \mathbb{N}$, let $\delta_n$ and $k_n$ be finite positive constants given by $\delta_n := \varepsilon/(C(1 + \varepsilon^q r_n^q) r_n^2 n^{1-\alpha})$ and $k_n := (C(1 + \varepsilon^q r_n^q) r_n^2 n^{1-\alpha})^2$.
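The choice of $\delta_n$ and $k_n$ reflects the usual Lipschitz-net tradeoff: writing $L_n := C(1 + \varepsilon^q r_n^q) r_n^2$ for the Lipschitz constant from the claim, a $\delta_n$-net reduces the supremum over the ball to a maximum over finitely many points, at the cost of an additive error $L_n \delta_n$. A sketch:

```latex
% If t \mapsto T_n(x,t) is L_n-Lipschitz and N is a \delta_n-net of
% \{t \in D : \|t\| < \varepsilon\} \subset \mathbb{R}^2, then
\sup_{\|t\| < \varepsilon} T_n(x, t)
  \;\le\; \max_{t \in N} T_n(x, t) + L_n \delta_n
  \;=\; \max_{t \in N} T_n(x, t) + \varepsilon\, n^{-(1-\alpha)},
% and the net can be chosen with
% |N| \lesssim (\varepsilon/\delta_n)^2 = (L_n n^{1-\alpha})^2 = k_n points.
```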
Since this is summable in $n$ due to the definition of $r_n$ in (F.13), the first inequality in (F.25) follows from the Borel-Cantelli lemma.