Edgeworth expansions for volatility models

Motivated by option and derivative pricing, this note develops Edgeworth expansions, both in the Kolmogorov and Wasserstein metrics, for many different types of discrete-time volatility models and their possible transformations. This includes, among others, Hölder-type functions of (augmented) Garch processes of any order, iterated random functions, and Volterra processes.


Introduction
Consider a strictly stationary sequence (X_k)_{k∈Z} of real-valued random variables with E X_k = 0 and E X_k^2 < ∞. If the sequence exhibits weak dependence in a certain sense, then the distribution of n^{-1/2} S_n, where S_n = X_1 + X_2 + ... + X_n, is asymptotically normal; see for instance [35] and the references therein. This fact has made the central limit theorem one of the most important tools in probability theory and statistics. On the other hand, it was already noticed by Chebyshev [9] and Edgeworth [15] that normal approximations can be improved in terms of (Edgeworth) expansions Ψ_n, yielding approximations of type (1) (or even better) in the Kolmogorov metric. Motivated by applications in actuarial science, Cramér gave rigorous proofs in [10], and ever since, Edgeworth expansions have been an indispensable tool in actuarial science and finance; see the discussion below. They also arise in the context of dynamical systems and Markovian setups, see e.g. [18], [31], [26] and the references therein for more recent results, and also [24], [32] for a general, weakly dependent framework. On the other hand, in a very influential work, Efron [16] broadened the view on resampling techniques (e.g. bootstrapping) and demonstrated their significantly superior performance compared to normal approximations; see [17], [25], [33] for an overview. Not surprisingly, the key tools for analysing, and, in particular, for showing superiority of resampling methods are again Edgeworth expansions.
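The displayed expansion (1) appears to have been lost in extraction. A standard first-order Edgeworth expansion consistent with the surrounding discussion (a hedged reconstruction, not necessarily the authors' exact display) reads:

```latex
% First-order Edgeworth expansion for the normalised sum S_n/(s_n \sqrt{n});
% \Phi, \varphi denote the standard normal cdf and density, and \kappa_3 the
% (asymptotic) third cumulant. Higher-order versions add further terms.
\Psi_n(x) \;=\; \Phi(x) \;-\; \frac{\kappa_3}{6\sqrt{n}}\,\bigl(x^2 - 1\bigr)\,\varphi(x),
\qquad
\sup_{x \in \mathbb{R}} \Bigl|\, \mathbb{P}\bigl(S_n/(s_n\sqrt{n}) \le x\bigr) - \Psi_n(x) \,\Bigr| \;=\; O\bigl(n^{-1}\bigr).
```

Under a non-lattice condition the rate thus improves from the Berry-Esseen rate O(n^{-1/2}) to O(n^{-1}), which is presumably what "or even better" refers to.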
Our main motivation here stems more from actuarial, econometric, finance and risk management considerations. Very prominent models in these areas in a discrete-time setting are (augmented) Garch processes, e.g. [12], [14], [7], [27], [20]. It is well known that already the Black-Scholes formula for option pricing has serious shortcomings, e.g. [23]. To address these problems, a common and quite successful approach is to employ more complex models and use (Edgeworth) expansions to salvage comparatively simple and easy-to-evaluate formulas; see for instance [1], [2], [13], [19], [21], [22], [28]. To illustrate this further, consider the (standard) model where log P_{nt} is the log-price of some derivative under a martingale measure, for an appropriate function h. Here, the ε_k are the innovations and V_k^2 is some volatility process. One then seeks an approximation of the type where the function f(x) describes the pay-off of some option and Λ_{nt} is a 'convenient' signed measure. Since we may express the log-price in terms of a normalised and centred sum of (dependent) random variables, the connection to Edgeworth expansions is obvious. A rather prominent example in this context are European put options, where f(x) = max{K − e^x, 0} for some strike price K > 0. Now |f(x) − f(y)| ≤ K|x − y|, and hence f(x) is Lipschitz-continuous. The latter is true for many options, and thus another natural metric to measure the quality of Edgeworth expansions is the Wasserstein metric W_1. For the latter, a Gamma-type approximation is more convenient in our setting; see (9) for more details and definitions.
Our main contribution and novelty here is to establish the validity of Edgeworth expansions, both in the Kolmogorov and Wasserstein metrics, for various classes of popular volatility-type models. This includes in particular -- for the first time, to the best of the authors' knowledge -- functions of (augmented) Garch(p, q)-processes of any order. Previously, only the case p = q = 1 appears to have been treated in the literature. In addition, it seems that there are almost no results concerning Edgeworth expansions for weakly dependent processes in terms of the Wasserstein distance in general. This note is structured as follows. In Section 2, we present the setup and our main global results. We then show how to use these to derive the validity of Edgeworth expansions, both in the Kolmogorov and Wasserstein metrics, for Hölder-type functionals of various volatility-type models; see Section 3.1 (augmented Garch processes), Section 3.2 (iterated random functions), Section 3.3 (linear processes) and Section 3.4 (Volterra processes) for details.

Setup and main global results
For a random variable X, we write E X for expectation, ‖X‖_p for (E|X|^p)^{1/p}, p ≥ 1, and sometimes E_H X = E[X|H] for conditional expectation, and in analogy P_H(·) for conditional probabilities. The symbols ≲, ≳ (≈) denote (two-sided) inequalities involving a multiplicative constant. For a, b ∈ R, we put a ∨ b = max{a, b} and a ∧ b = min{a, b}. For two random variables X, Y, we write X =_d Y for equality in distribution. For an i.i.d. sequence (ε_k)_{k∈Z}, let E_k = σ(ε_j, j ≤ k). Consider a sequence of real-valued, measurable random variables X_1, ..., X_n. It is well known (cf. [37]) that this sequence can be assumed to satisfy X_k ∈ E_k, that is, we have representation (5) for some measurable functions g_k, where (ε_k)_{k∈Z} is a sequence of independent and identically distributed random variables. For notational convenience, we sometimes assume g_k = g, that is, the function g does not depend on k. Such processes are usually referred to as (time-homogeneous) Bernoulli-shift processes.
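The display (5) is missing from the extraction; by the surrounding text it is the standard Bernoulli-shift representation, which we sketch here (a hedged reconstruction):

```latex
% Bernoulli-shift (causal) representation: X_k is a measurable function of
% the innovations up to time k, i.e. X_k is E_k-measurable.
X_k \;=\; g_k\bigl(\varepsilon_k, \varepsilon_{k-1}, \varepsilon_{k-2}, \ldots\bigr),
\qquad k \in \mathbb{Z}. \tag{5}
```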
Representation (5) allows us to give simple, yet very efficient and general dependence conditions. Following [39], let (ε'_k)_{k∈Z} be an independent copy of (ε_k)_{k∈Z} on the same probability space, and define the coupled 'filter' obtained by replacing ε_0 with its independent copy ε'_0. As dependence measure, one may then define the coupling coefficients ϑ*_l(p). If g = g_k does not depend on k (time-homogeneous case), ϑ*_l(p) simplifies to ϑ*_l(p) = ‖X_l − X*_l‖_p.
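The coupled version and the dependence measure following [39] (Wu's physical dependence framework) presumably take the following form; a hedged sketch:

```latex
% Coupled version of X_l: the innovation \varepsilon_0 is replaced by its
% independent copy \varepsilon_0'; \vartheta_l^*(p) measures the impact in L^p.
X_l^{*} \;=\; g_l\bigl(\varepsilon_l, \ldots, \varepsilon_1, \varepsilon_0', \varepsilon_{-1}, \ldots\bigr),
\qquad
\vartheta_l^{*}(p) \;=\; \sup_{k \in \mathbb{Z}} \bigl\| X_{k+l} - X_{k+l}^{*} \bigr\|_p,
```

which in the time-homogeneous case g_k = g reduces to ϑ*_l(p) = ‖X_l − X*_l‖_p, as stated in the text.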
Our basic condition regarding weak dependence is now the following.
Assumption 2.1. For p > 3, (X_k)_{k∈Z} is stationary and satisfies a suitable decay condition on the dependence coefficients ϑ*_l(p). Our requirement of stationarity is more a convenience condition, and can be replaced with quenched or locally stationary setups.
As is well known, validity of Edgeworth expansions is not for free and requires some non-lattice condition. We need the following regularity assumptions regarding the underlying distribution.
Assumption 2.2. For any δ > 0 and l ∈ Z, there exists a family of random variables, independent of E_l, satisfying the conditions (B1) and (B2) below.

Observe that condition (B2) is a non-lattice condition, and will be easy to verify in our applications. The key to our results is (B1), which is a small-ball condition. While it is not true in general, we show below that it does hold for a huge class of volatility models and their Hölder-continuous transformations (and even more). Note that validity of (B1) does not imply that ε_k or even X_k is non-lattice. Our first result is the following.

Theorem 2.1. Assume that Assumptions 2.1 and 2.2 hold. Then the Edgeworth expansion is valid in the Kolmogorov metric. In particular, there exist b_n → ∞ and δ > 0 such that the corresponding bound holds for any a > 0.

Next, we turn to the Wasserstein metric W_1. For two probability measures P_1, P_2, let L(P_1, P_2) be the set of all probability measures on R² with marginals P_1, P_2. The Wasserstein metric (of order one) is defined as the minimal coupling L¹-distance. Let V_n be the (signed) measure induced by Ψ_n. Then, a priori, the distance W_1(P_1, V_n) is not defined in general. In [6], generalised transport distances are introduced that also allow for signed measures. In order to maintain the original definition in terms of couplings, we follow [30] and replace Ψ_n with a probability measure induced by a sequence of i.i.d. random variables. Let Z be a zero-mean Gaussian random variable N(0, σ²) with variance σ² = s_n², let G follow a Gamma distribution Γ(α, β) with shape parameter α = s_n² β and rate β, let the corresponding sequence be i.i.d., and denote by P_{L_n} the induced probability measure.

Theorem 2.2. Grant Assumption 2.1, and suppose that (8) holds. Then the analogous bound holds in W_1. Due to Theorem 2.1, an immediate consequence is the following.
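The displayed definition of the Wasserstein metric is missing; the standard coupling form matching the description "minimal coupling L¹-distance" is:

```latex
% Order-one Wasserstein distance as a minimal coupling L^1-distance;
% \mathcal{L}(P_1, P_2) is the set of probability measures on \mathbb{R}^2
% with marginals P_1 and P_2.
W_1(P_1, P_2) \;=\; \inf_{\pi \in \mathcal{L}(P_1, P_2)} \int_{\mathbb{R}^2} |x - y|\; \pi(\mathrm{d}x, \mathrm{d}y).
```

Note that the Gamma law Γ(α, β) with shape α = s_n² β and rate β has mean α/β = s_n², variance α/β² = s_n²/β and skewness 2/√α, so, unlike a Gaussian, it carries a non-trivial third moment; this is what makes the Gamma-type correction convenient here.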

Volatility models
Over the past decades, the following basic model (11) has emerged as a key building block in econometrics, finance and actuarial science for an underlying process (Y_k), where (ε_k)_{k∈Z} are i.i.d. and V_k ∈ E_k is some volatility process. The actual models for asset prices (and related quantities) are then obtained by appropriate transformations, compensations, or by passing to the limit to obtain stochastic differential equations. A sheer endless number of models and processes of this type have been established and discussed. Since our focus here lies on discrete time, we mention for instance [12], [14], [7], [27], [20], which, however, represent only an almost infinitesimal fraction of the literature.
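The display (11) is missing; by the surrounding description it is the basic multiplicative volatility model, which we sketch here (a hedged reconstruction):

```latex
% Basic stochastic volatility building block: i.i.d. innovations scaled by
% a stationary, E_k-measurable volatility process V_k.
Y_k \;=\; V_k\, \varepsilon_k, \qquad k \in \mathbb{Z}. \tag{11}
```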
Our basic setup here is the following. We consider processes Y_k of type (11), where we assume that V_k ∈ E_k is stationary. In the sequel, it will be convenient to use the notation introduced below. Consider the price of an asset P_k = e^{n^{-1/2} S_k}. If we select h_n ≡ 0 and f such that (12) holds, then, given sufficiently many (exponential) moments, we obtain (13). Hence P_k is almost a martingale, and the actual error in (13) can be made arbitrarily small by further specifying g. On the other hand, formally passing to the limit, it follows that P_k is a martingale. Note in particular that if ε has a standard Gaussian distribution N(0, 1), then we recover the well-known form. More generally, a formal Taylor expansion around zero with E ε = 0 leads to (15). We wish to apply Theorem 2.1 to X_k, where we assume E X_k = 0. Having in mind (12), (15), but also statistical applications (power transformations), we consider functions f, h_n satisfying the generalised Hölder condition (16). For future reference, we denote this class by H(L, α, β). Moreover, we assume (17), which in light of (15) is a mild condition.
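The generalised Hölder condition (16) defining H(L, α, β) is not displayed; a standard polynomial-weighted Hölder condition of this type (a hedged reconstruction) is:

```latex
% Generalised (polynomial-weighted) Hölder condition with parameters L > 0,
% \alpha \in (0, 1], \beta \ge 0; for \beta = 0 this is plain Hölder continuity.
|f(x) - f(y)| \;\le\; L\, |x - y|^{\alpha}\, \bigl(1 + |x|^{\beta} + |y|^{\beta}\bigr),
\qquad x, y \in \mathbb{R}. \tag{16}
```

This covers Lipschitz pay-offs (α = 1, β = 0) as well as power transformations x ↦ |x|^γ, γ > 1, arising in statistical applications (α = 1, β = γ − 1).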
Our basic condition to verify Assumption 2.2 is the following.
We now discuss three particular, yet quite general, classes of models. We note, however, that the method of proof is more general and can also be applied to other classes of models.

Augmented Garch
Arch, Garch and augmented Garch models have had a huge impact both in theory and practice; see [7], [14] and [20]. In one of its more general forms, the model can be stated as a recursion in which the g_i, c_i ≥ 0 are nonnegative functions, and g_{i_0} ≥ ω > 0 for some 1 ≤ i_0 ≤ p. Motivated by the Box-Cox transformation, the function Λ(·) typically comprises the cases Λ(x) = log x or Λ(x) = x^λ, λ > 0. We will consider the latter. For q ≥ 1, an important quantity is the coefficient γ_c, where we replace possibly undefined c_i (and g_i) with zero. If γ_c < 1, then (V_k)_{k∈Z} is stationary. In particular, one can show the corresponding Bernoulli-shift representation; see [4] for comments and references on this matter. In particular, V_k is a time-homogeneous Bernoulli-shift process, that is, g_k = g does not depend on k in representation (5).
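The augmented Garch recursion itself is not displayed. In the form of Duan's augmented Garch(p, q) with Λ(x) = x^λ (a hedged reconstruction consistent with the surrounding text; the moment order in γ_c is renamed q' here only to avoid a clash with the model order q):

```latex
% Augmented GARCH(p, q): the Box-Cox-type transform \Lambda of the squared
% volatility solves a random-coefficient linear recursion; g_i, c_i \ge 0.
\Lambda\bigl(V_k^2\bigr) \;=\; \sum_{i=1}^{p} g_i(\varepsilon_{k-i})
\;+\; \sum_{i=1}^{q} c_i(\varepsilon_{k-i})\, \Lambda\bigl(V_{k-i}^2\bigr),
\qquad
\gamma_c \;=\; \sum_{i=1}^{q} \bigl\| c_i(\varepsilon_0) \bigr\|_{q'}.
```

The contraction condition γ_c < 1 then yields stationarity, matching the statement in the text.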
We then have the following result.

Iterated random functions
An iterated random function system on the state space R is defined via a recursion, where F_ε(·) = F(·, ε) is the ε-section of a jointly measurable function F : R × R → R. Many dynamical systems, Markov processes and non-linear time series fit within this framework; see for instance [11].
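Written out, the system and its ε-section (a hedged sketch of the missing display) are:

```latex
% Iterated random function system: the next state is a random function of
% the current one; F_\varepsilon(\cdot) = F(\cdot, \varepsilon) is the
% \varepsilon-section of the jointly measurable map F.
X_k \;=\; F_{\varepsilon_k}\bigl(X_{k-1}\bigr), \qquad F_{\varepsilon}(x) \;=\; F(x, \varepsilon).
```

Under a standard average-contraction condition (e.g. a negative Lyapunov-type exponent for the Lipschitz coefficient of F_ε, cf. [11]), such a system admits a stationary solution of Bernoulli-shift type.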
We then have the following result.

Theorem 3.2. Grant Assumptions 3.1 and 3.4. Then both Assumptions 2.1 and 2.2 hold. In particular, Theorem 2.1 and Corollary 2.3 apply.

Volterra processes
In the study of nonlinear processes, Volterra processes are of fundamental importance; see for instance [3], [36] or [38]. We consider expansions where ‖ε_k‖_p < ∞ for p ≥ 2, and the coefficient functions a_k are called the Volterra kernels. By the triangle inequality, there then exists a constant C such that the corresponding bound holds. We thus require the following assumption.
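The missing display presumably defines a discrete-time Volterra expansion; a common form (hedged reconstruction) is:

```latex
% Volterra process: polynomial chaos expansion in the i.i.d. innovations;
% the kernels a_r control the strength of dependence.
X_k \;=\; \sum_{r \ge 1} \;\sum_{0 \le u_1 < \cdots < u_r} a_r(u_1, \ldots, u_r)\,
\varepsilon_{k - u_1} \cdots \varepsilon_{k - u_r}.
```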

Proofs of Theorem 2.1 and Theorem 2.2
Throughout the proof, for notational convenience, we assume that we are in the time-homogeneous Bernoulli-shift case. This means that g_k = g in (5), and, in particular, the quantities in Assumption 2.2 are invariant in l ∈ Z.
Moreover, for 1 ≤ j ≤ n', let (ε_k^{(j)})_{k∈Z} be independent copies of (ε_k)_{k∈Z}. For each 2(j − 1)m + 1 ≤ k ≤ 2jm, define the coupled version X_{km} and note that X_k =_d X_{km}. Finally, let us introduce the quantities needed below. We recall parts of Theorem 2.7 in [30], which we restate as the following lemma for ease of reference.

Lemma 4.1. Grant Assumption 2.1. Then there exists δ > 0 such that the stated bound holds, where T_n ≥ c√n for some c > 0.
In addition, we require the following technical result in the sequel.
Lemma 4.2. Grant Assumption 2.1. Then there exists C > 0 such that the stated bound holds.

Proof of Lemma 4.2. This is an easy consequence of Equation (50) in [29] (note the corresponding moment inequality for a ≥ 0, p ≥ 1; see also Theorem 1 in [34]). Note that the construction of X_{km} is slightly different in [29], but the argument remains equally valid.
Proof of Theorem 2.1. Let n' = ⌊n/m⌋ for m ∈ N. The proof works with any choice m ≍ n^𝔪, 𝔪 ∈ (1/2, 1). In order to establish the claim, it suffices to show that for any a > 0, the Berry-Esseen characteristic is sufficiently small. To this end, we study E e^{iξS_n} more closely, subject to Assumptions 2.1 and 2.2. Due to |e^{ix} − 1| ≤ |x|, |e^{ix}| = 1, and Lemma 4.2, we obtain a first estimate. Observe that (B_j)_{1≤j≤n'} is conditionally independent with respect to F_m, and is a one-dependent sequence in general. Let I = {1, 3, ..., 2⌊n'/2⌋ − 1}. Then (B_j)_{j∈I} is an independent sequence. Hence, using |e^{ix}| = 1, we may bound the product over j ∈ I. Next, put A^+, and observe that for 1 ≤ j ≤ n', we have the identity

|E_H e^{iξ(A_{1:m} + A_{m+1:2m})}| ≤ |E_H e^{iξA_{1:m}}| + |E_H e^{iξA_{1:m}} (e^{iξ(A_{m+1:2m} − A^+)} − 1)|.

Observe that A_{m+1:2m} − A^+ is independent of E_0. Using |e^{ix} − 1| ≤ |x|, we thus obtain a further bound. Then, due to (B1), for any δ > 0, there exists a set A_δ ∈ G_{m+1} with P(A_δ) ≥ c_δ > 0, such that the required estimate holds for an appropriate choice of the involved quantities. Since A_δ ∈ H, we obtain the stated bounds from the above.

Next, observe the following estimate.
We next deal with E|E_{E_0} e^{iξA_{1:m}}|. To this end, one readily shows that the map g : R → [0, 1], given below, is continuous, where we used (B2) for the last inequality. Hence for any 0 < a < b, there exists η_{ab} < 1 such that the corresponding bound holds. Setting δ = (1 − η_{ab})/2b, we obtain from (29) and (30) that for any ξ ∈ [a, b], there exists ρ_{ab} < 1 such that the product bound holds. Consequently, since |I| ≥ n'/3 for n large enough, we get geometric decay in terms of ρ_{ab}. Combining (27) and the above, we conclude that there exists b_n → ∞ such that for any a > 0, we control sup_{ξ∈[a,b_n]} of the product over j ∈ I. Using (26), the analogous bounds follow, which, by virtue of Lemma 4.1, completes the proof.
For the proof of Theorem 2.2, we require some additional notation. For e > 0 and f ∈ N even, let G_{e,f} be a real-valued random variable with density function g_{e,f}, for some constant c_f > 0 depending only on f. It is well known (cf. [5], Section 10) that for even f the Fourier transform ĝ_{e,f} satisfies the decay bound below; we take G_{e,f} to be independent of S_n. For η > 0, define L_n and L_n^⋄ in analogy.

Proof of Theorem 2.2. Using standard properties of the Wasserstein distance and the triangle inequality, we arrive at a first decomposition. For f ≥ 6 and small enough e > 0, we get, using (8), for any c, η > 0 the corresponding estimate. Following the proof of Theorem 3.6 in [30], we obtain a bound where τ_n ≍ √(log n). By (36), we have for T_a^b(x) (defined with respect to the smoothed quantities) a bound which does not depend on x. Hence, for C_{a√n} (defined analogously), an application of Lemma 4.1 then yields the desired estimate. Plugging this into (37), we obtain the main bound. Selecting η = η_n → 0 sufficiently slowly ((8) must remain valid, see (36)), the claim follows by combining (35) and (38).

Proofs of Volatility models
We first state the following elementary lemma.
Lemma 5.1. Suppose that the function f satisfies (16), and assume ‖Y_0‖_{p(α+β)} < ∞.

Proof of Lemma 5.1. Using Hölder's inequality with r = α/β + 1, s = (α + β)/α, the claim follows.

Proof of Theorem 3.1

Proof of Theorem 3.1. In order to apply Theorem 2.1, we need to validate both Assumptions 2.1 and 2.2, based on Assumptions 3.1 and 3.3. We will do so below. Since, as mentioned above, V_k is a time-homogeneous Bernoulli-shift process, the quantities in Assumption 2.2 do not depend on l ∈ Z, simplifying the notation. We first consider the case h_n ≡ 0.
(B1): We first validate (B1), which requires the most attention. To this end, we first introduce some necessary quantities, which we will use repeatedly. For δ > 0, denote the corresponding events, and note the lower bound obtained via Markov's inequality in the last step. This simple lower bound is the key for establishing (B1).
, where we recall ‖c_1(ε)‖_q + ... + ‖c_r(ε)‖_q ≤ γ_c < 1. From the above, we conclude (on the event B_{1jη}) the first estimate, and similarly, one obtains (still on the event B_{1jη}) the second. In the derivations below, all norms ‖·‖_r are taken with respect to P_{E_{1j}}. Since V_k ≥ V_k^+ and λ ≥ 1/2, we have from |x^{1/(2λ)} − y^{1/(2λ)}| ≤ |x − y|^{1/(2λ)} and Cauchy-Schwarz the next bound. By independence, (41) and Jensen's inequality (if β/λ < 1, we apply Jensen's inequality once more), we obtain (44). Similarly, by independence and (42), we obtain (45). All in all, on the event B_{1jη}, combining (43), (44) and (45), we arrive at a bound with constant C^+ not depending on η, and ρ < 1 depending only on γ_c, λ and β. Consequently, selecting η such that C^+ η^β (1 − ρ)^{-1} < δ, we conclude B_{1jη} ⊆ A_{1jδ}, and (V1) yields the desired probability bound. Similarly, we obtain on B_{1jη}, for k > j, the analogous estimate. Selecting j_0 sufficiently large, we get the claim for j ≥ j_0. Hence (B1) holds.

Let us now consider the case h_n ≢ 0, which turns out to be just a minor extension. Arguing as above in (43), (44) and (45), we obtain a bound with constant C not depending on n. Hence, by the triangle and Jensen's inequality, the corresponding moment bound follows. On the other hand, (17), the triangle and Jensen's inequality imply that there exists l_n → ∞ such that the remainder vanishes as n increases. Setting h_n^+(V_k) = h_n(V_k^+) for k ≥ l_n and h_n^+(V_k) ≡ 0 otherwise, we conclude the analogous estimates from the above. Piecing everything together, the validity of (B1) follows.
(A2): Arguing similarly to the derivation of (B1) above, one obtains the required bound. The claim now follows from Lemma 5.1 and the triangle inequality.

Proof of Theorem 3.2
Proof of Theorem 3.2.
Although we are no longer in the time-homogeneous Bernoulli-shift setup, it is obvious that we can repeat the proof of Theorem 3.1 almost verbatim. In fact, due to the more explicit iterative structure, some computations are even simpler.

Proof of Theorem 3.3
Proof of Theorem 3.3. As for augmented Garch sequences, we are again in the time-homogeneous Bernoulli-shift case. It is again obvious that we can repeat the proof of Theorem 3.1 almost verbatim. As in the case of Theorem 3.2, the actual proof is even simpler.

Proof of Theorem 3.4
Proof of Theorem 3.4. As for augmented Garch sequences and functions of linear processes, we are again in the time-homogeneous Bernoulli-shift case. Since clearly V_1^+ ∈ E_1^+, we may now repeat the proof of Theorem 3.1. As in the previous cases, the actual proof is even simpler.
Here u[−e, e](t) = (1/(2e)) 1_{[−e,e]}(t) denotes the density of the uniform distribution on [−e, e], and g_{e,f} its f-fold convolution. For f ≥ 6, let (H_k)_{k∈Z} be as above, which is always possible by Assumption 3.3. Assume without loss of generality ‖g_i(ε)‖_q ≤ C_g. Then, by Assumption 3.3, we have the stated bound on the event B_{1jη}.