High-resolution quantization and entropy coding for fractional Brownian motion

We establish the precise asymptotics of the quantization and entropy coding errors for fractional Brownian motion with respect to supremum-norm and L^p[0,1]-norm distortions. We show that all moments in the quantization problem lead to the same asymptotics. Using a general principle, we conclude that entropy coding and quantization coincide asymptotically. Under supremum-norm distortion, our proof uses an explicit construction of efficient codebooks based on a particular entropy-constrained coding scheme.


Introduction
Functional quantization and entropy coding concern the identification of "good" discrete approximations to a non-discrete random signal (original) in a Banach space of functions. These approximations are required to satisfy a range constraint in the context of quantization and an entropy constraint in the context of entropy coding. Such discretization problems arise naturally when digitizing analog signals in order to allow storage on a computer or transmission over a channel with finite capacity.
As another application, the approximating functions of good quantizers may serve as evaluation points of quasi-Monte Carlo methods. Moreover, some Monte Carlo methods use appropriate quantizers to carry out a variance reduction (see for instance (18) and the references therein, or (10)).
Previous research addressed, for instance, the problem of constructing good approximation schemes, the evaluation of the theoretically best approximation under an information constraint, the existence of optimal quantizers and regularity properties of the paths of optimal approximations. For Gaussian measures in Hilbert spaces optimal quantizers exist, and all their approximating functions are elements of the reproducing kernel Hilbert space (16). Under mild assumptions the best achievable distortions in both problems (the quantization and entropy coding problem) coincide asymptotically and do not depend on the moment under consideration ((17), (4)). Moreover, this (optimal) approximation error is asymptotically equivalent to the distortion rate function, which can be expressed implicitly in terms of the eigenvalues of the covariance operator.
When the underlying space is a Banach space, the approximation errors of both problems are weakly asymptotically equivalent to the inverse of the small ball function in many cases ((8), (5)). Thus asymptotic estimates for the small ball function can be translated into asymptotic estimates for the above coding problems (see for instance (15) for a summary of results on small ball probabilities). Moreover, many approximation quantities of Gaussian measures are tightly connected to the quantization numbers (see (14), (2)). See also (12) for existence and pathwise regularity results of optimal quantizers. The above questions are treated for Gaussian measures in Hilbert spaces by Luschgy and Pagès ((16), (17)) and by the first-named author in (4). For Gaussian originals in Banach spaces, these problems have been addressed by the authors and collaborators in (8), (9), (4), (5) and by Graf, Luschgy and Pagès in (12). For general accounts on quantization and coding theory in finite-dimensional spaces, see (11) and (1) (see also (13)).
In this article, we consider the asymptotic coding problem for fractional Brownian motion under supremum and L p [0, 1]-norm distortion. We derive the asymptotic quality of optimal approximations. In particular, we show that efficient entropy constrained quantizers can be used to construct close to optimal high resolution quantizers when considering the supremum norm. Moreover, for all of the above norm-based distortions, all moments and both information constraints lead to the same asymptotic approximation error. In particular, quantization is asymptotically just as efficient as entropy coding. The main impetus to the present work was provided by the necessity to understand the coding complexity of Brownian motion in order to solve the quantization (resp. entropy constrained coding) problem for diffusions (see (7) and (6)).
Let (Ω, A, P) be a probability space, let H ∈ (0, 1) and let X = (X_t)_{t≥0} denote fractional Brownian motion with Hurst index H on (Ω, A, P), i.e. (X_t)_{t≥0} is a centered continuous Gaussian process with covariance kernel

E[X_s X_t] = (1/2) (s^{2H} + t^{2H} - |t - s|^{2H}),  s, t ≥ 0.

For a > 0, let C[0, a] and D[0, a] denote the space of continuous real-valued functions on the interval [0, a] and the space of càdlàg functions on [0, a], respectively. Moreover, we let (L^p[0, a], ‖·‖_{L^p[0,a]}) denote the standard L^p-space of real-valued functions defined on [0, a]. Finally, ‖·‖_s, s ∈ (0, ∞], denotes the L^s-norm induced by the probability measure P on the set of real-valued random variables.
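The covariance kernel pins down all finite-dimensional distributions. As a quick plausibility check (not part of the paper's argument), the following sketch evaluates the kernel and verifies two standard consequences: the self-similarity E[X_{cs} X_{ct}] = c^{2H} E[X_s X_t], which is used repeatedly below, and the fact that H = 1/2 recovers the Brownian covariance min(s, t).

```python
def fbm_cov(s, t, H):
    """Covariance kernel of fractional Brownian motion with Hurst index H."""
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

H, s, t, c = 0.3, 0.4, 0.9, 3.0
# self-similarity: Cov(X_{cs}, X_{ct}) = c^{2H} Cov(X_s, X_t)
assert abs(fbm_cov(c * s, c * t, H) - c ** (2 * H) * fbm_cov(s, t, H)) < 1e-12
# H = 1/2 recovers standard Brownian motion: Cov(X_s, X_t) = min(s, t)
assert abs(fbm_cov(s, t, 0.5) - min(s, t)) < 1e-12
```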
Let us briefly introduce the concepts of quantization and entropy coding. For fixed a > 0, let d denote a measurable distortion on C[0, a] × D[0, a]. Both problems concern the minimization of the s-th moment error

E[d(Y, π(Y))^s]^{1/s}    (1)

over all measurable functions π : C[0, a] → D[0, a] with discrete image (strategies) that satisfy a particular information constraint parameterized by the rate r ≥ 0.
Often we associate a sequence of probability weights (p_w)_{w∈im(π)} to a strategy π. Then, due to Kraft's inequality, there exists a prefix-free representation for im(π) which needs less than (-log_2 p_w) + 1 bits to represent w ∈ im(π). Thus the pair (π, (p_w)) corresponds to a coding scheme translating the original symbol x into a prefix-free representation for π(x). The best average code length is achieved for p_w = P(π(Y) = w), which leads to an average code length of about H(π(Y))/log 2 (see for instance (1), Theorem 5.2.1).
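The Kraft argument can be made concrete. The toy sketch below (illustrative only; the paper works with path-valued originals) assigns each symbol the Shannon code length ⌈-log_2 p_w⌉, checks Kraft's inequality, and confirms that the average code length is within one bit of the entropy H/log 2.

```python
import math

def shannon_code_lengths(probs):
    """Code lengths l_w = ceil(-log2 p_w); by Kraft's inequality a
    prefix-free code with these lengths exists."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.5, 0.25, 0.125, 0.125]            # toy probability weights p_w
lengths = shannon_code_lengths(probs)
assert sum(2.0 ** (-l) for l in lengths) <= 1.0   # Kraft's inequality
entropy_bits = -sum(p * math.log2(p) for p in probs)  # H / log 2
avg_len = sum(p * l for p, l in zip(probs, lengths))
# the average code length exceeds the entropy by less than one bit
assert entropy_bits <= avg_len < entropy_bits + 1
```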
Entropy coding (also known as entropy-constrained quantization in the literature) concerns the minimization of (1) over all strategies π having entropy H(π(Y)) at most r. Recall that the entropy of a discrete r.v. Z with probability weights (p_n) is H(Z) = -Σ_n p_n log p_n. The entropy constraint represents an average-case complexity constraint. In the quantization problem, one considers strategies π satisfying the range constraint |range(π(Y))| ≤ e^r, which is a static complexity constraint. The corresponding approximation quantities are the entropy-constrained quantization error

D^{(e)}(r|s) := inf_π E[d(Y, π(Y))^s]^{1/s},

where the infimum is taken over all strategies π with entropy rate H(π(Y)) ≤ r, and the quantization error

D^{(q)}(r|s) := inf_π E[d(Y, π(Y))^s]^{1/s},

the infimum being taken over all strategies π having quantization rate log |range(π(Y))| ≤ r. Often, all or some of the parameters Y, d, s are clear from the context, and will therefore be omitted. The quantization information constraint is more restrictive, so that the quantization error always dominates the entropy coding error. Moreover, the coding error increases with the moment under consideration.
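The claim that the range constraint is the more restrictive one rests on the elementary bound H(Z) ≤ log |range(Z)|. A small numerical sketch (with a hypothetical scalar "strategy" that simply rounds a Gaussian sample) illustrates this:

```python
import math
import random
from collections import Counter

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(10000)]
quantized = [round(x, 1) for x in sample]   # toy strategy: round to a 0.1 grid
counts = Counter(quantized)
n = len(sample)
entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
# H(pi(Y)) <= log |range(pi(Y))|: the entropy constraint of rate r is always
# satisfied whenever the range constraint |range| <= e^r holds
assert entropy <= math.log(len(counts)) + 1e-12
```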
Unless otherwise stated, we choose as original the fractional Wiener process Y = X. We are mainly concerned with two particular choices for the distortion d. First we analyse the supremum-norm distortion, that is, d(f, g) = ‖f - g‖_{[0,1]}. In this setting we find:

Theorem 1.1. There exists a constant κ = κ(H) ∈ (0, ∞) such that for all s ∈ (0, ∞],

lim_{r→∞} r^H D^{(e)}(r|s) = lim_{r→∞} r^H D^{(q)}(r|s) = κ.

Remark 1.2. In the above theorem, general càdlàg functions are allowed as reconstructions. Since the original process is continuous, it might seem more natural to use continuous functions as approximations. The following argument shows that confining oneself to continuous approximants does not change the corresponding quantization and entropy coding quantities, when s ∈ [1, ∞).
Let π : C[0, 1] → D[0, 1] be an arbitrary strategy and let τ_n : D[0, 1] → C[0, 1] denote the linear operator mapping f to its piecewise linear interpolation with supporting points 0, 1/n, 2/n, …, 1. Then

E[‖X - τ_n(π(X))‖^s]^{1/s} ≤ E[‖X - π(X)‖^s]^{1/s} + E[‖X - τ_n(X)‖^s]^{1/s}.

Note that the second term vanishes when n tends to infinity and that τ_n ∘ π satisfies the same information constraint as π. The argument can be easily modified in order to show the statement for s ∈ (0, 1). Second, for L^p[0, 1]-norm distortion, i.e. d(f, g) = ‖f - g‖_{L^p[0,1]}, we prove the following analog to Theorem 1.1:

Theorem 1.3. There exists a constant κ_p = κ_p(H) ∈ (0, ∞) such that for all s ∈ (0, ∞),

lim_{r→∞} r^H D^{(e)}(r|s) = lim_{r→∞} r^H D^{(q)}(r|s) = κ_p.

As in Remark 1.2, a simple convolution-type argument shows that allowing L^p[0, 1]-approximations yields the same coding errors as restricting oneself to C[0, 1]-approximations.
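The interpolation operator τ_n from Remark 1.2 is easy to realize numerically. The sketch below (with an arbitrary smooth test function standing in for a path) shows the sup-norm discrepancy between a continuous function and its piecewise linear interpolation with knots 0, 1/n, …, 1 shrinking as n grows, which is the mechanism behind the remark.

```python
import numpy as np

def tau_n(f, n, grid):
    """Piecewise linear interpolation with knots 0, 1/n, ..., 1 (the
    operator tau_n of Remark 1.2), evaluated on `grid`."""
    knots = np.linspace(0.0, 1.0, n + 1)
    return np.interp(grid, knots, f(knots))

f = lambda t: np.sin(7 * t) + t ** 2          # smooth stand-in for a path
grid = np.linspace(0.0, 1.0, 20001)
sup_err = lambda n: np.max(np.abs(f(grid) - tau_n(f, n, grid)))
assert sup_err(200) < sup_err(10)             # the error decreases with n
assert sup_err(200) < 1e-2                    # and becomes small
```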
For ease of notation, the article is restricted to the analysis of 1-dimensional processes. However, when replacing (X_t) by an R^d-valued process consisting of d independent fractional Brownian motions, the proofs can be easily adapted, and one obtains analogous results. In particular, it is possible to prove analogs of the above theorems for a multi-dimensional Brownian motion.
Let us summarize some of the known estimates for the constant κ in the case where X is standard Brownian motion, i.e. H = 1/2.
• Under supremum-norm distortion, the relationship between the small ball function and the quantization problem (see (8)) yields two-sided bounds for κ; recall that the small ball function of Brownian motion under the supremum norm satisfies -log P(‖X‖_{[0,1]} ≤ ε) ∼ (π²/8) ε^{-2}. • Under L^p[0, 1]-norm distortion, κ_p may again be estimated via a connection to the small ball function. Indeed, letting c denote the value of a variational problem, where the infimum is taken over all weakly differentiable ϕ ∈ L²(R) with unit norm, one has κ_p ∈ [c, √8 c]. In the case where p = 2, the constant κ₂ is known explicitly: κ₂ = √2/π (see (17) and (4)).
The article is organized as follows. In Sections 2 to 5 we consider the approximation problems under the supremum norm. We start in Section 2 by introducing a coding scheme which plays an important role in the sequel. In Section 3, we use the construction of Section 2 and the self-similarity of X to establish a polynomial decay for D^{(e)}(·|∞). In the following section, the asymptotics of the quantization error are computed. The proof relies on a concentration property for the entropies of "good" coding schemes (Proposition 4.4). In Section 5, we use the equivalence of moments in the quantization problem to establish a lower bound for the entropy coding problem. In the last section, we treat the case where the distortion is based on the L^p[0, 1]-norm; we introduce the distortion rate function and prove Theorem 1.3 with the help of Shannon's source coding theorem.
It is convenient to use the symbols ∼, ≲, ≳ and ≈. We write f ∼ g if lim f/g = 1, f ≲ g if limsup f/g ≤ 1 (and f ≳ g if g ≲ f), and f ≈ g if 0 < liminf f/g ≤ limsup f/g < ∞.

The coding scheme
This section is devoted to the construction of strategies π^{(n)} : C[0, n] → D[0, n] which we will need later in our discussion. The construction depends on three parameters: M ∈ N\{1}, d > 0 and a strategy π : C[0, 1] → D[0, 1]. The coding scheme is motivated as follows: due to the self-similarity of the fractional Wiener process, coding X on [0, 1] with accuracy εn^{-H} is as hard as coding X on the time interval [0, n] with accuracy ε (see the argument at the end of the proof of Lemma 3.4). Intuitively, one may decompose the coding scheme for (X_t)_{t∈[0,n]} into two steps: first store information on the values (X_j)_{j=1,…,n-1}, and then approximate the paths X^{(j)} = (X_{j+t} - X_j)_{t∈[0,1)} by π(X^{(j)}) for j = 0, …, n-1. The parameter M governs the rate spent on coding the first part, and we shall see that for ε small most of the rate is spent on coding the second part. As in Shannon's source coding theorem, the ergodicity of (X^{(j)})_{j∈N₀} can be used to construct close to optimal codebooks when n is large. This will be done in the proof of Theorem 4.1 to derive an upper bound for the quantization error. Moreover, the coding scheme leads to a weak form of subadditivity which we use to prove polynomial decay of D^{(e)}(·|∞) (see Theorem 3.1 and Lemma 3.4).
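The first of the two steps, coarse storage of the integer-time skeleton, can be sketched numerically. The values and parameters below are purely illustrative (a fixed toy skeleton in place of actual FBM samples); the point is only that quantizing the n increments to a grid with M levels of spacing 2d/M costs about n log M nats and reproduces the skeleton up to a cumulative error of at most n·d/M.

```python
import numpy as np

# hypothetical parameters: M grid levels on [-d, d]
M, d = 16, 4.0
# toy stand-in for the skeleton X_1, ..., X_n of a sampled path
integer_vals = np.array([0.31, -0.42, 0.88, 1.57, 0.96, 1.44, 2.03, 1.11])
incr = np.diff(np.concatenate([[0.0], integer_vals]))  # increments X_j - X_{j-1}
spacing = 2 * d / M
q_incr = np.clip(np.round(incr / spacing) * spacing, -d, d)
q_vals = np.cumsum(q_incr)                  # reconstructed skeleton
n = len(integer_vals)
# per-step rounding error is at most d/M, so the skeleton error is at most n*d/M
assert np.max(np.abs(q_vals - integer_vals)) <= n * d / M + 1e-12
```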
We define the maps π^{(n)} by induction. Let w ∈ C[0, ∞) and set (ŵ_t)_{t∈[0,1)} := π(w)_t. Assume that (ŵ_t)_{t∈[0,n)} (n ∈ N) has already been defined. Then we choose ξ_n to be the smallest element of the approximating grid minimizing the distance to the increment at time n, and extend the definition of ŵ on [n, n+1) accordingly. Note that (ŵ_t)_{t∈[0,n)} depends only upon (w_t)_{t∈[0,n)}, so that the above construction induces strategies π^{(n)}(w) = ϕ_n((w_t)_{t∈[0,n)}) for an appropriate measurable function ϕ_n. The main motivation for this construction is the following property: if the single-interval strategy π satisfies given accuracy and entropy bounds, then for any n ∈ N the strategy π^{(n)} satisfies corresponding bounds on [0, n].

Polynomial decay of D^{(e)}(r|∞)

The objective of this section is to prove the following theorem.
Remark 3.2. It was found in (4) (see Theorem 3.5.2) that for finite moments s ≥ 1 the entropy coding error is related to the asymptotic behavior of the small ball function of the Gaussian measure. In particular, for fractional Brownian motion, one obtains that D^{(e)}(r|s) ≈ r^{-H} for s ∈ [1, ∞). In order to show that D^{(e)}(r|∞) is of the order r^{-H}, we still need to prove an appropriate upper bound. We prove a stronger statement which will be useful later on: there exist strategies π^{(r)} with probability weights (p^{(r)}_w)_{w∈im(π^{(r)})} such that, for any s ≥ 1, the accuracy and the code lengths are simultaneously of the right order; in particular, D^{(e)}(r|∞) = O(r^{-H}). The proof of the lemma is based on an asymptotic estimate for the mass concentration in randomly centered small balls, to be found in (9). Let X̃₁ denote a fractional Brownian motion that is independent of X with L(X̃₁) = L(X). Then, for any s ∈ [1, ∞), the s-th moment of -log P(‖X - X̃₁‖_{[0,1]} ≤ ε | X) is, as ε ↓ 0, asymptotically equivalent to -log P(‖X - X̃₁‖_{[0,1]} ≤ ε) (see (9), Theorem 4.2 and Corollary 4.4).
Proof. For a given D[0, 1]-valued sequence (w_n)_{n∈N∪{∞}}, we consider the following coding strategy π^{(r)}(·|(w_n)): let

T^{(r)}(w|(w_n)) := inf{n ∈ N : ‖w - w_n‖_{[0,1]} ≤ 1/r^H},

with the convention that the infimum of the empty set is ∞, and set π^{(r)}(w|(w_n)) := w_{T^{(r)}(w|(w_n))}. Moreover, let (p_n)_{n∈N} denote an associated sequence of probability weights, and set p_∞ := 0. Now we let (X̃_n)_{n∈N∪{∞}} denote independent FBMs that are also independent of X, and analyze the random coding strategies π^{(r)}(·) := π^{(r)}(·|(X̃_n)). With T^{(r)} := T^{(r)}(X|(X̃_n)) we obtain the accuracy bound ‖X - π^{(r)}(X)‖_{[0,1]} ≤ 1/r^H on {T^{(r)} < ∞}. Given X, the random time T^{(r)} is geometrically distributed with parameter P(‖X - X̃₁‖_{[0,1]} ≤ 1/r^H | X), and due to Lemma A.2 there exists a universal constant c₁ controlling its conditional moments. Consequently, one obtains a moment bound on T^{(r)}. Due to (8), one has the corresponding estimate for the conditional small ball probability, so that (9) and (10) imply that E[(-log p_{T^{(r)}})^s]^{1/s} ≲ c₂ r for some appropriate constant c₂ < ∞.
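The geometric mechanism in this random-codebook argument can be checked in a toy scalar model (a standard Gaussian standing in for the path-valued original, absolute distance in place of the supremum norm): the number of i.i.d. codeword draws until one falls within ε of the original is, conditionally, geometric, so its mean is the reciprocal of the matching probability.

```python
import random
from statistics import NormalDist

random.seed(2)

def search_time(x, eps):
    """Draws i.i.d. N(0,1) 'codewords' until one is within eps of x."""
    t = 1
    while abs(x - random.gauss(0.0, 1.0)) > eps:
        t += 1
    return t

x, eps, trials = 0.3, 0.5, 20000
q = NormalDist().cdf(x + eps) - NormalDist().cdf(x - eps)  # matching prob.
mean_t = sum(search_time(x, eps) for _ in range(trials)) / trials
# given the original, the search time is geometric with parameter q
assert abs(mean_t - 1.0 / q) < 0.1
```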
In particular, for any r ≥ 0, we can find a C[0, 1]-valued sequence (w^{(r)}_n)_{n∈N} of pairwise different elements such that the above bounds remain valid. Now the strategies π^{(r)}(·|(w^{(r)}_n)) with associated probability weights (p_n) satisfy the assertion.
Next we use the coding scheme of Section 2 to prove the subadditivity-type bound of Lemma 3.4. Choose M := ⌊e^{∆r}⌋ and let π^{(n)} be as in Section 2. Note that ∆r ≥ 1 guarantees that M ≥ e^{∆r} - 1 ≥ e^{∆r}/2, so that the resulting error is bounded by (1 + ε)D^{(e)}(r|∞).
Now let α_n : C[0, 1] → C[0, n] denote the rescaling (α_n w)_t := n^H w_{t/n}, and consider the strategy π̃^{(n)} := α_n^{-1} ∘ π^{(n)} ∘ α_n. Since α_n(X) is again a fractional Brownian motion on [0, n], it follows that, a.s.,

‖X - π̃^{(n)}(X)‖_{[0,1]} = n^{-H} ‖α_n(X) - π^{(n)}(α_n(X))‖_{[0,n]}.

The quantization problem
Theorem 4.1. One has, for any s ∈ (0, ∞),

lim_{r→∞} r^H D^{(q)}(r|s) = κ.

Recall that a strategy π and probability weights (p_w) on the image of π intuitively correspond to a coding scheme which maps an original symbol x onto a prefix-free representation for π(x) with code length of about -log_2 p_{π(x)}. The proof of Theorem 4.1 relies on Proposition 4.4. There we show that for good coding schemes -log_2 p_{π(X)} is strongly concentrated around some typical value when r is large. In order to prove the proposition we combine Lemma 3.3 with the following lemma.
Just as in the proof of Lemma 3.4, we use the self-similarity of X to translate the strategy π^{(n)} into a strategy for encoding (X_t)_{t∈[0,1]}. For n ∈ N, the induced scheme has code length at most (1 + ε)nr₁, in probability, and by (13) it attains the corresponding accuracy. By choosing π̃^{(r)} = π̃^{(n)} and (p̃^{(r)}) = (p̃^{(n)}) for r ∈ ((n - 1)r₁, nr₁], one obtains a coding scheme satisfying -log p̃^{(r)}_{π̃^{(r)}(X)} ≲ (1 + ε)r, in probability, so that the assertion follows by a diagonalization argument.

Remark 4.3.
In the above proof, we have constructed a high-resolution coding scheme based on a strategy π : C[0, 1] → D[0, 1], using the identity π̃^{(n)} = α_n^{-1} ∘ π^{(n)} ∘ α_n. This coding scheme leads to a coding error which is at most n^{-H} times the error achieved by π^{(n)} on [0, n]. Moreover, the ergodic theorem implies that, for large n, π̃^{(n)}(X) lies with probability almost one in the typical set {w ∈ D[0, 1] : -log p̃^{(n)}_w ≤ n(H(π(X)) + log M + ε)}, where ε > 0 is arbitrarily small. This set is of size at most exp{n(H(π(X)) + log M + ε)}, and will serve as a close to optimal high-resolution codebook. It remains to control the case where π̃^{(n)}(X) is not in the typical set. We will do this in the proof of Theorem 4.1 at the end of this section (see (19)).
Proposition 4.4. For s ≥ 1 there exist strategies (π^{(r)})_{r≥0} and probability weights (p^{(r)}_w) attaining the optimal coding error asymptotics. In addition, for any ε > 0 one has a uniform concentration property for -log p_{π(X)}, where the supremum is taken over all strategies π : C[0, 1] → D[0, 1] and over all sequences of probability weights (p_w).
Proof. Fix s > 1 and, for each r ≥ 0, let π^{(r)}_1 and π^{(r)}_2 denote the two strategies entering the construction. The definitions of π^{(r)}_1 and π^{(r)}_2 imply that lim_{r→∞} P(X ∈ T_r^c) = 0 and that the corresponding moment terms are negligible. Since δ > 0 can be chosen arbitrarily small, a diagonalization procedure leads to strategies π̃^{(r)} and probability weights (p̃^{(r)}_w). Now the first assertion follows from (16).
It remains to show that (17) holds for arbitrary strategies π̃^{(r)}, r ≥ 0, and probability weights (p̃^{(r)}_w). Without loss of generality, we can assume that (18) holds. Otherwise we modify the map π̃^{(r)} for all w ∈ C[0, 1] with ‖w - π̃^{(r)}(w)‖ > κ r^{-H} in such a way that (18) becomes valid. Hereby the probability in (17) increases and it suffices to prove the statement for the modified strategy. Let us consider the mixture of the two weight sequences. Then the probability weights p̄^{(r)} := ½ (p^{(r)} + p̃^{(r)}) satisfy E[(-log p̄^{(r)}_{π̃^{(r)}(X)})^s]^{1/s} ≲ r.

Recall that -log p̄^{(r)}_{π̃^{(r)}(X)} ≳ r, in probability, which gives (17).
Proof of Theorem 4.1. We start by proving the lower bound. Fix s > 0, let C_r, r ≥ 0, denote arbitrary codebooks of size e^r, and let π^{(r)} : C[0, 1] → C_r denote arbitrary strategies. Moreover, let (p^{(r)}_w) be the sequence of probability weights defined as p^{(r)}_w = 1/|C_r|, w ∈ C_r. Then -log p^{(r)}_{π^{(r)}(X)} ≤ r a.s., and the above lemma implies, for any ε ∈ (0, 1), a corresponding lower bound on the error of π^{(r)}. Therefore, D^{(q)}(r|s) ≳ κ r^{-H}, which proves the lower bound.
It remains to show that D^{(q)}(r|s) ≲ κ/r^H. By Lemma 4.2, there exist strategies π^{(r)} and probability weights (p^{(r)}_w) such that ‖X - π^{(r)}(X)‖_∞ ≤ κ/r^H and -log p_{π^{(r)}(X)} ≲ r, in probability.

Implications of the equivalence of moments
In this section we complement Theorem 4.1 by the corresponding result for the entropy coding error (Theorem 5.1). The proof of this theorem is based on the following general principle: if the asymptotic quantization error coincides for two different moments s₁ < s₂, then all moments s ≤ s₂ lead to the same asymptotic quantization error and the entropy coding problem coincides with the quantization problem for all moments s ≤ s₂.
Let us prove this relationship in a general setting. With E and Ê denoting arbitrary measurable spaces and d : E × Ê → [0, ∞) a measurable function, the quantization error for a general E-valued r.v. X under the distortion d is defined as

D^{(q)}(r|s) := inf_C E[d(X, C)^s]^{1/s},

where the infimum is taken over all codebooks C ⊂ Ê with |C| ≤ e^r. In order to simplify notation, we abbreviate d(x, A) := inf_{y∈A} d(x, y), x ∈ E, A ⊂ Ê.
Analogously, we denote the entropy coding error by

D^{(e)}(r|s) := inf_{X̂} E[d(X, X̂)^s]^{1/s},

where the infimum is taken over all discrete Ê-valued r.v. X̂ with H(X̂) ≤ r.
Then Theorem 5.1 is a consequence of Theorem 4.1 and the following theorem.
We need two technical lemmas.

Proof. For r ≥ 0, let C*_r denote codebooks of size e^r attaining (21). Now let C_r denote arbitrary codebooks of size e^r, and consider the codebooks C̃_r := C*_r ∪ C_r. Using (21) and the inequality s₁ ≤ s₂, it follows that the s₁-th and the s₂-th moments coincide asymptotically, and it follows by Lemma A.1 that d(X, C̃_r) ∼ f(r), in probability, so that in particular, d(X, C_r) ≳ f(r), in probability.

The L^p[0, 1]-norm distortion

In this section, we consider the coding problem for the fractional Brownian motion X under L^p[0, 1]-norm distortion for some fixed p ∈ [1, ∞). In order to treat this approximation problem, we need to introduce Shannon's distortion rate function. It is defined as

D(r|p) := inf E[‖X - X̂‖_{L^p[0,1]}^p]^{1/p},

where the infimum is taken over all D[0, 1]-valued r.v.'s X̂ satisfying the mutual information constraint I(X; X̂) ≤ r. Here and elsewhere I denotes the Shannon mutual information, defined as

I(X; X̂) := ∫ log (dP_{(X,X̂)} / d(P_X ⊗ P_{X̂})) dP_{(X,X̂)}

if P_{(X,X̂)} is absolutely continuous with respect to P_X ⊗ P_{X̂}, and I(X; X̂) := ∞ otherwise. The objective of this section is to prove

Theorem 6.1. The following limit exists:

κ_p := lim_{r→∞} r^H D(r|p),    (22)

and for any s > 0, one has

lim_{r→∞} r^H D^{(e)}(r|s) = lim_{r→∞} r^H D^{(q)}(r|s) = κ_p.    (23)

We will first prove that statement (23) is valid for the quantization error. Since D(r|p) is dominated by D^{(q)}(r|p), the existence of the limit in (22) then follows immediately. Due to Theorem 1.2 in (5), the distortion rate function D(·|p) has the same weak asymptotics as D^{(q)}(·|p). In particular, D(r|p) ≈ r^{-H} and κ_p lies in (0, ∞).
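For orientation, the distortion rate function is explicitly known in the classical scalar case: a N(0, σ²) source under squared-error distortion has D(r)² = σ² e^{-2r} with the rate r measured in nats. This closed form is standard rate-distortion theory rather than specific to the present paper, but it illustrates the object the section works with:

```python
import math

def gaussian_drf_sq(sigma2, r):
    """Squared distortion rate function of a scalar N(0, sigma2) source
    under squared error, rate r in nats: D(r)^2 = sigma2 * exp(-2 r)."""
    return sigma2 * math.exp(-2.0 * r)

assert gaussian_drf_sq(1.0, 0.0) == 1.0            # zero rate: full variance
# each additional (log 2)/2 nats of rate halves the squared distortion
assert abs(gaussian_drf_sq(1.0, math.log(2) / 2) - 0.5) < 1e-12
```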
We proceed as follows: decomposing X into two processes X^{(1)} and X^{(2)}, we consider the coding problem for X^{(1)} and X^{(2)} in L^p[0, n] (n ∈ N being large). We control the coding complexity of the first term via Shannon's source coding theorem (SCT) and use a limit argument in order to show that the coding complexity of X^{(2)} is asymptotically negligible. We recall the SCT in a form which is appropriate for our discussion; for n ∈ N, let

d_{n,p}(f, g) := (n^{-1} ‖f - g‖_{L^p[0,n]}^p)^{1/p}.

Then d_{n,p}(f, g)^p, n ∈ N, is a single-letter distortion measure, when interpreting the function f|_{[0,n)} as the concatenation of the "letters" f^{(0)}, …, f^{(n-1)}, where f^{(i)} = (f(i + t))_{t∈[0,1)}. Analogously, the process X^{(1)} corresponds to the letters X^{(1,i)} := (X^{(1)}_{i+t})_{t∈[0,1)}, i ∈ N₀. Since (X^{(1,i)})_{i∈N₀} is an ergodic stationary C[0, 1)-valued process, the SCT implies that for fixed r ≥ 0 and ε > 0 there exist codebooks C_n ⊂ D[0, n], n ∈ N, with at most exp{(1 + ε)nr} elements such that

lim_{n→∞} P(d_{n,p}(X^{(1)}, C_n)^p ≤ (1 + ε)D(r|p)^p) = 1.    (24)
The statement is an immediate consequence of the asymptotic equipartition property as stated in (3) (Theorem 1) (see also (1) and (3)).
First we prove a lemma which will later be used to control the coding complexity of X^{(2)}. Lemma 6.2. Let (Z_i)_{i∈N} be an ergodic stationary sequence of real-valued r.v.'s, let S^n_1 := (S_i)_{i=1,…,n} denote the vector of its partial sums, let c be a universal constant, and let ‖·‖_{l^n_∞} denote the maximum norm on R^n; then there exist codebooks C_n ⊂ 2εZ^n of controlled size whose best approximation of S^n_1 has l^n_∞-error at most ε with probability tending to one.
Proof. Let c > 0 be such that (p_n)_{n∈Z} defined through p_n = e^{-c} (|n| + 1)^{-2} is a sequence of probability weights, and let C_n denote the set of those ŝ^n_1 ∈ 2εZ^n whose associated code length does not exceed the size constraint. Since (p_n) induces a sequence of probability weights on 2εZ^n, the set C_n satisfies the required size constraint. Let Ŝ^n_1 denote a best approximation for S^n_1 in the set 2εZ^n. Then its code length is always bounded by a sum of terms of order log(|Z_i|/2ε + 2) plus nc, so that the ergodic theorem implies that lim_{n→∞} P(Ŝ^n_1 ∈ C_n) = 1, which implies the assertion.
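The grid approximation used in the proof is elementary: the best approximation of a real number in 2εZ differs from it by at most ε. A short sketch (with hypothetical sample values) confirms the error bound that the lemma's counting argument relies on:

```python
def nearest_grid_point(s, eps):
    """Best approximation of s in the grid 2*eps*Z (error at most eps)."""
    return 2 * eps * round(s / (2 * eps))

eps = 0.05
for s in [0.0, 0.049, -1.23, 7.777]:               # arbitrary test values
    assert abs(s - nearest_grid_point(s, eps)) <= eps + 1e-12
```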
We now use the SCT combined with the previous lemma to construct codebooks that guarantee almost optimal reconstructions with high probability. Lemma 6.3. For any ε > 0 there exist codebooks C_r, r ≥ 0, of size e^r whose error stays below (1 + 3ε)κ_p r^{-H} with probability tending to one. Proof. Let ε > 0 be arbitrary and c be as in Lemma 6.2. We fix a sufficiently large r₀. We decompose X into the two processes X^{(1)} and X^{(2)}. Due to the SCT (24), there exist codebooks C^{(1)}_n for X^{(1)}. We apply Lemma 6.2 with ε̃ := ε κ_p r₀^{-H} to obtain codebooks C^{(2)}_n, and combine both into codebooks C̃_n. Then |C̃_n| ≤ exp{(1 + 2ε)nr₀}, and one has

P(d_{n,p}(X, C̃_n) ≤ (1 + 3ε)κ_p r₀^{-H}) ≥ P(d_{n,p}(X^{(1)}, C^{(1)}_n) ≤ (1 + 2ε)κ_p r₀^{-H} and d_{n,p}(X^{(2)}, C^{(2)}_n) ≤ ε κ_p r₀^{-H}) → 1.

Consider the isometric isomorphism β_n given by time rescaling. Then the rescaled process is again a fractional Brownian motion and one has d_p(X^{(n)}, C_n) = d_{n,p}(β_n(X^{(n)}), β_n(C_n)) = n^{-H} d_{n,p}(X, C̃_n).
Hence, the codebooks C_n are of size exp{(1 + 2ε)nr₀} and satisfy P(d_p(X, C_n) ≤ (1 + 3ε)κ_p (nr₀)^{-H}) = P(d_{n,p}(X, C̃_n) ≤ (1 + 3ε)κ_p r₀^{-H}) → 1 as n → ∞. Now the general statement follows by an interpolation argument similar to that used at the end of the proof of Theorem 3.1.
Proof of Theorem 6.1. Let s ≥ 1 be arbitrary, and let C^{(1)}_r be as in the above lemma for some fixed ε > 0. Moreover, we let C^{(2)}_r denote auxiliary codebooks of size e^r controlling the exceptional event. Then the codebooks C_r := C^{(1)}_r ∪ C^{(2)}_r contain at most 2e^r elements and satisfy, in analogy to the proof of Theorem 4.1 (see (19)),

E[d_p(X, C_r)^s]^{1/s} ≲ (1 + ε) κ_p/r^H, r → ∞.
Since ε > 0 is arbitrary, it follows that D^{(q)}(r|s) ≲ κ_p/r^H. For s ≥ p the quantization error dominates the distortion rate function D(r|p), so that the former inequality extends to lim_{r→∞} r^H D^{(q)}(r|s) = κ_p.
In particular, we obtain the asymptotic equivalence of all moments s₁, s₂ greater than or equal to p.
Using the definition of λ and (27), as well as the fact that ε > 0 is arbitrary, the conclusion follows.