Optimising Prediction Error among Completely Monotone Covariance Sequences

We provide a characterisation of Gaussian time series which optimise the one-step prediction error subject to the covariance sequence being completely monotone with the first m covariances specified.


Introduction
In [9] the aggregation of simple dynamic equations, in particular AR(1) models, was introduced as a means for generating processes displaying long memory. Since this paper, the idea of aggregation has become a popular approach for constructing time series with a flexible covariance structure (see, for instance, [1, 12, 2, 19, 15, 17]). Aggregation has also become a useful way to represent certain time series and provides a method for simulation. Examples of time series which can be represented in this way include the fractional ARIMA(0, d, 0) model proposed by Granger and Joyeux in [10] and Lamperti's transformation of fractional Brownian motion with H < 1/2 [5]. Other examples are given in [14]. The problem of representing long memory processes in terms of an aggregation of short memory processes is studied in [7].

Consider the aggregation of independent Gaussian AR(1) time series with positive correlation parameters. The covariance sequence of this time series is represented as

    γ_k = ∫_0^1 ρ^k σ(dρ), k = 0, 1, 2, …,    (1)

for some Borel measure σ on [0, 1]. The representation (1) is equivalent to stating that {γ_k}_{k=0}^∞ has the completely monotone property, that is,

    (−1)^n ∆^n γ_k ≥ 0 for all n, k ≥ 0,

where ∆ is the difference operator ∆a_n = a_{n+1} − a_n ([18]). In this note we consider the problem of optimising the one-step prediction error variance, denoted by ψ(σ), subject to the covariance sequence having representation (1) and satisfying the equality constraints

    γ_k = c_k, k = 0, 1, …, m.    (2)

The measure which maximises the prediction error will also provide the maximum entropy process ([6]) subject to constraints (1) and (2). While the measure which minimises the prediction error will not necessarily give the minimum entropy process, it will still provide a lower bound on the prediction error for Gaussian time series with completely monotone covariance.
For the case where m = 1, the problem of maximising ψ(σ) subject to (1) and (2) can be solved from the result of Burg [3]. He showed that the time series which maximises prediction error subject to (2) is a Gaussian AR(m) time series. When m = 1 this result can be applied, so that the measure which maximises the prediction error subject to (1) and (2) is

    σ*_1(A) = c_0 χ_A(c_1/c_0),

where χ_A(x) is the characteristic function of the set A, taking the value 1 if x ∈ A and zero otherwise. The measure which minimises ψ(σ) subject to (1) and (2) was shown in [14] to be

    σ*_2(A) = (c_0 − c_1) χ_A(0) + c_1 χ_A(1).

It is interesting to note that for m = 1 the measure which maximises prediction error minimises the covariance γ_k for k ≥ 2 and, similarly, the measure which minimises prediction error maximises the covariance γ_k for k ≥ 2. In the terminology of moment spaces [13] these two measures are called the extremising measures of the moment space [11].

The paper is organised as follows. In Section 2 we recall some basic properties of time series with completely monotone covariance sequences and some properties of moment spaces. The main result of this paper, that the extremising measures of the moment space are precisely the measures which maximise and minimise the prediction error, is stated in Section 3. The proof of the result is given first assuming the measure is supported on [0, φ] with φ < 1 and then for the case of measures supported on [0, 1]. The paper concludes with a discussion of how these measures are constructed.
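As a numerical illustration, and assuming the two extremal measures for m = 1 take the forms described here (a single atom of mass c_0 at c_1/c_0 for the maximiser, atoms at 0 and 1 for the minimiser; the values of c_0, c_1 below are purely illustrative), a short check confirms that both covariance sequences are completely monotone and that the maximiser's covariances lie below the minimiser's for k ≥ 2:

```python
import numpy as np

# Hypothetical specified covariances c_0, c_1 (m = 1); values are illustrative.
c0, c1 = 1.0, 0.6
K = 8  # number of covariances to inspect

# Assumed maximiser of the prediction error: all mass c_0 at rho = c_1/c_0,
# i.e. an AR(1) series with gamma_k = c_0 (c_1/c_0)^k.
gamma_ar1 = np.array([c0 * (c1 / c0) ** k for k in range(K)])

# Assumed minimiser: atoms at 0 and 1 with masses c_0 - c_1 and c_1,
# i.e. gamma_0 = c_0 and gamma_k = c_1 for k >= 1.
gamma_end = np.array([c0 if k == 0 else c1 for k in range(K)])

def completely_monotone(gamma, orders=4, tol=1e-12):
    """Check (-1)^n Delta^n gamma_k >= 0 for n = 0, ..., orders."""
    g = np.array(gamma, dtype=float)
    for n in range(orders + 1):
        if np.any((-1) ** n * g < -tol):
            return False
        g = np.diff(g)
    return True

print(completely_monotone(gamma_ar1), completely_monotone(gamma_end))
print(np.all(gamma_ar1[2:] <= gamma_end[2:]))  # maximiser has the smaller covariances
```

Both sequences pass the finite-difference test of complete monotonicity, and the ordering of the covariances for k ≥ 2 matches the observation above.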

Wold decomposition
Consider a time series with covariance sequence (1) and such that σ({1}) = 0. Then the corresponding spectral density function is given by

    f(ω; σ) = ∫_0^1 g(ω; ρ) σ(dρ),

where g(ω; ρ) = (1 − ρ²) / (2π|1 − ρe^{−iω}|²) is the spectral density of an AR(1) time series with variance 1. This spectral density satisfies the following bounds:

    g(ω; ρ) ≥ (1 − ρ) / (2π(1 + ρ))    (3)

and

    g(ω; ρ) ≤ 1 / (π(1 − cos ω)).    (4)

From (3) it is seen that if σ({1}) = 0 then the time series is completely non-deterministic and the prediction error is given by the Kolmogorov formula

    ψ(σ) = 2π exp{ (2π)^{−1} ∫_{−π}^{π} log f(ω; σ) dω }.    (5)

In general, the Wold decomposition of a time series with covariance sequence (1) is simply

    X_t = Z + Y_t,

where Z is a random variable whose variance is σ({1}) and Y_t is a completely non-deterministic series with covariance sequence ∫_{[0,1)} ρ^k σ(dρ). The prediction error is still given by (5) provided the spectral density is understood as that of the completely non-deterministic part.
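The Kolmogorov formula can be checked numerically for a discrete mixing measure: below, ψ(σ) is evaluated both by numerical integration of log f as in (5) and by the Durbin-Levinson recursion on the covariances γ_k. The two-atom measure and the grid sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative discrete mixing measure: mass w_i at correlation rho_i.
atoms = [(0.3, 0.6), (0.7, 0.2)]

def g(omega, rho):
    # Spectral density of an AR(1) series with unit variance.
    return (1 - rho ** 2) / (2 * np.pi * np.abs(1 - rho * np.exp(-1j * omega)) ** 2)

def f(omega):
    # Aggregated spectral density f(omega; sigma) = int g(omega; rho) dsigma(rho).
    return sum(w * g(omega, rho) for w, rho in atoms)

# Kolmogorov formula: psi = 2*pi * exp( (2*pi)^{-1} * int log f ).
# A uniform grid on a full period makes the rectangle rule spectrally accurate.
omegas = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
psi_kolmogorov = 2 * np.pi * np.exp(np.mean(np.log(f(omegas))))

# Durbin-Levinson recursion: the innovations variance v_n converges to psi.
n = 400
gamma = np.array([sum(w * rho ** k for w, rho in atoms) for k in range(n + 1)])
v, phi = gamma[0], np.zeros(0)
for k in range(1, n + 1):
    kappa = (gamma[k] - phi @ gamma[k - 1:0:-1]) / v
    phi = np.append(phi - kappa * phi[::-1], kappa)
    v *= 1 - kappa ** 2

print(psi_kolmogorov, v)  # the two values agree to integration accuracy
```

For a single atom the recursion reproduces the AR(1) innovations variance 1 − ρ², which is also what (5) gives directly.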

Moment spaces
Let D|c be the set of measures on [0, 1] satisfying (2). When we need to consider measures on [0, φ], φ ≤ 1, we shall write D_φ|c. A necessary and sufficient condition for D|c to contain at least one measure is that the quadratic forms

    Σ_{i,j=0}^{q} c_{i+j} x_i x_j and Σ_{i,j=0}^{q−1} (c_{i+j+1} − c_{i+j+2}) x_i x_j    (6)

for m = 2q, or

    Σ_{i,j=0}^{q} c_{i+j+1} x_i x_j and Σ_{i,j=0}^{q} (c_{i+j} − c_{i+j+1}) x_i x_j    (7)

for m = 2q + 1, are positive definite or semi-definite (see Theorems 16.1a and 16.1b in [13]). From Theorem 16.2 and Theorem 20.1 of [13] it follows that if (6) or (7), as required, is positive definite then there is more than one measure in D|c. In this case we say D|c is non-degenerate. If (6) or (7), as required, is only positive semi-definite then there is only one measure in D|c and we say it is degenerate. These conditions are easily modified for D_φ|c.
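The positivity conditions can be tested numerically. The sketch below assumes the classical Hausdorff-moment quadratic forms (Hankel matrices) for (6) and (7), which may differ in presentation from the paper's displays; the example moment sequences are illustrative.

```python
import numpy as np

def hankel(seq, size):
    """size x size Hankel matrix with entries seq[i + j]."""
    return np.array([[seq[i + j] for j in range(size)] for i in range(size)])

def moment_forms(c):
    """Matrices of the assumed quadratic forms (6)/(7) for moments c_0..c_m on [0,1]."""
    c = np.asarray(c, dtype=float)
    m = len(c) - 1
    if m % 2 == 0:                       # m = 2q: forms (6)
        q = m // 2
        return hankel(c, q + 1), hankel(c[1:-1] - c[2:], q)
    q = (m - 1) // 2                     # m = 2q + 1: forms (7)
    return hankel(c[1:], q + 1), hankel(c[:-1] - c[1:], q + 1)

def nondegenerate(c, tol=1e-10):
    """True if both forms are positive definite, i.e. D|c is non-degenerate."""
    return all(np.linalg.eigvalsh(H).min() > tol for H in moment_forms(c))

# Moments of the measure 0.5*delta_{0.2} + 0.5*delta_{0.8} (illustrative):
c_valid = [0.5 * (0.2 ** k + 0.8 ** k) for k in range(4)]
print(nondegenerate(c_valid))           # a genuine mixing measure: non-degenerate
print(nondegenerate([1.0, 0.9, 0.2]))   # not a moment sequence on [0, 1]
```

The second call fails because the Hankel matrix built from (1.0, 0.9, 0.2) has negative determinant, so no measure on [0, 1] has these first three moments.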
The extremising measures σ*_{1,φ} and σ*_{2,φ} of D_φ|c are the measures in D_φ|c supported on the minimal number of atoms, distinguished by whether there is an atom at the endpoint φ (σ*_{2,φ}) or not (σ*_{1,φ}). Note that these conditions define unique measures (see Theorem 5 of [11]). In [11] it was demonstrated that one of σ*_{1,φ}, σ*_{2,φ} will maximise E_σ g(ρ) while the other measure will minimise the expectation. Which of these measures acts to minimise and which acts to maximise depends on the function g.

Main result
We first consider the case of measures on [0, φ] with φ < 1. The following result confirms the observation made in the introduction that in order to maximise (minimise) the prediction error we should minimise (maximise) the covariance sequence.
Lemma 1. Let γ, γ̄ be the covariance sequences associated with measures σ, σ̄ ∈ D_φ|c and suppose that γ_k ≤ γ̄_k for all k. Then ψ(σ̄) ≤ ψ(σ).

Proof. From the inequality x − 1 − log x ≥ 0, for all x > 0, we obtain (8). The interchange of summation and integration is permitted since the covariance sequences are in ℓ_1 and f^{−1}(ω; σ) is bounded. Applying Jensen's inequality gives (9) and (10), where the final inequality follows from the fact that the inverse autocovariance of an AR(1) time series with positive correlation parameter is non-positive for all k ≠ 0 [4]. The result now follows from (8)-(10).
Lemma 2. For every φ ≤ 1 and every j > m, the measure in D_φ|c which maximises E_σ ρ^j is σ*_{2,φ}, and the measure which minimises it is σ*_{1,φ}.

Proof. The second statement is an immediate implication of the first. The result for all φ ≤ 1 will follow immediately if it can be shown to be true for φ = 1. Since σ*_2 always has an atom at 1, it follows that lim_{j→∞} E_{σ*_2} ρ^j = α > 0. On the other hand, σ*_1 only has atoms in [0, 1) and so lim_{j→∞} E_{σ*_1} ρ^j = 0. Therefore, there exists a J ∈ N such that (11) holds for all j ≥ J. Now assume that E_σ ρ^j is not maximised by σ*_2 for some j > m and let k be the largest integer for which (12) holds. From (11) and (12) there exists an ω_j ∈ (0, 1) such that (13) holds, and we can apply Theorem 7 of [11] to obtain the corresponding bound for all σ ∈ D|c.

Now take any σ ∈ D|c which does not have an atom at 1. As lim_{j→∞} ω_j = ω > 0, we can apply (13) to conclude that E_σ ρ^k, and all higher order moments, can be determined from (13). Therefore, for any two measures σ, σ′ ∈ D|c which do not have atoms at 1, E_σ ρ^j = E_{σ′} ρ^j for j ≥ k. It follows that there is only one measure in D|c which does not have an atom at 1 ([8]), and this is σ*_1. If we can show that there exists another measure in D|c which does not have an atom at 1, then a contradiction will occur and the result will be established.

Take φ = 1 − ε with ε > 0 and such that σ*_1([φ, 1]) = 0. Define c̄_k = φ^{−k} c_k, k = 0, …, m. As D|c is non-degenerate and the forms (6) and (7) are continuous in the moments, we may take ε sufficiently small so that D|c̄ is also non-degenerate. Let σ̄*_2 be the extremising measure in D|c̄ which has an atom at 1. The measure σ̄*_2(φ^{−1} dρ) is in D|c and does not have an atom at 1. By construction, this measure is different from σ*_1, and hence there is more than one measure in D|c which does not have an atom at 1. Having arrived at a contradiction, it follows that there is no k for which (12) holds, and the result follows.
The following theorem is a direct result of Lemmas 1 and 2.

Theorem 3. Let φ < 1. The measure in D_φ|c which maximises the prediction error is σ*_{1,φ}. The measure in D_φ|c which minimises the prediction error is σ*_{2,φ}.

The space D|c introduces a problem since the covariance sequence will not necessarily be in ℓ_1. To overcome this problem we first show that ψ(σ) is continuous with respect to weak convergence of σ. We then consider the limit of the extremising measures σ*_{1,φ}, σ*_{2,φ} as φ → 1.

Lemma 4. Let σ_j be a sequence of measures converging weakly to σ. Then ψ(σ_j) → ψ(σ).
Proof. (See Proposition 4.1 (ii) of [14].) We need to show that for any ε > 0 there exists a J ∈ N such that |ψ(σ_j) − ψ(σ)| < ε for all j ≥ J. Take δ > 0. As a family of functions of ρ on [0, 1], parameterised by ω, g(ω; ρ) is bounded and uniformly continuous for all |ω| > δ. Applying Theorem 3.2 of [16] we have f(ω; σ_j) → f(ω; σ) uniformly on |ω| > δ. As f(ω; σ) is bounded away from zero on this set, we may apply the dominated convergence theorem to obtain convergence of the integrals of log f over |ω| > δ. Applying the bounds (3) and (4), we can show that the integral over |ω| ≤ δ can be made arbitrarily small by taking δ sufficiently small. The result follows.
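Lemma 4 can be illustrated numerically: scaling an illustrative discrete measure by φ (so an atom at ρ moves to φρ) gives measures converging weakly to σ as φ → 1, and the corresponding prediction errors, computed here with the Durbin-Levinson recursion, converge to ψ(σ). All names and values below are illustrative.

```python
import numpy as np

def psi(atoms, n=400):
    """One-step prediction error for the mixing measure sum_i w_i delta_{r_i},
    via the Durbin-Levinson recursion on gamma_k = sum_i w_i r_i^k."""
    gamma = np.array([sum(w * r ** k for w, r in atoms) for k in range(n + 1)])
    v, phi = gamma[0], np.zeros(0)
    for k in range(1, n + 1):
        kappa = (gamma[k] - phi @ gamma[k - 1:0:-1]) / v
        phi = np.append(phi - kappa * phi[::-1], kappa)
        v *= 1 - kappa ** 2
    return v

sigma = [(0.5, 0.2), (0.5, 0.7)]                  # illustrative mixing measure
for scale in [0.9, 0.99, 0.999]:
    scaled = [(w, scale * r) for w, r in sigma]   # the scaled measure sigma(phi^{-1} d rho)
    print(scale, abs(psi(scaled) - psi(sigma)))   # differences shrink as phi -> 1
```

The printed differences decrease towards zero, consistent with continuity of ψ under weak convergence.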
Lemma 5. Let σ ∈ D|c and let V(σ) be a weak neighbourhood of σ. Then there exists a φ < 1 and a measure supported on [0, φ] which belongs to V(σ).

Proof. It follows from the definition of the Prohorov metric that, for φ < 1 but sufficiently large, the measure σ(φ^{−1} dρ) is in V(σ).
Lemma 6. The sequences of extremising measures {σ*_{1,φ}} and {σ*_{2,φ}} converge weakly to σ*_1 and σ*_2, respectively, as φ → 1.

Proof. From Lemma 2 and Theorem 7 of [11] it follows that for any j ∈ N, E_{σ*_{1,φ}} ρ^j is bounded and decreasing as φ → 1. Therefore, the moments converge for all j ∈ N and so the measures must converge weakly. That the limiting measure is σ*_1 follows since, for φ sufficiently close to 1, σ*_{1,φ} must also be in any given weak neighbourhood of σ*_1. A similar argument applies for {σ*_{2,φ}}.

Theorem 7. The measure in D|c which maximises the prediction error is σ*_1. The measure in D|c which minimises the prediction error is σ*_2.

Proof. Let σ* be the measure in D|c which maximises ψ(σ). From Lemma 4 and Lemma 5 it follows that for any ε > 0 we can find a φ < 1 and a measure σ_φ ∈ D_φ|c such that ψ(σ*) − ε ≤ ψ(σ_φ). From Theorem 3, ψ(σ*) − ε ≤ ψ(σ*_{1,φ}), and applying Lemma 4 and Lemma 6 it follows that ψ(σ*) − ε ≤ ψ(σ*_1). As this holds for all ε > 0, we can conclude that ψ(σ*) = ψ(σ*_1) and hence σ*_1 is the measure which maximises prediction error. A similar argument holds for minimising the prediction error.

To conclude, we briefly describe how the extremising measures can be constructed from the given covariances c_0, …, c_m. Following Theorem 20.2 of [13], define the polynomials P_{m+1}(t) and P̄_{m+1}(t) from bordered determinants ∆_n(t) and ∆̄_n(t) of the moments c_0, …, c_m (for n = 2k and n = 2k + 1, respectively). The locations of the atoms in the measures σ*_1 and σ*_2 are given by the distinct roots of the polynomials P_{m+1}(t) and P̄_{m+1}(t), respectively. The mass associated with each atom in the measure can then be determined from the linear system

    w_1 ρ_1^k + … + w_r ρ_r^k = c_k, k = 0, 1, …, m,

where ρ_1, …, ρ_r are the locations of the atoms (see Section 2.3 for the number of atoms in the extremising measures). This system will be uniquely solvable provided D|c contains at least one measure.

As an example, consider the case m = 2. The polynomial determining the locations of the atoms of σ*_1 has distinct roots 0 and c_2/c_1, so σ*_1 has one atom at 0 and another at ρ = c_2/c_1. The moment constraints determine the measure to be

    σ*_1(A) = (c_0 − c_1²/c_2) χ_A(0) + (c_1²/c_2) χ_A(c_2/c_1).

Similarly, the polynomial determining the locations of the atoms of σ*_2 has distinct roots (c_1 − c_2)/(c_0 − c_1) and 1. The measure σ*_2 is then

    σ*_2(A) = ((c_0 − c_1)²/(c_0 − 2c_1 + c_2)) χ_A((c_1 − c_2)/(c_0 − c_1)) + ((c_0 c_2 − c_1²)/(c_0 − 2c_1 + c_2)) χ_A(1).
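The m = 2 construction can be verified numerically. The sketch below uses illustrative values of c_0, c_1, c_2 taken from a genuine completely monotone sequence, places the atoms at 0 and c_2/c_1 for σ*_1 and at an interior point and 1 for σ*_2 (the interior location (c_1 − c_2)/(c_0 − c_1) is derived here from the moment constraints, not quoted from the paper), and solves the linear system for the masses:

```python
import numpy as np

# Illustrative moments of the mixing measure 0.5*delta_{0.3} + 0.5*delta_{0.9}.
c = np.array([1.0, 0.6, 0.45])  # c_0, c_1, c_2

# Atom locations: sigma*_1 at {0, c_2/c_1}; sigma*_2 at the assumed interior
# point (c_1 - c_2)/(c_0 - c_1) and at 1.
rho1 = np.array([0.0, c[2] / c[1]])
rho2 = np.array([(c[1] - c[2]) / (c[0] - c[1]), 1.0])

def masses(rho, c):
    # Solve w_1 rho_i^k + w_2 rho_j^k = c_k for k = 0, 1; k = 2 then checks out
    # because the atom locations were chosen to be consistent with c_2.
    A = np.vander(rho, 2, increasing=True).T   # rows: rho^0, rho^1
    return np.linalg.solve(A, c[:2])

w1, w2 = masses(rho1, c), masses(rho2, c)
print(rho1, w1)   # sigma*_1: atom locations and masses
print(rho2, w2)   # sigma*_2: atom locations and masses
print(w1 @ rho1 ** 2, w2 @ rho2 ** 2, c[2])   # both measures reproduce c_2
```

For these moments the two measures place mass (0.2, 0.8) at (0, 0.75) and mass (0.64, 0.36) at (0.375, 1), and both reproduce all three specified covariances.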