Statistical inference for stationary linear models with tapered data

In this paper, we survey some recent results on statistical inference (parametric and nonparametric estimation, and hypotheses testing) about the spectrum of stationary models with tapered data, as well as a question concerning the robustness of inferences carried out on a linear stationary process contaminated by a small trend. We also discuss some questions concerning tapered Toeplitz matrices and operators, central limit theorems for tapered Toeplitz-type quadratic functionals, and tapered Fej\'er-type kernels and singular integrals. These are the main tools for obtaining the corresponding results, and are also of interest in themselves. The processes considered will be discrete-time and continuous-time Gaussian, linear or L\'evy-driven linear processes with memory.

We want to make statistical inferences (parametric and nonparametric estimation, and hypotheses testing) about the spectrum of X(t). In the classical setting, the inferences are based on an observed finite realization X_T of the process X(t): X_T := {X(t), t ∈ D_T}, where D_T := [0, T] in the c.t. case and D_T := {1, . . ., T} in the d.t. case.
In the statistical analysis of stationary processes, however, the data are frequently tapered before calculating the statistic of interest, and the statistical inference procedure, instead of the original data X_T, is based on the tapered data: X_T^h := {h_T(t)X(t), t ∈ D_T}, where D_T is as above and h_T(t) := h(t/T), with h(t), t ∈ R, being a taper function.
The use of data tapers in nonparametric time series analysis was suggested by Tukey [73]. The benefits of tapering the data have been widely reported in the literature (see, e.g., Brillinger [8], Dahlhaus [17]-[20], [22], Dahlhaus and Künsch [23], Guyon [56], and references therein). For example, data tapers are introduced to reduce the so-called 'leakage effects', that is, to obtain better estimates of the spectrum of the model in the case where it contains high peaks. Another application of data tapers is in situations where some of the data values are missing. Also, the use of tapers leads to bias reduction, which is especially important when dealing with spatial data. In this case, tapers can be used to fight the so-called 'edge effects'.
In this paper, we survey some recent results on statistical inference (parametric and nonparametric statistical estimation, and hypotheses testing) about the spectrum of stationary models with tapered data, as well as a question concerning the robustness of inferences carried out on a linear stationary process contaminated by a small trend. We also discuss some questions concerning tapered Toeplitz matrices and operators, central limit theorems for tapered Toeplitz-type quadratic functionals, and tapered Fejér-type kernels and singular integrals. These are the main tools for obtaining the corresponding results, and are also of interest in themselves. The processes considered will be discrete-time and continuous-time Gaussian, linear or Lévy-driven linear processes with memory.
Some notation and conventions. The following notation and conventions are used throughout the paper. The symbol ':=' stands for 'by definition'; c.t. := continuous-time; d.t. := discrete-time; s.d. := spectral density; c.f. := covariance function; CLT := central limit theorem. The symbols '→^P' and '→^d' stand for convergence in probability and in distribution, respectively. The notation X_T →^d η ∼ N(0, σ²) as T → ∞ will mean that the distribution of the random variable X_T tends (as T → ∞) to the centered normal distribution with variance σ². E[·] := expectation operator; tr[A] := trace of an operator (matrix) A; I_A(·) := indicator of a set A ⊂ Λ; WN(0, 1) := standard white noise. The standard symbols N, Z and R denote the sets of natural, integer and real numbers, respectively; N_0 := N ∪ {0}. By Λ we denote the frequency domain, that is, Λ := R in the c.t. case, and Λ := [−π, π] in the d.t. case. By L^p(µ) := L^p(Λ, µ) (p ≥ 1) we denote the weighted Lebesgue space with respect to the measure µ, and by ||·||_{p,µ} we denote the norm in L^p(µ). In the special case where dµ(λ) = dλ, we will use the notation L^p and ||·||_p, respectively. The letters C and c, with or without indices, are used to denote positive constants, the values of which can vary from line to line. Also, in the d.t. case all the considered functions are assumed to be 2π-periodic and periodically extended to R.
The structure of the paper. The rest of the paper is structured as follows. In Section 2 we specify the model of interest (a stationary process), recall some key notions and results from the theory of stationary processes, and introduce the data tapers and the tapered periodogram. In Section 3 we discuss the nonparametric estimation problem. We analyze the asymptotic properties, involving asymptotic unbiasedness, bias rate of convergence, consistency, a central limit theorem and asymptotic normality, of the empirical spectral functionals. In Section 4 we discuss the parametric estimation problem. We present sufficient conditions for consistency and asymptotic normality of the minimum contrast estimator based on the Whittle contrast functional for stationary linear models with tapered data. Section 5 is devoted to the construction of goodness-of-fit tests, based on the tapered data, for testing hypotheses that the hypothetical spectral density of a stationary Gaussian model has a specified form. A question concerning the robustness of inferences carried out on a linear stationary process contaminated by a small trend is discussed in Section 6. In Section 7 we briefly discuss the methods and tools used to prove the results stated in Sections 3-6.

Preliminaries
In this section we specify the model of interest (a stationary process), and introduce the data tapers and the tapered periodogram.
By the Herglotz theorem in the d.t. case, and the Bochner-Khintchine theorem in the c.t. case (see, e.g., Cramér and Leadbetter [16]), there is a finite measure µ on (Λ, B(Λ)), where Λ = R in the c.t. case, Λ = [−π, π] in the d.t. case, and B(Λ) is the Borel σ-algebra on Λ, such that for any u ∈ U the covariance function r(u) admits the following spectral representation:

r(u) = ∫_Λ e^{iλu} dµ(λ). (2.1)

The measure µ in (2.1) is called the spectral measure of the process X(u). The function F(λ) := µ[−π, λ] in the d.t. case and F(λ) := µ[−∞, λ] in the c.t. case is called the spectral function of the process X(t). If F(λ) is absolutely continuous (with respect to Lebesgue measure), then the function f(λ) := dF(λ)/dλ is called the spectral density of the process X(t). Notice that if the spectral density f(λ) exists, then f(λ) ≥ 0, f(λ) ∈ L^1(Λ), and (2.1) becomes

r(u) = ∫_Λ e^{iλu} f(λ) dλ.

Thus, the covariance function r(u) and the spectral function F(λ) (resp. the spectral density function f(λ)) are equivalent specifications of the second-order properties of a stationary process {X(u), u ∈ U}.

Linear processes. Existence of spectral density functions
We will consider here stationary processes possessing spectral density functions. For the following results we refer to Cramér and Leadbetter [16], Doob [25], and Ibragimov and Linnik [64].
(a) The spectral function F(λ) of a d.t. stationary process {X(u), u ∈ Z} is absolutely continuous (with respect to the Lebesgue measure), F(λ) = ∫_{−π}^{λ} f(x) dx, if and only if X(u) can be represented as an infinite moving average:

X(u) = Σ_{k=−∞}^{∞} a(u − k) ξ(k), Σ_{k=−∞}^{∞} |a(k)|² < ∞, (2.3)

where {ξ(k), k ∈ Z} ∼ WN(0, 1) is a standard white noise, that is, a sequence of orthonormal random variables.
(b) The covariance function r(u) and the spectral density f(λ) of X(u) are given by the formulas

r(u) = Σ_{k=−∞}^{∞} a(u + k) a(k)

and

f(λ) = (1/2π) |â(λ)|², where â(λ) := Σ_{k=−∞}^{∞} a(k) e^{−ikλ}.

(c) In the case where ξ(k) is a sequence of Gaussian random variables, the process X(u) is Gaussian.
Similar results hold for c.t. processes. Indeed, the following holds.
(a) The spectral function F(λ) of a c.t. stationary process {X(u), u ∈ R} is absolutely continuous (with respect to Lebesgue measure) if and only if X(u) can be represented as an infinite continuous moving average:

X(u) = ∫_R a(u − t) dξ(t), ∫_R |a(t)|² dt < ∞, (2.6)

where {ξ(t), t ∈ R} is a process with orthogonal increments and E|dξ(t)|² = dt.
(b) The covariance function r(u) and the spectral density f(λ) of X(u) are given by the formulas

r(u) = ∫_R a(u + t) a(t) dt

and

f(λ) = (1/2π) |â(λ)|², where â(λ) := ∫_R a(t) e^{−iλt} dt.

(c) In the case where ξ(t) is a Gaussian process, the process X(u) is Gaussian.
In the case where ξ(t) = B(t) is a standard Brownian motion, X(t) is a Gaussian process (see, e.g., Bai et al. [6]):

X(t) = ∫_R a(t − s) dB(s).

The function a(·) in the representations (2.3) and (2.6) plays the role of a time-invariant filter, and the linear processes defined by (2.3) and (2.6) can be viewed as the output of the linear filter a(·) applied to the process ξ(t), called the innovation or driving process of X(t).
Processes of the form (2.3) and (2.6) appear in many fields of science (economics, finance, physics, etc.), and cover large classes of popular models in time series modeling. For instance, the classical autoregressive moving average (ARMA) models and their continuous counterparts, the c.t. autoregressive moving average (CARMA) models, are of the form (2.3) and (2.6), respectively, and play a central role in the representation of stationary time series (see, e.g., Brockwell [9], Brockwell and Davis [10]). In the frequency domain setting, the statistical and spectral analysis of stationary processes requires two types of conditions on the spectral density f(λ). The first type controls the singularities of f(λ), and involves the dependence (or memory) structure of the process, while the second type controls the smoothness of f(λ). The memory structure of a stationary process is essentially a measure of the dependence between all the variables in the process, considering the effect of all correlations simultaneously. Traditionally, memory structure has been defined in the time domain in terms of the decay rates of the autocorrelations, or in the frequency domain in terms of the rates of explosion of low-frequency spectra (see, e.g., Beran et al. [7], Giraitis et al. [52], Guégan [55]). It is convenient to characterize the memory structure in terms of the spectral density function.
We will distinguish the following types of stationary models: (a) short memory (or short-range dependent), (b) long memory (or long-range dependent), (c) intermediate memory (or anti-persistent).
Short-memory models. Much of statistical inference is concerned with short-memory stationary models, where the spectral density f(λ) of the model is bounded away from zero and infinity, that is, there are constants C_1 and C_2 such that

0 < C_1 ≤ f(λ) ≤ C_2 < ∞.

A typical d.t. short-memory example is the stationary autoregressive moving average ARMA(p, q) process X(t), defined to be a stationary solution of the difference equation

ψ_p(B) X(t) = θ_q(B) ε(t),

where ψ_p and θ_q are polynomials of degrees p and q, respectively, B is the backshift operator defined by BX(t) = X(t − 1), and {ε(t), t ∈ Z} is a d.t. white noise, that is, a sequence of zero-mean, uncorrelated random variables with variance σ². The covariance function r(k) of an ARMA(p, q) process is exponentially bounded:

|r(k)| ≤ C ρ^{|k|}, 0 < ρ < 1,

and the spectral density f(λ) is a rational function (see, e.g., Brockwell and Davis [10], Section 3.1):

f(λ) = (σ²/2π) |θ_q(e^{−iλ})|² / |ψ_p(e^{−iλ})|², λ ∈ [−π, π]. (2.9)

A typical c.t. short-memory example is the stationary c.t. ARMA(p, q) process, denoted by CARMA(p, q). The spectral density function f(λ) of a CARMA(p, q) process X(t) is given by the following formula (see, e.g., Brockwell [9]):

f(λ) = (σ²/2π) |β_q(iλ)|² / |α_p(iλ)|², λ ∈ R, (2.10)

where α_p and β_q are polynomials of degrees p and q, respectively. Another important c.t. short-memory model is the Ornstein-Uhlenbeck process, which is a Gaussian stationary process with covariance function r(t) = σ² e^{−α|t|} (t ∈ R, α > 0), and spectral density

f(λ) = σ²α / (π(α² + λ²)), λ ∈ R. (2.11)

Discrete-time long-memory and anti-persistent models. Data in many fields of science (economics, finance, hydrology, etc.), however, are well modeled by stationary processes whose spectral densities are unbounded or vanish at some fixed points (see, e.g., Beran et al. [7], Guégan [55], and references therein).
A long-memory model is defined to be a stationary process with an unbounded spectral density, and an anti-persistent model is a stationary process with a spectral density vanishing at some fixed points.
In the discrete context, a basic long-memory model is the autoregressive fractionally integrated moving average ARFIMA(0, d, 0) process X(t), defined to be a stationary solution of the difference equation (see, e.g., Brockwell and Davis [10], Section 13.2):

(1 − B)^d X(t) = ε(t), d < 1/2,

where B is the backshift operator and ε(t) is the d.t. white noise defined above. The spectral density f(λ) of X(t) is given by

f(λ) = (σ²/2π) |1 − e^{−iλ}|^{−2d} = (σ²/2π) (2 sin(λ/2))^{−2d}, 0 < λ ≤ π. (2.12)

Notice that f(λ) ∼ c |λ|^{−2d} as λ → 0, that is, for 0 < d < 1/2 the density f(λ) blows up at λ = 0 like a power function, which is the typical behavior of a long-memory model.
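The power-law behavior of (2.12) at the origin can be checked numerically. The sketch below (our own illustration; the function name and unit innovation variance are assumptions, not code from the paper) verifies that f(λ) λ^{2d} · 2π/σ² → 1 as λ → 0, for both a long-memory (d > 0) and an anti-persistent (d < 0) choice of d.

```python
import numpy as np

def arfima0_sd(d, sigma2, lam):
    """ARFIMA(0, d, 0) spectral density (2.12):
    f(lam) = sigma^2/(2*pi) * |1 - e^{-i*lam}|^{-2d}
           = sigma^2/(2*pi) * (2*sin(lam/2))^{-2d} for 0 < lam <= pi."""
    return sigma2 / (2 * np.pi) * np.abs(1 - np.exp(-1j * lam)) ** (-2 * d)

# f(lam) ~ (sigma^2/2pi) |lam|^{-2d} as lam -> 0: blow-up for d > 0
# (long memory), vanishing for d < 0 (anti-persistence).
lam = 1e-4
for d in (0.3, -0.3):
    ratio = arfima0_sd(d, 1.0, lam) * lam ** (2 * d) * 2 * np.pi
    assert abs(ratio - 1.0) < 1e-6  # normalized density tends to 1 at origin
```

For d = 0.3 the density at λ = 10⁻⁴ is roughly 10^{2.4} times its value at λ = 10⁻⁸·⁄... in general it grows like λ^{−0.6}, while for d = −0.3 it decays like λ^{0.6}.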
A typical example of an anti-persistent model is the ARFIMA(0, d, 0) process X(t) with spectral density specified by (2.12) with d < 0, which vanishes at λ = 0. Note that the condition d < 1/2 ensures that ∫_{−π}^{π} f(λ) dλ < ∞, implying that the process X(t) with spectral density (2.12) is well defined, since its variance is finite.

Data can also occur in the form of a realization of a 'mixed' short-long-intermediate-memory stationary process X(t). A well-known example of such a process, which appears in many applied problems, is an ARFIMA(p, d, q) process X(t), defined to be a stationary solution of the difference equation

ψ_p(B) (1 − B)^d X(t) = θ_q(B) ε(t),

where B is the backshift operator, ε(t) is a d.t. white noise, and ψ_p and θ_q are polynomials of degrees p and q, respectively. The spectral density f_X(λ) of X(t) is given by

f_X(λ) = |1 − e^{−iλ}|^{−2d} f(λ), (2.13)

where f(λ) is the spectral density of an ARMA(p, q) process, given by (2.9). Observe that for 0 < d < 1/2 the model X(t) specified by the spectral density (2.13) displays long memory, for d < 0 intermediate memory, and for d = 0 short memory. For d ≥ 1/2 the function f_X(λ) in (2.13) is not integrable, and thus cannot represent the spectral density of a stationary process. Also, if d ≤ −1, then the series X(t) is not invertible, in the sense that it cannot be used to recover a white noise ε(t) by passing X(t) through a linear filter (see, e.g., Brockwell and Davis [10]).
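The three memory regimes of (2.13) can be illustrated by combining the rational ARMA factor (2.9) with the fractional factor |1 − e^{−iλ}|^{−2d}. The sketch below (function names, coefficients and the unit innovation variance are our assumptions) evaluates the ARFIMA(p, d, q) density for d = 0 and d = 0.3 and checks the short-memory bound versus the long-memory blow-up at the origin.

```python
import numpy as np

def arma_sd(phi, theta, sigma2, lam):
    """Rational ARMA(p, q) spectral density (2.9):
    f(lam) = sigma^2/(2*pi) * |theta_q(e^{-i*lam})|^2 / |psi_p(e^{-i*lam})|^2,
    with psi_p(z) = 1 - phi_1 z - ... and theta_q(z) = 1 + theta_1 z + ..."""
    z = np.exp(-1j * lam)
    psi = 1 - sum(c * z ** (k + 1) for k, c in enumerate(phi))
    th = 1 + sum(c * z ** (k + 1) for k, c in enumerate(theta))
    return sigma2 / (2 * np.pi) * np.abs(th) ** 2 / np.abs(psi) ** 2

def arfima_sd(d, phi, theta, sigma2, lam):
    """'Mixed' ARFIMA(p, d, q) spectral density (2.13):
    f_X(lam) = |1 - e^{-i*lam}|^{-2d} * f_ARMA(lam)."""
    return np.abs(1 - np.exp(-1j * lam)) ** (-2 * d) * arma_sd(phi, theta, sigma2, lam)

lam = np.linspace(1e-3, np.pi, 1000)
f_short = arfima_sd(0.0, [0.5], [0.3], 1.0, lam)  # d = 0: short memory
f_long = arfima_sd(0.3, [0.5], [0.3], 1.0, lam)   # 0 < d < 1/2: long memory
# Short memory: bounded away from 0 and infinity on (0, pi];
# long memory: the density explodes as lam -> 0.
assert f_short.min() > 0 and np.isfinite(f_short.max())
assert f_long[0] > 100 * f_long[-1]
```

For d < 0 the same function gives an intermediate-memory (anti-persistent) density vanishing at the origin, matching the trichotomy described above.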
Another important long-memory model is the fractional Gaussian noise (fGn). To define the fGn we first introduce the fractional Brownian motion (fBm) {B_H(t), t ∈ R} with Hurst index H, 0 < H < 1, defined to be a centered Gaussian H-self-similar process having stationary increments (see, e.g., Samorodnitsky and Taqqu [70]). Then the increment process {X(k) := B_H(k + 1) − B_H(k), k ∈ Z}, called fractional Gaussian noise (fGn), is a d.t. centered Gaussian stationary process with spectral density function

f(λ) = c |1 − e^{−iλ}|² Σ_{k=−∞}^{∞} |λ + 2πk|^{−(2H+1)}, λ ∈ [−π, π], (2.14)

where c is a positive constant. It follows from (2.14) that f(λ) ∼ c |λ|^{1−2H} as λ → 0, that is, f(λ) blows up if H > 1/2 and tends to zero if H < 1/2. Also, comparing (2.12) and (2.14), we observe that, up to a constant, the spectral density of fGn has the same behavior at the origin as that of ARFIMA(0, d, 0) with d = H − 1/2. Thus, the fGn has long memory for H > 1/2 and is anti-persistent for H < 1/2. For more details we refer to Samorodnitsky and Taqqu [70].
Continuous-time long-memory and anti-persistent models. In the continuous context, a basic process which has commonly been used to model long-range dependence is the fractional Brownian motion (fBm) B_H with Hurst index H, defined above, which can be regarded as a Gaussian process having the 'spectral density'

f(λ) = c |λ|^{−(2H+1)}, λ ∈ R, c > 0. (2.15)

The form (2.15) can be understood in a generalized sense (see, e.g., Yaglom [77]), since the fBm B_H is a nonstationary process.
A proper stationary model in lieu of fBm is the fractional Riesz-Bessel motion (fRBm), introduced in Anh et al. [2], and defined as a c.t. Gaussian process X(t) with spectral density

f(λ) = c |λ|^{−2α} (1 + λ²)^{−β}, λ ∈ R, c > 0. (2.16)

The exponent α determines the long-range dependence, while the exponent β indicates the second-order intermittency of the process (see, e.g., Anh et al. [3] and Gao et al. [32]).
Comparing (2.15) and (2.16), we observe that the spectral density of fBm is the limiting case, as β → 0, of that of fRBm with Hurst index H = α − 1/2. Another important c.t. long-memory model is the CARFIMA(p, H, q) process. The spectral density function f(λ) of a CARFIMA(p, H, q) process X(t) is given by the following formula (see, e.g., Brockwell [9], and Tsai and Chan [74]):

f(λ) = (σ²/2π) Γ(2H + 1) sin(πH) |λ|^{1−2H} |β_q(iλ)|² / |α_p(iλ)|², λ ∈ R, (2.17)

where α_p and β_q are polynomials of degrees p and q, respectively. Notice that for H = 1/2, the spectral density given by (2.17) becomes that of the short-memory CARMA(p, q) process, given by (2.10).

Data tapers and tapered periodogram
Our inference procedures will be based on the tapered data X_T^h:

X_T^h := {h_T(t) X(t), t ∈ D_T}, (2.18)

where

h_T(t) := h(t/T), (2.19)

with h(t), t ∈ R, being a taper function.
Throughout the paper, we will assume that the taper function h(·) satisfies the following assumption.
Assumption 2.1. The taper h : R → R is a continuous nonnegative function of bounded variation and of bounded support [0, 1], such that H_k ≠ 0, where

H_k := ∫_0^1 h^k(t) dt, k ∈ N. (2.20)

The case h(t) = I_{[0,1]}(t), where I_{[0,1]}(·) denotes the indicator of the segment [0, 1], will be referred to as the non-tapered case.

Remark. More examples of taper functions satisfying Assumption 2.1 can be found in Dahlhaus [22] and in Guyon [56].
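As a concrete check of Assumption 2.1, the sketch below (our own illustration, not code from the paper) evaluates the constants H_k = ∫_0^1 h^k(t) dt from (2.20) numerically for two standard tapers: the Tukey-Hanning taper h(t) = (1 − cos 2πt)/2 and the indicator taper of the non-tapered case. Both are nonnegative, of bounded variation, supported on [0, 1], and have nonzero H_k.

```python
import numpy as np

def tukey_hanning(t):
    """Tukey-Hanning taper h(t) = (1 - cos(2*pi*t))/2 on [0, 1], zero outside."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t <= 1), 0.5 * (1 - np.cos(2 * np.pi * t)), 0.0)

def non_tapered(t):
    """Indicator taper I_[0,1](t) of the non-tapered case."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t <= 1), 1.0, 0.0)

def H(h, k, n=100000):
    """Numerical value of H_k = int_0^1 h^k(t) dt from (2.20), midpoint rule."""
    t = (np.arange(n) + 0.5) / n
    return float(np.mean(h(t) ** k))

# H_k is nonzero for both tapers, as Assumption 2.1 requires.
assert H(tukey_hanning, 2) > 0 and H(non_tapered, 2) > 0
# For the Tukey-Hanning taper, H_1 = 1/2 and H_2 = 3/8 exactly.
assert abs(H(tukey_hanning, 1) - 0.5) < 1e-6
assert abs(H(tukey_hanning, 2) - 0.375) < 1e-6
```

For the indicator taper, H_k = 1 for every k, so all the tapered formulas below reduce to their classical non-tapered counterparts.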
Denote by H_{k,T}(λ) the tapered Dirichlet-type kernel, defined by

H_{k,T}(λ) := Σ_{t∈D_T} h_T^k(t) e^{−iλt} in the d.t. case, and H_{k,T}(λ) := ∫_0^T h_T^k(t) e^{−iλt} dt in the c.t. case.

Define the finite Fourier transform of the tapered data (2.18):

d_T^h(λ) := Σ_{t∈D_T} h_T(t) X(t) e^{−iλt} (resp. the corresponding integral in the c.t. case),

and the tapered periodogram I_T^h(λ) of the process X(t):

I_T^h(λ) := (1/C_T) |d_T^h(λ)|², (2.23)

where

C_T := 2π H_{2,T}(0). (2.24)

Notice that in the non-tapered case (h(t) = I_{[0,1]}(t)) we have C_T = 2πT.
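A minimal numerical sketch of the tapered periodogram in the d.t. case (our own illustration; the normalization C_T = 2π Σ_t h_T²(t) follows (2.23)-(2.24) as reconstructed here, and conventions differ by constants across the literature):

```python
import numpy as np

def tapered_periodogram(x, h, lam):
    """Tapered periodogram I_T^h(lam), d.t. case:
    d_T^h(lam) = sum_t h(t/T) x(t) e^{-i*lam*t},  I_T^h = |d_T^h|^2 / C_T,
    with C_T = 2*pi*sum_t h(t/T)^2 (so C_T = 2*pi*T when h = I_[0,1])."""
    T = len(x)
    t = np.arange(1, T + 1)
    hT = h(t / T)
    d = np.sum(hT * x * np.exp(-1j * lam * t))   # finite Fourier transform
    C = 2 * np.pi * np.sum(hT ** 2)              # tapered normalization C_T
    return float(np.abs(d) ** 2 / C)

rng = np.random.default_rng(0)
x = rng.standard_normal(512)                     # white noise, f(lam) = 1/(2*pi)
I_plain = tapered_periodogram(x, lambda u: np.ones_like(u), 1.0)
I_taper = tapered_periodogram(x, lambda u: 0.5 * (1 - np.cos(2 * np.pi * u)), 1.0)
assert I_plain >= 0 and I_taper >= 0             # periodograms are nonnegative
```

With the indicator taper the function reduces to the ordinary periodogram, while a smooth taper downweights the sample ends, which is the source of the leakage and edge-effect reduction discussed in the introduction.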

Nonparametric Estimation Problem
Suppose we observe a finite realization X_T := {X(u), 0 ≤ u ≤ T} (or {X(u), u = 1, . . ., T} in the d.t. case) of a centered stationary process X(u) with an unknown spectral density function f(λ), λ ∈ Λ. We assume that f(λ) belongs to a given (infinite-dimensional) class F ⊂ L^p := L^p(Λ) (p ≥ 1) of spectral densities possessing some specified smoothness properties. The problem is to estimate the value J(f) of a given functional J(·) at an unknown 'point' f ∈ F on the basis of an observation X_T, and to investigate the asymptotic (as T → ∞) properties of the suggested estimators, depending on the dependence structure of the model X(u) and the smoothness structure of the 'parametric' set F ⊂ L^p(Λ) (p ≥ 1). Linear and non-linear functionals of the periodogram play a key role in the parametric estimation of the spectrum of stationary processes when using the minimum contrast estimation method with various contrast functionals (see, e.g., Anh et al. [4], Dzhaparidze [27], Guyon [56], Leonenko and Sakhno [65], Taniguchi and Kakizawa [72], and references therein). In this section, we review the asymptotic properties, involving asymptotic unbiasedness, bias rate of convergence, consistency, a central limit theorem and asymptotic normality, of the empirical spectral functionals based on the tapered data. Some of these properties were discussed and proved in Ginovyan and Sahakyan [49,50]. For the non-tapered case, these properties were established in the papers Ginovyan [35,39]. The results stated in this section are used to prove consistency and asymptotic normality of the minimum contrast estimator based on the Whittle contrast functional for stationary linear models with tapered data (see Section 4). Here we follow the papers Ginovyan [36,39,40], and Ginovyan and Sahakyan [49,50].

Estimation of linear spectral functionals
We are interested in the nonparametric estimation problem, based on the tapered data (2.18), of the following linear spectral functional:

J := J(f) = ∫_Λ f(λ) g(λ) dλ, (3.1)

where g(λ) ∈ L^q(Λ), 1/p + 1/q = 1.
As an estimator J_T^h for the functional J(f), given by (3.1), based on the tapered data (2.18), we consider the averaged tapered periodogram (or a simple 'plug-in' statistic), defined by

J_T^h := J(I_T^h) = ∫_Λ I_T^h(λ) g(λ) dλ, (3.2)

where I_T^h(λ) is the tapered periodogram of the process X(t) given by (2.23). Denote

Q_T^h := ∫_0^T ∫_0^T ĝ(t − s) h_T(t) h_T(s) X(t) X(s) dt ds (3.3)

(with the integrals replaced by sums in the d.t. case), where ĝ(t) is the Fourier transform of the function g(λ):

ĝ(t) := ∫_Λ e^{iλt} g(λ) dλ. (3.4)

In view of (2.23) and (3.2)-(3.4) we have

J_T^h = (1/C_T) Q_T^h, (3.5)

where C_T is as in (2.24). We will refer to g(λ) and to its Fourier transform ĝ(t) as the generating function and generating kernel for the functional J_T^h, respectively. Thus, to study the asymptotic properties of the estimator J_T^h, we have to study the asymptotic distribution (as T → ∞) of the tapered Toeplitz-type quadratic functional Q_T^h given by (3.3) (for details see Section 7.2).
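To make the plug-in construction concrete, here is a small numerical sketch in the d.t. case (our own illustration: the integral in (3.2) is approximated by a Riemann sum over a frequency grid, and the periodogram normalization and function names are assumptions, not the paper's code). For unit-variance white noise and g ≡ 1, the target value is J(f) = ∫_Λ f(λ) dλ = r(0) = 1.

```python
import numpy as np

def plug_in_estimate(x, h, g, n_freq=256):
    """Plug-in estimator J_T^h = int_Lam I_T^h(lam) g(lam) dlam of (3.2),
    approximated by a Riemann sum over a grid in [-pi, pi]; d.t. case."""
    T = len(x)
    t = np.arange(1, T + 1)
    hT = h(t / T)
    C = 2 * np.pi * np.sum(hT ** 2)                 # normalization C_T
    lam = np.linspace(-np.pi, np.pi, n_freq, endpoint=False)
    E = np.exp(-1j * np.outer(lam, t))              # Fourier matrix on the grid
    I = np.abs(E @ (hT * x)) ** 2 / C               # tapered periodogram values
    return float(np.sum(I * g(lam)) * (2 * np.pi / n_freq))

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)                       # WN(0,1): f(lam) = 1/(2*pi)
taper = lambda u: 0.5 * (1 - np.cos(2 * np.pi * u))
Jhat = plug_in_estimate(x, taper, lambda lam: np.ones_like(lam))
assert abs(Jhat - 1.0) < 0.3                        # J(f) = r(0) = 1 here
```

Choosing g(λ) = e^{iuλ} instead recovers the covariance estimator of Example 3.1 below, and an indicator g recovers the spectral-function estimator of Example 3.2.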

Asymptotic unbiasedness
We begin with the following assumption.
Assumption 3.1. The function ψ(u) := ∫_Λ e^{iλu} f(λ) g(λ) dλ is well defined and is continuous at u = 0.
Theorem 3.1. Let the functionals J := J(f, g) and J_T^h := J(I_T^h, g) be defined by (3.1) and (3.2), respectively. Then under Assumptions 2.1 and 3.1 the statistic J_T^h is an asymptotically unbiased estimator for J(f), that is, the following relation holds:

lim_{T→∞} E(J_T^h) = J(f). (3.7)

Using the Hölder inequality, it can easily be shown that Assumption 3.1 is satisfied if f ∈ L^p(Λ) and g ∈ L^q(Λ) with 1/p + 1/q = 1. Under additional smoothness conditions on the functions f(λ) and g(λ) we can estimate the rate of convergence in (3.7). To state the corresponding result, we first introduce some notation and assumptions.
The following theorem controls the bias E(J_T^h) − J and provides sufficient conditions ensuring the rate of convergence of the bias to zero that is necessary for asymptotic normality of the estimator J_T^h. Specifically, we have the following result.

Theorem 3.2. Let the functionals J := J(f, g) and J_T^h := J(I_T^h, g) be defined by (3.1) and (3.2), respectively. Then under Assumptions 2.1 and 3.2 (or 3.3), the following asymptotic relation holds:

T^{1/2} [E(J_T^h) − J] → 0 as T → ∞.

Remark 3.3. We call an estimator J_T^h of J asymptotically unbiased of the order of T^β, β > 0, if lim_{T→∞} T^β [E(J_T^h) − J] = 0. Thus, Theorem 3.2 states that the statistic J_T^h is an asymptotically unbiased estimator for J of the order of T^{1/2}.

Consistency
Recall that an estimator J_T^h of J is said to be (a) consistent if J_T^h → J in probability as T → ∞, and (b) mean square consistent if E|J_T^h − J|² → 0 as T → ∞. To state the corresponding results we first introduce the following assumption.
Assumption 3.4. The filter a(·) and the generating kernel ĝ(·) belong to suitable L^p-spaces, ensuring that T Var(J_T^h) converges as T → ∞ (see Ginovyan and Sahakyan [49] for the precise formulation).

We begin with results on the asymptotic behavior of the variance Var(J_T^h). The proof of the next theorem can be found in Ginovyan and Sahakyan [49].

Theorem 3.3. Let the functionals J := J(f, g) and J_T^h := J(I_T^h, g) be defined by (3.1) and (3.2), respectively. Then under Assumptions 2.1 and 3.4 the following asymptotic relation holds:

lim_{T→∞} T Var(J_T^h) = σ_h²(J), (3.10)

where σ_h²(J) = e(h) σ²(J), with σ²(J) the asymptotic variance in the non-tapered case, consisting of a Gaussian term involving ∫_Λ f²(λ) g²(λ) dλ and a fourth-cumulant term. Here κ_4 is the fourth cumulant of ξ(1), and

e(h) := H_4 / H_2² (3.11)

is the tapering factor, with H_k as in (2.20). From Theorems 3.1-3.3 we infer the following result.

Theorem 3.4.
(a) Under Assumptions 2.1, 3.1 and 3.4 the statistic J h T is a mean square consistent estimator for J.
(b) Under Assumptions 2.1, 3.2 (or 3.3) and 3.4 the statistic J h T is a √ T -consistent in the mean square sense estimator for J.

Asymptotic normality
The next result contains sufficient conditions for the functional J_T^h to obey the central limit theorem (CLT); it was proved in Ginovyan and Sahakyan [49].

Theorem 3.5 (CLT). Let J := J(f, g) and J_T^h := J(I_T^h, g) be defined by (3.1) and (3.2), respectively. Then under Assumptions 2.1 and 3.4 the functional J_T^h obeys the central limit theorem. More precisely, we have

T^{1/2} [J_T^h − E(J_T^h)] →^d η ∼ N(0, σ_h²(J)) as T → ∞,

where the symbol →^d stands for convergence in distribution, and η is a normally distributed random variable with mean zero and variance σ_h²(J) given by (3.10) and (3.11).
Taking into account the equality

T^{1/2} (J_T^h − J) = T^{1/2} [J_T^h − E(J_T^h)] + T^{1/2} [E(J_T^h) − J],

as an immediate consequence of Theorems 3.2 and 3.5 we obtain the next result, which contains sufficient conditions for the simple 'plug-in' statistic J(I_T^h) to be an asymptotically normal estimator for a linear spectral functional J.

Theorem 3.6. Let the functionals J := J(f, g) and J_T^h := J(I_T^h, g) be defined by (3.1) and (3.2), respectively. Then under Assumptions 2.1, 3.2 (or 3.3) and 3.4 the statistic J_T^h is an asymptotically normal estimator for the functional J. More precisely, we have

T^{1/2} (J_T^h − J) →^d η as T → ∞,

where η is as in Theorem 3.5, that is, η is a normally distributed random variable with mean zero and variance σ_h²(J) given by (3.10) and (3.11).

Remark 3.4. Notice that if the underlying process X(u) is Gaussian, then in formula (3.10) only the first (Gaussian) term is present. Using the results from Ginovyan [35] and Ginovyan and Sahakyan [44,45], it can be shown that in this case Theorem 3.6 is true under Assumptions 2.1 and 3.4.
Example 3.1 (Estimation of the covariance function). Assume that X(t) is a c.t. process, and let g(λ) = e^{iuλ}. Then

J(f) = ∫_Λ e^{iuλ} f(λ) dλ = r(u).

Thus, in this special case our problem reduces to the estimation of the covariance function r(u) = E[X(t + u)X(t)] of the process X(t). By Theorem 3.6 the simple "plug-in" statistic

r̂_T^h(u) := ∫_Λ e^{iuλ} I_T^h(λ) dλ

is an asymptotically normal estimator for r(u), with asymptotic variance proportional to the tapering factor e(h) given by (3.11).
Example 3.2 (Estimation of the spectral function). Assume that X(t) is a d.t. process, and let g(λ) = χ_{[0,µ]}(λ) be the indicator of the interval [0, µ]. Then

J(f) = ∫_0^µ f(λ) dλ = F(µ).

Thus, in this case the estimand functional is the spectral function F(µ) of the process X(u), and by Theorem 3.6 the simple "plug-in" statistic

F̂_T^h(µ) := ∫_0^µ I_T^h(λ) dλ

is an asymptotically normal estimator for F(µ), with asymptotic variance proportional to the tapering factor e(h) given by (3.11).

Parametric Estimation Problem
We assume here that the spectral density f(λ) belongs to a given parametric family of spectral densities F := {f(λ, θ) : θ ∈ Θ}, where θ := (θ_1, . . ., θ_p) is an unknown parameter and Θ is a subset of the Euclidean space R^p. The problem of interest is to estimate the unknown parameter θ on the basis of the tapered data (2.18), and to investigate the asymptotic (as T → ∞) properties of the suggested estimators, depending on the dependence (memory) structure of the model X(t) and the smoothness of its spectral density f. There are different methods of estimation: maximum likelihood, Whittle, minimum contrast, etc. Here we focus on the Whittle method.

The Whittle estimation procedure
The Whittle estimation procedure, originally devised for d.t. short-memory stationary processes, is based on smoothed periodogram analysis in the frequency domain, involving an approximation of the likelihood function and asymptotic properties of empirical spectral functionals (see Whittle [76]). Since its discovery, the Whittle estimation method has played a major role in the asymptotic theory of parametric estimation in the frequency domain, and has been the focus of interest of many statisticians. Their aim was to weaken the conditions needed to guarantee the validity of the Whittle approximation for d.t. short-memory models, to find analogues for long and intermediate memory models, to find conditions under which the Whittle estimator is asymptotically equivalent to the exact maximum likelihood estimator, and to extend the procedure to c.t. models and random fields.
For the d.t. case, it was shown that for Gaussian and linear stationary models the Whittle approach leads to consistent and asymptotically normal estimators under short, intermediate and long memory assumptions. Moreover, it was shown that in the Gaussian case the Whittle estimator is also asymptotically efficient in the sense of Fisher (see, e.g., Dahlhaus [21], Dzhaparidze [26], Fox and Taqqu [30], Giraitis and Surgailis [53], Guyon [56], Heyde and Gay [60], Taniguchi and Kakizawa [72], Walker [75], and references therein).
The Whittle estimation procedure based on d.t. tapered data has been studied in Alomari et al. [1], Dahlhaus [17], Dahlhaus and Künsch [23], Guyon [56], and Ludeña and Lavielle [66]. In the case where the underlying model is a Lévy-driven c.t. linear process with possibly unbounded or vanishing spectral density function, consistency and asymptotic normality of the Whittle estimator were established in Ginovyan [42].
To explain the idea behind the Whittle estimation procedure, assume for simplicity that the underlying process X(t) is a d.t. Gaussian process, and we want to estimate the parameter θ based on the sample X_T := {X(t), t = 1, . . ., T}. A natural approach is to find the maximum likelihood estimator (MLE) θ̂_{T,MLE} of θ, that is, to maximize the likelihood function, or to minimize the −(1/T) × log-likelihood function L_T(θ), which in this case takes the form

L_T(θ) = (1/2T) ln det B_T(f_θ) + (1/2T) X_T' [B_T(f_θ)]^{−1} X_T + (1/2) ln 2π,

where B_T(f_θ) is the Toeplitz matrix generated by f_θ. Unfortunately, the above function is difficult to handle, and no explicit expression for the estimator θ̂_{T,MLE} is known (even in the case of simple models). The approach suggested by P. Whittle, called the Whittle estimation procedure, is to approximate the term ln det B_T(f_θ) by (T/2π) ∫_{−π}^{π} ln f_θ(λ) dλ and the inverse matrix [B_T(f_θ)]^{−1} by the Toeplitz matrix B_T(1/f_θ). This leads to the following approximation of the log-likelihood function L_T(θ), introduced by Whittle [76], and called the Whittle functional:

L_{T,W}(θ) := (1/4π) ∫_{−π}^{π} [ln f_θ(λ) + I_T(λ)/f_θ(λ)] dλ,

where I_T(λ) is the ordinary periodogram of the process X(t). Now, minimizing the Whittle functional L_{T,W}(θ) with respect to θ, we get the Whittle estimator θ̂_T of θ. It can be shown that, under suitable regularity conditions, the MLE θ̂_{T,MLE} and the Whittle estimator θ̂_T are asymptotically equivalent, in the sense that θ̂_T is also consistent, asymptotically normal and asymptotically Fisher-efficient (see, e.g., Coursol and Dacunha-Castelle [14], Dzhaparidze [26], and Dzhaparidze and Yaglom [29]).
In the continuous context, the Whittle procedure for estimating a spectral parameter θ based on the sample X_T := {X(t), 0 ≤ t ≤ T} is to choose the estimator θ̂_T to minimize the weighted Whittle functional

U_T(θ) := (1/4π) ∫_R [ln f_θ(λ) + I_T(λ)/f_θ(λ)] w(λ) dλ, (4.1)

where I_T(λ) is the continuous periodogram of X(t), and w(λ) is a weight function (w(−λ) = w(λ), w(λ) ≥ 0, w(λ) ∈ L^1(R)) for which the integral in (4.1) is well defined. An example of a commonly used weight function is w(λ) = 1/(1 + λ²). The Whittle procedure for estimating a spectral parameter θ based on the tapered sample (2.18) is to choose the estimator θ̂_{T,h} to minimize the weighted tapered Whittle functional

U_{T,h}(θ) := (1/4π) ∫_R [ln f_θ(λ) + I_T^h(λ)/f_θ(λ)] w(λ) dλ, (4.2)

where I_T^h(λ) is the tapered periodogram of X(t), given by (2.23), and w(λ) is a weight function for which the integral in (4.2) is well defined. Thus,

θ̂_{T,h} := arg min_{θ∈Θ} U_{T,h}(θ). (4.3)
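The minimization in (4.3) can be illustrated numerically. The sketch below (our own illustration, in the d.t. case, with the integral in the Whittle functional replaced by a Riemann sum, a grid search instead of a proper optimizer, and a unit innovation variance assumed) estimates the parameter of an AR(1) model from a simulated tapered sample.

```python
import numpy as np

def whittle_ar1(x, h, grid=np.linspace(-0.9, 0.9, 181)):
    """Tapered Whittle estimator, in the spirit of (4.2)-(4.3), for the AR(1)
    model f_theta(lam) = (1/2pi)/|1 - theta*e^{-i*lam}|^2 (unit innovation
    variance).  Minimizes a Riemann-sum Whittle contrast over a grid."""
    T = len(x)
    t = np.arange(1, T + 1)
    hT = h(t / T)
    C = 2 * np.pi * np.sum(hT ** 2)
    lam = np.linspace(-np.pi, np.pi, 512, endpoint=False)
    I = np.abs(np.exp(-1j * np.outer(lam, t)) @ (hT * x)) ** 2 / C
    best, best_val = None, np.inf
    for th in grid:
        f = (1 / (2 * np.pi)) / np.abs(1 - th * np.exp(-1j * lam)) ** 2
        val = np.mean(np.log(f) + I / f)        # Whittle contrast on the grid
        if val < best_val:
            best, best_val = th, val
    return best

rng = np.random.default_rng(2)
theta0, T = 0.6, 4000
eps = rng.standard_normal(T + 200)
x = np.zeros(T + 200)
for k in range(1, T + 200):
    x[k] = theta0 * x[k - 1] + eps[k]           # simulate AR(1) with burn-in
x = x[200:]
taper = lambda u: 0.5 * (1 - np.cos(2 * np.pi * u))
theta_hat = whittle_ar1(x, taper)
assert abs(theta_hat - theta0) < 0.1            # close to the true parameter
```

Consistently with Theorem 4.2 below, tapering leaves the estimator consistent and asymptotically normal, at the price of an asymptotic variance inflated by the tapering factor e(h).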
The next theorem contains sufficient conditions for the Whittle estimator to be consistent (see Ginovyan [42]).

Theorem 4.1. Let θ̂_{T,h} be the Whittle estimator defined by (4.3) and let θ_0 be the true value of the parameter θ. Then, under Assumptions 4.1-4.4 and 2.1, the statistic θ̂_{T,h} is a consistent estimator for θ, that is, θ̂_{T,h} → θ_0 in probability as T → ∞.
Having established the consistency of the Whittle estimator θ̂_{T,h}, we can go on to obtain the limiting distribution of T^{1/2}(θ̂_{T,h} − θ_0) in the usual way, by applying Taylor's formula, the mean value theorem, and Slutsky's arguments. Specifically, we have the following result, showing that under the above assumptions the Whittle estimator θ̂_{T,h} is asymptotically normal (see Ginovyan [42]).

Theorem 4.2. Suppose that Assumptions 4.1-4.6 and 2.1 are satisfied. Then the Whittle estimator θ̂_{T,h} of an unknown spectral parameter θ based on the tapered data (2.18) is asymptotically normal. More precisely, we have

T^{1/2}(θ̂_{T,h} − θ_0) →^d N_p(0, W) as T → ∞,

where N_p(·, ·) denotes the p-dimensional normal law, →^d stands for convergence in distribution, the matrices W, A and B are defined in (4.4)-(4.7), and the tapering factor e(h) is given by formula (3.11).

Goodness-of-fit tests
In this section we consider the following problem of hypotheses testing.
Based on the tapered sample X_T^h given by (2.18), we want to construct goodness-of-fit tests for testing a hypothesis H_0 that the spectral density of the process X(t) has a specified form f(λ). We will distinguish the following two cases.
a) The hypothesis H 0 is simple, that is, the hypothetical spectral density f (λ) of X(t) does not depend on unknown parameters.
To test the hypothesis H_0, similar to the non-tapered case, it is natural to introduce a measure of divergence (disparity) of the hypothetical and empirical spectral densities, and to construct a goodness-of-fit test based on the distribution of the chosen measure (see, e.g., Dzhaparidze [27], Ginovyan [38,41,43], Hannan [57], and Osidze [68,69]).

A Goodness-of-fit test for simple hypothesis
We first consider the relatively easy case a) of a simple hypothesis H_0. As a measure of divergence of the hypothetical spectral density f(λ) and the tapered empirical spectral density I_T^h(λ), we consider the m-dimensional random vector Φ_T^h := (Φ_{T,1}^h, . . ., Φ_{T,m}^h) with elements

Φ_{T,j}^h := (T/(4π e(h)))^{1/2} ∫_Λ [I_T^h(λ) − f(λ)] (φ_j(λ)/f(λ)) dλ, j = 1, . . ., m, (5.1)

where e(h) is as in (3.11) and {φ_j(λ), j = 1, 2, . . ., m} is some orthonormal system on Λ:

∫_Λ φ_j(λ) φ_k(λ) dλ = δ_{jk}, j, k = 1, . . ., m. (5.2)

In Ginovyan [43] it was shown that under wide conditions on f(λ) and φ_j(λ), the random vector Φ_T^h in (5.1)-(5.2) has asymptotically N(0, I_m)-normal distribution as T → ∞, where I_m is the m × m identity matrix. Therefore, in the case of a simple hypothesis H_0, we can use the statistic

S_T(X_T^h) := |Φ_T^h|² = Σ_{j=1}^m (Φ_{T,j}^h)², (5.4)

which as T → ∞ will have a χ²-distribution with m degrees of freedom. Thus, fixing an asymptotic level of significance α, we can consider the class of goodness-of-fit tests for testing the simple hypothesis H_0 about the form of the spectral density f, with asymptotic level of significance α, determined by critical regions of the form

{X_T^h : S_T(X_T^h) ≥ d_α},

where S_T(X_T^h) is given by (5.4), and d_α is the α-quantile of the χ²-distribution with m degrees of freedom, that is, d_α is determined from the condition

∫_{d_α}^{∞} k_m(x) dx = α,

where k_m(x) is the density of the χ²-distribution with m degrees of freedom.
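A structural sketch of the test (our own illustration: the exact normalizing constants in (5.1) are not fully recoverable here, so only the shape of the procedure, components Φ_j built from integrals of (I_T^h − f)·φ_j/f and summed into a quadratic statistic S compared with a χ² critical value, is shown; all function names are assumptions):

```python
import numpy as np

def gof_statistic(x, h, f0, phis):
    """Goodness-of-fit statistic in the spirit of (5.1)-(5.4), d.t. case.
    Builds components Phi_j from Riemann sums of (I_T^h - f0)*phi_j/f0 and
    returns S = sum_j Phi_j^2.  Normalizing constants are illustrative only."""
    T = len(x)
    t = np.arange(1, T + 1)
    hT = h(t / T)
    C = 2 * np.pi * np.sum(hT ** 2)
    lam = np.linspace(-np.pi, np.pi, 512, endpoint=False)
    I = np.abs(np.exp(-1j * np.outer(lam, t)) @ (hT * x)) ** 2 / C
    dlam = 2 * np.pi / len(lam)
    Phi = np.array([np.sqrt(T) * np.sum((I - f0(lam)) * p(lam) / f0(lam)) * dlam
                    for p in phis])
    return float(np.sum(Phi ** 2))

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
taper = lambda u: 0.5 * (1 - np.cos(2 * np.pi * u))
f0 = lambda lam: np.full_like(lam, 1 / (2 * np.pi))   # H0: white noise
# Two elements of an orthonormal system on [-pi, pi]:
phis = [lambda lam: np.cos(lam) / np.sqrt(np.pi),
        lambda lam: np.sin(lam) / np.sqrt(np.pi)]
S = gof_statistic(x, taper, f0, phis)
assert S >= 0.0
# Reject H0 at asymptotic level alpha = 0.05 when S exceeds the chi^2_2
# critical value (5.991), per the critical-region form above.
reject = S > 5.991
```

With the correct normalization from (5.1), S is asymptotically χ²-distributed with m degrees of freedom under H_0, which is what makes the fixed critical value d_α usable.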
The next theorem contains sufficient conditions for the statistic S h T , given by (5.4), to have a limiting (as T → ∞) χ 2 -distribution with m degrees of freedom (see Ginovyan [43]).
Remark 5.1. For the non-tapered case, for observations generated by d.t. short-memory Gaussian stationary processes, the result of Theorem 5.1 was first proved in Hannan [57] (p. 94) (see also Dzhaparidze [27] and Osidze [68, 69]). In the case where the spectral density has singularities (zeros and/or poles), the result for d.t. processes was proved in Ginovyan [38]. The non-tapered counterpart of Theorem 5.1 for c.t. processes was proved in Ginovyan [41].

A Goodness-of-fit test for a composite hypothesis
Now we consider the case of a composite hypothesis H 0 , and assume that the hypothetical spectral density f = f (λ, θ) is known up to a vector parameter θ := (θ 1 , . . ., θ p ) ∈ Θ ⊂ R p . In this case, the problem of constructing goodness-of-fit tests becomes more complex, because we first have to choose an appropriate statistical estimator θ̂ T for the unknown parameter θ, constructed on the basis of the tapered sample (2.18). It is important to remark that in this case the limiting distribution of the test statistic changes in accordance with the properties of the estimator of θ, and generally is no longer a χ 2 -distribution.
For testing a composite hypothesis H 0 , we can again use a statistic of the type (5.4), but with a statistical estimator θ̂ T in place of the unknown θ. The corresponding statistic S h T ( θ̂ T ) is given by (5.5). So, we must choose an appropriate statistical estimator θ̂ T for the unknown θ, and determine the limiting distribution of the statistic (5.5). Then, having the limiting distribution of the statistic (5.5), for a given level of significance α we can consider the class of goodness-of-fit tests for testing the composite hypothesis H 0 about the form of the spectral density f with asymptotic level of significance α determined by critical regions of the form {S h T ( θ̂ T ) ≥ d α }, where d α is the α-quantile of the limiting distribution of the statistic (5.5), that is, d α is determined from the condition ∫ {x ≥ d α } k m (x) dx = α, where k m (x) is the density of the limiting distribution of S h T ( θ̂ T ) defined by (5.5). To state the corresponding result, we first introduce the following set of assumptions.
Assumption 5.1. For θ ∈ Θ, (f, g j ) ∈ (H) for all j = 1, 2, . . ., m, where f := f (λ, θ) and g j := ϕ j (λ)/f (λ, θ).
Assumption 5.2. For θ ∈ Θ, (f, h kj ) ∈ (H) for all k = 1, 2, . . ., p and j = 1, 2, . . ., m.
Assumption 5.3. The matrix Γ(θ) is non-singular.
Assumption 5.4. There exists a √T-consistent estimator θ̂ T for the parameter θ such that the asymptotic relation (5.8) holds, where Γ −1 (θ 0 ) is the inverse of the matrix Γ(θ 0 ) defined in Assumption 5.3, and the remaining factor in (5.8) is a p-dimensional random vector. The term o P (1) in (5.8) tends to zero in probability as T → ∞. (Recall that an estimator θ̂ T for θ is called √T-consistent if √T ( θ̂ T − θ) is bounded in probability.)
Remark 5.2. As an estimator θ̂ T for θ satisfying (5.8), one can take a minimum contrast estimator (in particular, the Whittle estimator) based on the tapered data. Minimum contrast estimators based on tapered data have been studied for d.t. processes in Dahlhaus [17, 18, 19], for Gaussian c.t. processes in Ginovyan [42], and for some classes of non-Gaussian c.t. processes in Alomari et al. [1].
The following theorem was proved in Ginovyan [43].
Example 5.1. Let X(t) be a d.t. autoregressive process of order p (AR(p)), that is, a stationary process with spectral density f (λ, θ) = (σ 2 /2π) |α p (e −iλ )| −2 , where α p (z) := 1 − θ 1 z − · · · − θ p z p . Consider the functions ϕ j (λ) (j = 1, . . ., m) defined by ϕ j (λ) := c e ijλ α p (e −iλ ) α p (e −iλ ) −1 for j > p and ϕ j (λ) := 0 for j ≤ p, where c is a normalizing constant. As an estimator of θ := (θ 1 , . . ., θ p ) consider the Whittle estimator, and as a taper take the Tukey-Hanning taper function h(t) (see Remark 2.1). Then it is easy to check that the conditions of Theorem 5.2 are satisfied and B(θ) = 0. Therefore, the limiting (as T → ∞) distribution of the statistic in (5.5) is a χ 2 -distribution with m − p degrees of freedom.

Robustness of estimation to small trends
In time series analysis, much of the statistical inference about unknown spectral parameters or spectral functionals concerns stationary models, in which case it is assumed that the models are centered or have constant means. In this section, we are concerned with the robustness of inferences carried out on a stationary model, possibly exhibiting long memory, contaminated by a small trend. Specifically, let {X(t), t ∈ U} be a centered stationary process possessing a spectral density f X (λ), λ ∈ Λ. Assuming that either f X is known up to a vector parameter θ ∈ Θ ⊂ R p , or f X is completely unknown and belongs to a given class F, we want to make inferences about θ or the value J(f X ) of a given functional J(·) at an unknown point f X ∈ F in the case where the actually observed data are in the contaminated form Y (t) = X(t) + M (t), where M (t) is a deterministic trend. The process X(t) is what we believe is being observed, but in reality the data are in the contaminated form Y (t). In this case, standard inferences are carried out on the basis of the stationary model X(t), and we are interested in the question of whether the conclusions are robust against this kind of departure from stationarity.
The results stated below show that if the trend M (t) is 'small', then the asymptotic properties of estimators of the parameter θ and of the functional J(f ), stated in Sections 3 and 4 for a stationary model X(t), remain valid for the contaminated model Y (t); that is, both the parametric and the nonparametric estimation procedures are robust against replacing the stationary model X(t) by the non-stationary model Y (t). To this end, we first establish an asymptotic relation between the stationary and contaminated periodograms.

A relation between stationary and contaminated periodograms
The next result shows that a small trend of the form |M (t)| ≤ C|t| −β , β > 1/4, does not affect the asymptotic properties of empirical spectral linear functionals of the periodogram. Note that this result is of a general nature and does not require the model X(t) to be linear.
Theorem 6.1. Let {X(t), t ∈ U} be a stationary mean zero process, let {M (t), t ∈ U} be a deterministic trend, Y (t) = X(t) + M (t), and let I h T X (λ) and I h T Y (λ) be the tapered periodograms of X(t) and Y (t), respectively. Let g(λ), λ ∈ Λ, be an even integrable function. If the trend M (t) and the Fourier transform a(t) := ĝ(t) of g(λ) are such that M (t) is locally integrable on R, |M (t)| ≤ C|t| −β and |a(t)| ≤ C|t| −γ with some constants C > 0, γ > 0 and β > 1/4, then the √T-normalized difference of the g-weighted linear functionals of I h T Y and I h T X tends to zero in probability as T → ∞ (P→ stands for convergence in probability), provided that one of the following conditions holds:
(i) the process X(t) has short or intermediate memory, that is, the covariance function r(t) := r X (t) of X(t) satisfies r ∈ L 1 (Λ), and β + γ > 1;
(ii) the process X(t) has long memory with covariance function r(t) satisfying |r(t)| ≤ C|t| −α with some constants C > 0, 0 < α ≤ 1, and α + 2β > 1 if β < 1 < γ.
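Theorem 6.1 can be illustrated numerically. The sketch below (our own construction; the periodogram normalization and the functional J(I) = ∫ cos(λ) I(λ) dλ are illustrative choices, not taken from the paper) contaminates a white-noise sample with the trend M (t) = (1 + t) −β , β = 0.6 > 1/4, and shows that the √T-scaled discrepancy between linear functionals of the two tapered periodograms shrinks as T grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def hann(T):
    u = (np.arange(T) + 0.5) / T
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * u))

def tapered_periodogram(x, h):
    return np.abs(np.fft.fft(h * x)) ** 2 / (2.0 * np.pi * np.sum(h ** 2))

def scaled_diff(T, beta=0.6, reps=100):
    """Monte Carlo mean of sqrt(T)*|J(I_Y) - J(I_X)|, J(I) = int cos(lam) I(lam) dlam."""
    h = hann(T)
    g = np.cos(2.0 * np.pi * np.fft.fftfreq(T))
    M = (1.0 + np.arange(T)) ** (-beta)        # small trend with beta > 1/4
    dl = 2.0 * np.pi / T
    vals = []
    for _ in range(reps):
        x = rng.standard_normal(T)
        d = tapered_periodogram(x + M, h) - tapered_periodogram(x, h)
        vals.append(np.sqrt(T) * abs(np.sum(g * d) * dl))
    return float(np.mean(vals))

d_small, d_large = scaled_diff(128), scaled_diff(2048)
print(d_small, d_large)                        # the scaled discrepancy decreases with T
```

Here the discrepancy decays roughly like T −1/2 , consistent with the theorem's conclusion that the contamination is asymptotically negligible at the √T scale.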

Robustness of nonparametric estimation to small trends
The next result shows that a small trend of the form |M (t)| ≤ C|t| −β does not affect the asymptotic properties of the estimator of a linear spectral functional J(f ); that is, the nonparametric estimation procedure is robust to the presence of a small trend in the model.
Theorem 6.2. Suppose that the assumptions of Theorems 3.6 and 6.1 are fulfilled. Then the statistic J(I h T Y ) is a consistent and asymptotically normal estimator of the functional J(f ) with asymptotic variance σ 2 h (J) given by (3.10) and (3.11); that is, the asymptotic relation (3.14) is satisfied with I h T X (λ) replaced by the contaminated periodogram I h T Y (λ), where the limit η is N (0, σ 2 h (J)) with σ 2 h (J) given by (3.10) and (3.11).

Robustness of parametric estimation to small trends
The next result shows that a small trend of the form |M (t)| ≤ C|t| −β , β > 1/4, does not affect the asymptotic properties of the Whittle estimator of an unknown spectral parameter θ; that is, the Whittle parametric estimation procedure is robust to the presence of a small trend in the model.
Theorem 6.3. Suppose that the assumptions of Theorem 6.1 with g = f −1 (λ, θ) · w(λ) are satisfied. Then, under the conditions of Theorem 4.2, the Whittle estimator θ̂ T Y , constructed on the basis of the contaminated periodogram I h T Y (λ), is a consistent and asymptotically normal estimator of the unknown spectral parameter θ; that is, the asymptotic relation (4.8) is satisfied with I h T X (λ) replaced by the contaminated periodogram I h T Y (λ), where the matrix R(θ 0 ) is defined in (4.9).
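As a sketch of this robustness (our own minimal illustration: a discrete-frequency Whittle contrast for an AR(1) model with known unit innovation variance, fitted by grid search, not the exact estimator of Section 4), the tapered Whittle estimates computed from a clean AR(1) sample and from its trend-contaminated version nearly coincide:

```python
import numpy as np

rng = np.random.default_rng(3)
T, phi_true = 4096, 0.6

# AR(1) sample (with burn-in), then a small trend M(t) = (1 + t)^{-beta}, beta = 0.6 > 1/4
e = rng.standard_normal(T + 200)
x = np.zeros(T + 200)
for t in range(1, T + 200):
    x[t] = phi_true * x[t - 1] + e[t]
x = x[200:]
y = x + (1.0 + np.arange(T)) ** (-0.6)

u = (np.arange(T) + 0.5) / T
h = 0.5 * (1.0 - np.cos(2.0 * np.pi * u))      # Hann-type taper
lam = 2.0 * np.pi * np.fft.fftfreq(T)

def whittle(z):
    """Tapered Whittle estimate of the AR(1) parameter (unit innovation variance)."""
    I = np.abs(np.fft.fft(h * z)) ** 2 / (2.0 * np.pi * np.sum(h ** 2))
    grid = np.linspace(-0.95, 0.95, 381)
    obj = [np.mean(np.log(f) + I / f)
           for f in (1.0 / (2.0 * np.pi * (1.0 + p * p - 2.0 * p * np.cos(lam)))
                     for p in grid)]
    return float(grid[int(np.argmin(obj))])

est_x, est_y = whittle(x), whittle(y)
print(est_x, est_y)                            # both estimates should lie near phi_true
```

The grid step (0.005) bounds the optimization error; the two estimates differ by far less than the sampling error of the estimator itself.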
Remark 6.3. In the non-tapered case, Theorems 6.1-6.3 were proved in Ginovyan and Sahakyan [48]. In the tapered case, the theorems can be proved similarly, using the tapered tools stated in Section 7.

Methods and tools
In this section we briefly discuss the methods and tools used to prove the results stated in Sections 3-6.

Approximation of traces of products of Toeplitz matrices and operators.
The trace approximation problem for truncated Toeplitz operators and matrices has been discussed in detail in the survey paper Ginovyan et al. [51] in the non-tapered case. Here we present some important results in the tapered case, which were used to prove the results stated in Sections 3-6.
Given a real number T > 0 and an integrable real symmetric function ψ(λ) defined on R, the T-truncated tapered Toeplitz operator (also called tapered Wiener-Hopf operator) generated by ψ and a taper function h, denoted by W h T (ψ), is defined by (7.1), where ψ̂(·) is the Fourier transform of ψ(·), and L 2 ([0, T ]; h T ) denotes the weighted L 2 -space with respect to the measure h T (t)dt. Observe that, in view of (2.20), (2.24), (7.1) and (7.2), we have the trace relation (7.3). What happens to the relation (7.3) when A h T (ψ) is replaced by a product of Toeplitz matrices (or operators)? Observe that the product of Toeplitz matrices (resp. operators) is not a Toeplitz matrix (resp. operator).
The idea is to approximate the trace of the product of Toeplitz matrices (resp. operators) by the trace of the Toeplitz matrix (resp. operator) generated by the product of the generating functions. More precisely, let {ψ 1 , ψ 2 , . . ., ψ m } be a collection of integrable real symmetric functions defined on Λ. Let A h T (ψ i ) be either the T × T tapered Toeplitz matrix B h T (ψ i ), or the T-truncated tapered Toeplitz operator W h T (ψ i ), generated by the function ψ i and a taper function h, and define the approximation error ∆(T ) by (7.4).
Proposition 7.1. Let ∆(T ) be as in (7.4). Then ∆(T ) → 0 as T → ∞ under each of several sufficient conditions (see Ginovyan and Sahakyan [46, 47] for precise statements); a typical one requires that an associated function of the variable u = (u 1 , u 2 , . . ., u m−1 ) ∈ Λ m−1 belongs to L m−2 (Λ m−1 ) and is continuous at 0 = (0, 0, . . ., 0) ∈ Λ m−1 .
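For the non-tapered case h ≡ 1, the trace approximation can be observed numerically. In the sketch below we take ∆(T ) = T −1 |tr[B T (ψ 1 )B T (ψ 2 )] − tr[B T (ψ 1 ψ 2 )]| (our own normalization, which may differ from (7.4)) for an MA(1)-type and an AR(1)-type symbol; the discrepancy decays like 1/T:

```python
import numpy as np

def toeplitz_from_symbol(psi, T, K=1024):
    """T x T Toeplitz matrix [psihat(j - k)] for an even real symbol psi on [-pi, pi],
    with psihat(m) = (1/2pi) int psi(lam) cos(m*lam) dlam computed on a uniform grid."""
    lam = np.linspace(-np.pi, np.pi, K, endpoint=False)
    vals = psi(lam)
    c = np.array([np.mean(vals * np.cos(m * lam)) for m in range(T)])
    idx = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return c[idx]

psi1 = lambda l: 1.0 + 0.5 * np.cos(l)         # MA(1)-type symbol
psi2 = lambda l: 1.0 / (1.25 - np.cos(l))      # AR(1)-type symbol

def delta(T):
    B1, B2 = toeplitz_from_symbol(psi1, T), toeplitz_from_symbol(psi2, T)
    B12 = toeplitz_from_symbol(lambda l: psi1(l) * psi2(l), T)
    return abs(np.trace(B1 @ B2) - np.trace(B12)) / T

d50, d400 = delta(50), delta(400)
print(d50, d400)                               # decays like 1/T (here exactly 1/(3T))
```

For these two symbols the Fourier coefficients of ψ 1 vanish beyond lag 1, so the difference of traces can be computed in closed form and equals 1/3 for every T, making the 1/T decay of ∆(T ) exact.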
Remark 7.2.More results concerning the trace approximation problem for truncated Toeplitz operators and matrices can be found in Ginovyan and Sahakyan [46,47], and in Ginovyan et al. [51].

Central limit theorems for tapered quadratic functionals
In this subsection we state central limit theorems for the tapered quadratic functional Q h T given by (3.3), which were used to prove the results stated in Sections 3-6.
Let A h T (f ) be either the T × T tapered Toeplitz matrix B h T (f ), or the T-truncated tapered Toeplitz operator W h T (f ), generated by the spectral density f and taper h, and let A h T (g) denote either the T × T tapered Toeplitz matrix, or the T-truncated tapered Toeplitz operator, generated by the functions g and h (for definitions see formulas (7.1) and (7.2)). Similar to the non-tapered case, we have the following results (cf. Ginovyan et al. [51], Grenander and Szegő [54], Ibragimov [61]).

1. The quadratic functional Q h T given by (3.3) has the same distribution as the sum ∞ j=1 λ j,T ξ 2 j , where {ξ j , j ≥ 1} are independent N (0, 1) Gaussian random variables and {λ j,T , j ≥ 1} are the eigenvalues of the operator A h T (f ) A h T (g).
2. The characteristic function ϕ(t) of Q h T is given by the formula ϕ(t) = ∞ j=1 (1 − 2itλ j,T ) −1/2 .
3. The k-th order cumulant χ k (Q h T ) of Q h T is given by the formula χ k (Q h T ) = 2 k−1 (k − 1)! ∞ j=1 λ k j,T = 2 k−1 (k − 1)! tr [A h T (f ) A h T (g)] k .
Thus, to describe the asymptotic distribution of the quadratic functional Q h T , we have to control the traces and eigenvalues of products of truncated tapered Toeplitz operators and matrices.
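These facts are elementary to check in finite dimensions. The sketch below (a generic illustration: an arbitrary covariance matrix Sigma stands in for A h T (f ) and a symmetric matrix G for A h T (g)) verifies the first two cumulant formulas, χ 1 = tr[ΣG] and χ 2 = 2 tr[(ΣG) 2 ], by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)                # covariance, plays the role of A_T^h(f)
B = rng.standard_normal((n, n))
G = (B + B.T) / 2.0                            # symmetric weight, plays the role of A_T^h(g)

M = Sigma @ G
mean_theory = np.trace(M)                      # chi_1 = tr[Sigma G]
var_theory = 2.0 * np.trace(M @ M)             # chi_2 = 2 tr[(Sigma G)^2]

L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((200_000, n)) @ L.T    # X ~ N(0, Sigma)
Q = np.einsum('ij,jk,ik->i', X, G, X)          # quadratic functional Q = X' G X
print(Q.mean(), mean_theory)                   # Monte Carlo mean vs trace formula
print(Q.var(), var_theory)                     # Monte Carlo variance vs trace formula
```

The eigenvalue representation of item 1 is implicit here: the sample quadratic form has the law of a weighted sum of independent χ 2 (1) variables with weights equal to the eigenvalues of ΣG.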

CLT for Gaussian models
We assume that the model process X(t) is Gaussian and, with no loss of generality, that g ≥ 0. We will use the following notation. By Q h T we denote the standard normalized quadratic functional. Also, we define the limiting variance σ 2 h by (7.10), where H 4 is as in (2.20). The notation Q h T ⇒ N (0, σ 2 h ) will mean that the distribution of the random variable Q h T tends (as T → ∞) to the centered normal distribution with variance σ 2 h . The following theorems were proved in Ginovyan and Sahakyan [50].
The central limit theorem that follows was proved in Ginovyan and Sahakyan [49].
Remark 7.3. Notice that if the underlying process X(t) is Gaussian, then in formula (7.18) only the first term is present, and so σ 2 L,h = σ 2 h (see (7.10)), because in this case κ 4 = 0. On the other hand, condition (7.17) is more restrictive than the conditions of Theorems 7.1-7.5. Thus, for Gaussian processes, Theorems 7.1-7.5 improve Theorem 7.6. For the non-tapered case, Theorem 7.6 was proved in Bai et al. [6].

Fejér-type kernels and singular integrals
We define Fejér-type tapered kernels and singular integrals, and state some of their properties.
For a number k (k = 2, 3, . . .) and a taper function h satisfying Assumption 2.1, consider the Fejér-type tapered kernel function F h k,T (u). The next result shows that, similar to the classical Fejér kernel, the tapered kernel F h k,T (u) is an approximate identity (see Ginovyan and Sahakyan [49], Lemma 3.4). In particular, the following property holds:
e) If the function Q ∈ L 1 (R k−1 ) ∩ L k−2 (R k−1 ) (L 0 being the space of measurable functions) is continuous at v = (v 1 , . . ., v k−1 ), then the corresponding tapered singular integral converges to Q(v) as T → ∞, and the bound (7.25) holds, where C h is a constant depending on h.
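For k = 2 the kernel is easy to compute on the Fourier grid. The sketch below uses the normalization F (λ) = |H T (λ)| 2 /(2π Σ t h 2 (t/T )), with H T (λ) = Σ t h(t/T ) e −iλt and a Hann-type taper (our own choices, possibly differing from the paper's F h 2,T ), and checks the two approximate-identity properties: unit total mass, and recovery of a continuous function at a point by convolution:

```python
import numpy as np

def fejer_tapered(T):
    """Second-order tapered Fejer-type kernel on the Fourier grid."""
    u = (np.arange(T) + 0.5) / T
    h = 0.5 * (1.0 - np.cos(2.0 * np.pi * u))  # Hann-type taper
    lam = 2.0 * np.pi * np.fft.fftfreq(T)
    F = np.abs(np.fft.fft(h)) ** 2 / (2.0 * np.pi * np.sum(h ** 2))
    return lam, F

def mass_and_conv(T):
    lam, F = fejer_tapered(T)
    dl = 2.0 * np.pi / T
    g = 1.0 / (1.25 - np.cos(lam))             # continuous test function, g(0) = 4
    return float(np.sum(F) * dl), float(np.sum(F * g) * dl)

m1, c1 = mass_and_conv(64)
m2, c2 = mass_and_conv(1024)
print(m1, c1)                                  # mass 1; convolution near g(0) = 4
print(m2, c2)                                  # mass 1; convolution closer to 4
```

The unit mass is exact for every T (a consequence of Parseval's identity on the Fourier grid), while the convolution error shrinks with T because the kernel concentrates at the origin.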