Properties of linear spectral statistics of frequency-smoothed estimated spectral coherence matrix of high-dimensional Gaussian time series

The asymptotic behaviour of Linear Spectral Statistics (LSS) of the smoothed periodogram estimator of the spectral coherency matrix of a complex Gaussian high-dimensional time series $(\y_n)_{n \in \mathbb{Z}}$ with independent components is studied under the asymptotic regime where the sample size $N$ converges towards $+\infty$ while the dimension $M$ of $\y$ and the smoothing span of the estimator grow to infinity at the same rate, in such a way that $\frac{M}{N} \rightarrow 0$. It is established that, at each frequency, the estimated spectral coherency matrix is close to the sample covariance matrix of an independent identically $\mathcal{N}_{\mathbb{C}}(0,\I_M)$ distributed sequence, and that its empirical eigenvalue distribution converges towards the Marcenko-Pastur distribution. This allows us to conclude that each LSS has a deterministic behaviour that can be evaluated explicitly. Using concentration inequalities, it is shown that the supremum over the frequencies of the deviation of each LSS from its deterministic approximation is of the order of $\frac{1}{M} + \frac{\sqrt{M}}{N}+ (\frac{M}{N})^{3}$. Numerical simulations support our results.


The addressed problem and the results
We consider an $M$-variate zero-mean complex Gaussian stationary time series $(y_n)_{n \in \mathbb{Z}}$ and assume that the samples $y_1, \ldots, y_N$ are available. We introduce the traditional frequency-smoothed periodogram estimate $\hat{S}(\nu)$ of the spectral density of $y$ at frequency $\nu$ defined by
$$\hat{S}(\nu) = \frac{1}{B+1} \sum_{b=-B/2}^{B/2} \xi_y\Big(\nu + \frac{b}{N}\Big)\, \xi_y\Big(\nu + \frac{b}{N}\Big)^{*} \quad (1.1)$$
where $B$ is an even integer which represents the smoothing span, and
$$\xi_y(\nu) = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} y_n e^{-2i\pi(n-1)\nu} \quad (1.2)$$
is the renormalized Fourier transform of $(y_n)_{n=1,\ldots,N}$. The corresponding estimated spectral coherency matrix is defined as
$$\hat{C}(\nu) = \mathrm{diag}(\hat{S}(\nu))^{-1/2}\, \hat{S}(\nu)\, \mathrm{diag}(\hat{S}(\nu))^{-1/2} \quad (1.3)$$
where $\mathrm{diag}(\hat{S}(\nu)) = \hat{S}(\nu) \odot I_M$, with $\odot$ denoting the Hadamard product (i.e. the entrywise product) and $I_M$ the $M$-dimensional identity matrix. Under the hypothesis $H_0$ that the $M$ components $(y_{1,n})_{n \in \mathbb{Z}}, \ldots, (y_{M,n})_{n \in \mathbb{Z}}$ of $y$ are mutually uncorrelated, we evaluate the behaviour of certain Linear Spectral Statistics, see (1.5). We notice that the recentring involves the term $r_N(\nu)$ defined by (1.8); when $\alpha > 2/3$, the relevant deviation rate appears to be $u_N$, which satisfies $u_N / v_N \to 0$.
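To make the estimators concrete, the following minimal Python sketch computes (1.1)-(1.3) as reconstructed above; the function names xi, smoothed_periodogram and coherence are ours, and the toy dimensions are illustrative only.

```python
import numpy as np

def xi(y, nu):
    """Renormalized Fourier transform (1.2); y has shape (M, N), nu is a frequency in [0, 1)."""
    M, N = y.shape
    phases = np.exp(-2j * np.pi * np.arange(N) * nu) / np.sqrt(N)
    return y @ phases                                   # M-dimensional vector xi_y(nu)

def smoothed_periodogram(y, nu, B):
    """Frequency-smoothed periodogram (1.1): average of B+1 rank-one periodograms around nu."""
    M, N = y.shape
    S = np.zeros((M, M), dtype=complex)
    for b in range(-B // 2, B // 2 + 1):
        v = xi(y, nu + b / N)
        S += np.outer(v, v.conj())
    return S / (B + 1)

def coherence(S):
    """Estimated spectral coherency matrix (1.3): diag(S)^{-1/2} S diag(S)^{-1/2}."""
    d = 1.0 / np.sqrt(np.real(np.diag(S)))
    return S * np.outer(d, d)

# toy usage under H_0: M independent complex white-noise series, B of the same order as M
rng = np.random.default_rng(0)
M, N, B = 20, 2000, 40
y = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
C_hat = coherence(smoothed_periodogram(y, nu=0.2, B=B))
print(np.allclose(np.diag(C_hat).real, 1.0))            # diagonal entries equal 1 by construction
```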

Motivation
This paper is motivated by the problem of testing whether the components of $y$ are uncorrelated or not when the dimension $M$ of $y$ is large and the number of observations $N$ is significantly larger than $M$. For this, a possible way would be to estimate the spectral coherency matrix, equal to $I_M$ at each frequency $\nu$ under $H_0$, by the standard estimate $\hat{C}(\nu)$ defined by (1.3) for a relevant choice of $B$, and to compare, for example, the supremum over $\nu$ of the spectral norm $\|\hat{C}(\nu) - I_M\|$ to a threshold. To understand the conditions under which such an approach should provide satisfying results, we mention that under some mild extra assumptions, it can be shown that $\sup_{\nu} \|\hat{C}(\nu) - I_M\|$ converges towards $0$ when $\frac{B}{N} \to 0$ and $\frac{M}{B} \to 0$. However, these two conditions require $M \ll B \ll N$, which is unrealistic unless $\frac{M}{N}$ is very small. In such a context, the predictions provided by the asymptotic regime $\frac{B}{N} \to 0$ and $\frac{M}{B} \to 0$ will not be accurate, and any test comparing $\hat{C}(\nu)$ to $I_M$ for each $\nu$ will provide poor results. To solve this issue, we propose to choose $B$ of the same order of magnitude as $M$. In this case, $\hat{C}(\nu)$ has of course no reason to be close to $I_M$ for each $\nu$. If $M \ll N$, or equivalently if $\frac{B}{N}$ is small enough, the asymptotic regime where both $M$ and $B$ converge towards $+\infty$ at the same rate appears relevant to understand the behaviour of $\hat{C}(\nu)$. We mention in particular that the condition $\alpha > 1/2$ implies that the rate of convergence of $\frac{M}{N}$ towards $0$ is moderate, which is in accordance with practical situations in which the sample size is not arbitrarily large. Our asymptotic results thus suggest that if $\frac{M}{N}$ is small enough and if $B$ is chosen of the same order of magnitude as $M$, then it seems reasonable to test whether the components of $y$ are uncorrelated by comparing the supremum over $\nu$ of the statistics (1.9) to a well chosen threshold, where $\hat{r}_N(\nu)$ represents an estimate of $r_N(\nu)$ accurate enough to keep the rate of convergence towards $0$ of the modified statistics equal to $u_N$. We notice that our results only characterize the order of magnitude of the above statistics under $H_0$, and that we do not provide an asymptotic approximation of its distribution. While the derivation of such an approximation would be quite useful to design a well defined statistical test and to study and compare its performance with existing approaches, our results represent a first necessary step that has its own interest. We notice that we consider the supremum over the whole frequency interval $[0, 1]$ because, compared to a solution where the maximum is taken over a small number of fixed frequencies, this increases the power of the test under alternatives for which, under $H_1$, (1.9) exhibits narrow peaks that would not be visible on a low density frequency grid. We also mention that other statistics could be considered as well, e.g. the integral over the frequency domain of the function (1.9), or of the square of this function.
We finally remark that the most usual asymptotic regime considered in the context of large random matrices is $M \to +\infty$, $N \to +\infty$ in such a way that $\frac{M}{N}$ converges towards a non-zero constant. In this regime, it is still possible to develop large random matrix-based approaches for testing whether the components of $y$ are uncorrelated or not, see e.g. the contribution [29] presented below which, under the extra assumption that the components of $y$ share the same spectral density, is based on a Gaussian approximation of linear spectral statistics of the empirical covariance matrix $\hat{R}_N$ defined by
$$\hat{R}_N = \frac{1}{N} \sum_{n=1}^{N} y_n y_n^{*} \quad (1.10)$$
under $H_0$. However, when the ratio $\frac{M}{N}$ is small enough, the asymptotic regime considered in the present paper seems more relevant than the standard large random matrix regime $M \to +\infty$, $N \to +\infty$, and test statistics that depend on the estimated spectral coherency matrix $\hat{C}(\nu)$ should provide better performance than functionals of the matrix $\hat{R}_N$.

On the literature
The problem of testing whether various jointly stationary and jointly Gaussian time series are uncorrelated is an important problem that was extensively addressed in the past. Apart from a few works that will be discussed later, almost all the previous contributions addressed the case where the number $M$ of available time series remains finite as the sample size increases. Two classes of methods were mainly studied. The first class uses lag domain approaches, based on the observation that $M$ jointly stationary time series $(y_{1,n})_{n \in \mathbb{Z}}, \ldots, (y_{M,n})_{n \in \mathbb{Z}}$ are mutually uncorrelated if and only if, for each integer $L$, the covariance matrix of the $ML$-dimensional vector $y_n^{(L)}$ defined by $y_n^{(L)} = (y_{1,n}, \ldots, y_{1,n+L-1}, \ldots, y_{M,n}, \ldots, y_{M,n+L-1})^{T}$ is block diagonal. The lag domain approach was in particular used in [17] for $M = 2$, and extended and developed in [24], [25], [19], [20], [8] and [12].

The second approach is based on the observation that the $M$ jointly stationary time series $(y_{1,n})_{n \in \mathbb{Z}}, \ldots, (y_{M,n})_{n \in \mathbb{Z}}$ are uncorrelated if and only if the spectral density matrix $S(\nu)$ of $y_n = (y_{1,n}, \ldots, y_{M,n})^{T}$ is diagonal for each frequency $\nu$, or equivalently, if its spectral coherence matrix $C(\nu)$ is reduced to $I_M$ for each $\nu$.
[35] is one of the first related contributions. This work was followed by [10], [33], as well as [11].
We now review the existing works devoted to the case where the number $M$ of time series converges towards $+\infty$. The particular context where the observations $y_1, \ldots, y_N$ are i.i.d. and where the ratio $\frac{M}{N}$ converges towards a constant $d \in (0, 1)$ is the most popular. In contrast to the asymptotic regime considered in the present paper, $M$ and $N$ are then of the same order of magnitude. This is because, in this context, the time series are mutually uncorrelated if and only if the covariance matrix $\mathbb{E}[y_n y_n^{*}]$ is diagonal. Therefore, it is reasonable to consider test statistics that are functionals of the sample covariance matrix $\hat{R}_N$ defined by (1.10). In particular, when the observations are Gaussian random vectors, the generalized likelihood ratio test (GLRT) consists in comparing the test statistic $\log \det(\hat{C}_N)$ to a threshold, where $\hat{C}_N$ represents the sample autocorrelation matrix. [21] proved that under $H_0$, the empirical eigenvalue distribution of $\hat{C}_N$ converges almost surely towards the Marcenko-Pastur distribution $\mu_{MP}$, i.e. $\frac{1}{M}\mathrm{tr}\, f(\hat{C}_N) \to \int f \, d\mu_{MP}$ for each bounded continuous function $f$. In the Gaussian case, [23] also established a central limit theorem (CLT) for $\log \det(\hat{C}_N)$ under $H_0$ using the moment method. In the real Gaussian case, [7] remarked that $(\det \hat{C}_N)^{N/2}$ is the product of independent beta distributed random variables; therefore, $\log \det(\hat{C}_N)$ appears as a sum of independent random variables, from which the CLT is deduced. More recently, [28] established a CLT for LSS of $\hat{C}_N$ in the Gaussian case using large random matrix techniques when the covariance matrix $\mathbb{E}[y_n y_n^{*}]$ is not necessarily diagonal. This allows studying the asymptotic performance of the GLRT under a certain class of alternatives. We also mention that [22] studied the behaviour of $\max_{i \neq j} |(\hat{C}_N)_{i,j}|$ under $H_0$, and established that $\max_{i \neq j} |(\hat{C}_N)_{i,j}|$, after recentring and appropriate normalization, converges in distribution towards a Gumbel distribution, which, of course, allows testing the hypothesis $H_0$. This first contribution was extended later in several works, in particular in [6], which considered the case where the samples $y_1, \ldots, y_N$ have some specific correlation pattern. Still in the asymptotic regime $\frac{M}{N} \to d$, [29] proposed to test hypothesis $H_0$ when the components of $y$ share the same spectral density. In this case, the rows of the $M \times N$ matrix $(y_1, \ldots, y_N)$ are independent and identically distributed under $H_0$. [29] established a central limit theorem for linear spectral statistics of the empirical covariance matrix $\hat{R}_N$ defined by (1.10), and used this statistic to check whether $H_0$ holds or not. We notice that the results of [29] are valid in the non-Gaussian case.
To our knowledge, no existing work has studied the behaviour of linear spectral statistics of the matrix $\hat{C}(\nu)$ in the asymptotic regime defined in the present paper. However, we mention that this regime was considered in [3] to solve a completely different problem, i.e. the use of shrinkage in the frequency domain in order to enhance the performance of the spectral density estimate (1.1) when the components of $y$ are not uncorrelated. We notice that $\frac{B^{3/2}}{N}$ is supposed to converge towards $0$ in [3]. When $B = O(N^{\alpha})$, this condition is equivalent to $\alpha < 2/3$, while we rather study situations where $\alpha > 1/2$. We finally mention that our works [27] and [31] also consider the present asymptotic regime and study respectively the behaviour of $\sup_{i<j,\, \nu \in G_N} |\hat{C}_{i,j}(\nu)|$ ($G_N$ is the set $\{k \frac{B+1}{N},\, k = 0, \ldots, \frac{N}{B+1}\}$) and the largest eigenvalues of $\hat{C}(\nu)$ in the presence of an extra signal, independent from $y$, and having a low-rank spectral density matrix.

General approach
To simplify the notations, we denote by $\psi_N(f, \nu)$ the statistics defined by (1.11). To study the behaviour of $\sup_{\nu} |\psi_N(f, \nu)|$, we establish exponential concentration inequalities that allow us to evaluate $\mathbb{P}(|\psi_N(f, \nu)| > N^{\epsilon} u_N)$ for each $\nu$; the behaviour of the supremum over $\nu$ is then obtained by using Lipschitz properties of the function $\nu \to \psi_N(f, \nu)$.
To evaluate $\mathbb{P}(|\psi_N(f, \nu)| > N^{\epsilon} u_N)$ for each $\nu$, we use the following approach:
• We first study the behaviour of the modified sample spectral coherency matrix $\tilde{C}(\nu)$ defined by $\tilde{C}(\nu) = \mathrm{diag}(S(\nu))^{-1/2}\, \hat{S}(\nu)\, \mathrm{diag}(S(\nu))^{-1/2}$. We notice that $\tilde{C}(\nu)$ is obtained from $\hat{C}(\nu)$ by replacing the estimated diagonal matrix $\mathrm{diag}(\hat{S}(\nu))$ by its true value $\mathrm{diag}(S(\nu))$. Using classical results of [4], we establish that for each $\nu$, $\tilde{C}(\nu)$ can be represented as in (1.13), i.e. as the sum of the sample covariance matrix $\frac{X(\nu)X(\nu)^{*}}{B+1}$ of an i.i.d. $\mathcal{N}_{\mathbb{C}}(0, I_M)$ sequence and of an error matrix $\tilde{\Delta}(\nu)$ such that, for any $\epsilon > 0$, there exists $\gamma > 0$, independent from $\nu$, such that the corresponding concentration inequality holds for each large enough $N \in \mathbb{N}$. We deduce from (1.13) that $\hat{C}(\nu)$ can be written as in (1.14), where $\Delta(\nu)$ satisfies a similar concentration inequality for each $\epsilon > 0$, with $\gamma$ independent of $\nu$. Using (1.13) and (1.14), we establish that the eigenvalues of $\tilde{C}(\nu)$ and $\hat{C}(\nu)$ are localized with high probability in a neighbourhood of the support of the Marcenko-Pastur distribution $\mu_{MP}^{(c)}$. $\tilde{C}(\nu)$ appears as a useful intermediate matrix because the study of the LSS of $\hat{C}(\nu)$ is based on the evaluation of each term of the decomposition (1.15). Using the above-mentioned results related to the localization of the eigenvalues of $\tilde{C}(\nu)$ and $\hat{C}(\nu)$, we also argue that it is sufficient to carry out this evaluation when $f$ is compactly supported.

• We then study the resolvent $Q(z)$ of the matrix $\frac{X(\nu)X(\nu)^{*}}{B+1}$. As the matrix $X(\nu)$ is Gaussian, it is possible to use standard Gaussian tools (the Poincaré-Nash inequality and the integration by parts formula) to obtain a good understanding of the behaviour of $Q(z)$, and to prove that for each $\epsilon > 0$, there exists $\gamma$ independent from $\nu$ such that a concentration inequality involving the deterministic term $\tilde{\phi}_N(f)$ holds, where $\tilde{\phi}_N(f)$ is defined as the action of $f$ on a compactly supported distribution $\tilde{D}_N$ depending on $\mu_{MP}$.

• Using a standard Gaussian concentration inequality as well as the structure of the matrix $\tilde{C}(\nu)$, we obtain that for each $\epsilon > 0$, there exists $\gamma$ independent from $\nu$ such that the corresponding concentration inequality holds. The remaining deterministic term is handled using the Helffer-Sjöstrand formula: we first show that the Stieltjes transforms $p_N$ and $\tilde{p}_N$ of the compactly supported distributions $D_N$ and $\tilde{D}_N$ introduced previously satisfy a suitable approximation. This immediately implies that if $\alpha \leq \frac{2}{3}$, then the corresponding correction term is negligible. Gathering the above approximations and using the Lipschitz properties of the function $\nu \to \psi_N(f, \nu)$, we finally obtain (1.7).
We also indicate how the use of lag window estimators of the spectral densities $(s_m)_{m=1,\ldots,M}$ allows us to design an estimator $\hat{r}_N(\nu)$ of $r_N(\nu)$, defined by (1.8), for which the rate of convergence towards $0$ of the statistics $\hat{\psi}_N(f, \nu)$, obtained by replacing $r_N(\nu)$ by $\hat{r}_N(\nu)$ in Eq. (1.11), is still $u_N$. In particular, we establish that for each $\epsilon > 0$, $\mathbb{P}\big(\sup_{\nu} |\hat{\psi}_N(f, \nu)| > N^{\epsilon} u_N\big)$ converges towards $0$ exponentially. We now formulate the following assumptions on the growth rate of the quantities $N, M, B$:
We denote by $(s_m)_{m \geq 1}$ the corresponding sequence of spectral densities (i.e. $s_m$ coincides with the spectral density of the time series $(y_{m,n})_{n \in \mathbb{Z}}$). For each $m \geq 1$, we denote by $r_m = (r_{m,u})_{u \in \mathbb{Z}}$ the autocovariance sequence of $(y_{m,n})_{n \in \mathbb{Z}}$, i.e. $r_{m,u} = \mathbb{E}[y_{m,n+u}\, y_{m,n}^{*}]$. We formulate the following assumptions on $(s_m)_{m \geq 1}$ and $(r_m)_{m \geq 1}$, requiring in particular uniform control of the derivatives $s_m^{(i)}$ for $i = 0, 1, 2, 3$ (conditions (1.17) and (1.18)). It is easy to check that (1.18) holds for each $\gamma_0 > 0$, and that (1.17) is verified as well.

Notations.
A zero-mean complex valued random vector $y$ is said to be $\mathcal{N}_{\mathbb{C}}(0, \Sigma)$ distributed if $\mathbb{E}(yy^{*}) = \Sigma$ and if each linear combination $x$ of the entries of $y$ is a complex Gaussian random variable, i.e. $\mathrm{Re}(x)$ and $\mathrm{Im}(x)$ are independent Gaussian random variables sharing the same variance. If $x$ is a random variable, we denote by $x^{\circ}$ the centered random variable defined by $x^{\circ} = x - \mathbb{E}(x)$ (1.20). If $A$ is a $P \times Q$ matrix, $\|A\|$ and $\|A\|_F$ denote its spectral norm and Frobenius norm respectively. If $P = Q$ and $A$ is Hermitian, $\lambda_1(A) \geq \ldots \geq \lambda_P(A)$ are the eigenvalues of $A$. The spectrum of $A$, which is here the set of its eigenvalues $(\lambda_k(A))_{k=1,\ldots,P}$, is denoted by $\sigma(A)$. For $A$ and $B$ square Hermitian matrices, if all the eigenvalues of $A - B$ are non-negative, we write $A \geq B$. We define $\mathrm{Re}\, A = (A + A^{*})/2$ and $\mathrm{Im}\, A = (A - A^{*})/(2i)$, where $A^{*}$ is the conjugate transpose of the matrix $A$.
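Since $\mathcal{N}_{\mathbb{C}}(0, \Sigma)$ vectors are used throughout (and in the simulations of Section 6), the following short sketch shows one way to sample them, assuming $\Sigma$ is positive definite so that a Cholesky factor exists; the helper name is ours.

```python
import numpy as np

def sample_complex_gaussian(n_samples, Sigma, rng=None):
    """Draw columns distributed as N_C(0, Sigma): real and imaginary parts are
    independent N(0, Sigma/2), so that E[y y*] = Sigma (Sigma assumed positive definite)."""
    rng = np.random.default_rng(rng)
    M = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)                       # Sigma = L L*
    w = (rng.standard_normal((M, n_samples)) +
         1j * rng.standard_normal((M, n_samples))) / np.sqrt(2.0)   # columns ~ N_C(0, I_M)
    return L @ w                                        # columns ~ N_C(0, Sigma)

# quick check: empirical covariance of 10^4 draws of N_C(0, I_3) is close to I_3
y = sample_complex_gaussian(10_000, np.eye(3))
print(np.round(y @ y.conj().T / y.shape[1], 2))
```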

P. Loubaton and A. Rosuel
$C^{p}$ represents the set of all real-valued functions defined on $\mathbb{R}$ whose first $p$ derivatives exist and are continuous, and $C^{p}_{c}$ is the set of all compactly supported functions of $C^{p}$.
We recall that $S(\nu)$ represents the $M \times M$ diagonal matrix $S(\nu) = \mathrm{diag}(s_1(\nu), \ldots, s_M(\nu))$. We notice that $S$ depends on $M$, thus on $N$ (through $M := M(N)$), but we often omit to mention the corresponding dependency in order to simplify the notations. In the following, we will denote by $y_m$ the $N$-dimensional vector $y_m = (y_{m,1}, \ldots, y_{m,N})^{T}$.
A nice constant is a positive constant that does not depend on the frequency $\nu$, the time series index $m$, the complex variable $z$ of the various resolvents and Stieltjes transforms used throughout the paper, nor on the dimensions $B$, $M$ and $N$. A nice polynomial is a polynomial whose degree and coefficients are nice constants. If $z \in \mathbb{C}^{+}$ and if $P_1$ and $P_2$ are two nice polynomials, terms such as $P_1(|z|) P_2(\frac{1}{\mathrm{Im}\, z})$ play an important role in the following. $C$ and $C(z)$ will represent generic notations for, respectively, a nice constant and a term $P_1(|z|) P_2(\frac{1}{\mathrm{Im}\, z})$, and the values of $C$ and $C(z)$ may change from one line to the other.
We also recall how a function can be applied to Hermitian matrices. If $A$ is an $M \times M$ Hermitian matrix with spectral decomposition $A = U \Lambda U^{*}$, where $\Lambda = \mathrm{diag}(\lambda_m, m = 1, \ldots, M)$ and the $(\lambda_m)_{m=1,\ldots,M}$ are the real eigenvalues of $A$, then for any function $f$ defined on $\mathbb{R}$, we define $f(A)$ as $f(A) = U\, \mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_M))\, U^{*}$.
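As an illustration, a possible implementation of this matrix functional calculus, which is all that is needed to evaluate LSS of the form $\frac{1}{M}\mathrm{tr}\, f(\hat{C}(\nu))$, is the following sketch (the function name is ours).

```python
import numpy as np

def f_of_hermitian(A, f):
    """Apply a real function f to a Hermitian matrix A through its spectral decomposition."""
    lam, U = np.linalg.eigh(A)                          # A = U diag(lam) U*
    return (U * f(lam)) @ U.conj().T

# example: (1/M) tr f(A) for f(x) = (x - 1)^2 equals the average of (lambda_i - 1)^2
A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam = np.linalg.eigvalsh(A)
lss = np.trace(f_of_hermitian(A, lambda x: (x - 1) ** 2)).real / 2
print(np.allclose(lss, np.mean((lam - 1) ** 2)))
```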
C + is the upper half-plane of C, i.e. the set of all complex numbers z for which Im z > 0.
For $\mu$ a probability measure, its Stieltjes transform $s_{\mu}$ is the function defined on $\mathbb{C} \setminus \mathrm{Supp}\, \mu$ as $s_{\mu}(z) = \int \frac{d\mu(\lambda)}{\lambda - z}$ (1.21). We recall that $|s_{\mu}(z)| \leq \frac{1}{\mathrm{Im}\, z}$ for each $z \in \mathbb{C}^{+}$. Moreover, if $\mu$ is carried by $\mathbb{R}^{+}$, then for any $a > 0$, the function $-\frac{1}{z(1 + a s_{\mu}(z))}$ is also the Stieltjes transform of a probability distribution carried by $\mathbb{R}^{+}$, a property which implies that $\big|\frac{1}{z(1 + a s_{\mu}(z))}\big| \leq \frac{1}{\mathrm{Im}\, z}$ for each $z \in \mathbb{C}^{+}$ (see [15], Proposition 5-1, item 4).
If $\lambda_1, \ldots, \lambda_M$ denote the eigenvalues of a Hermitian matrix $A$ and if $\mu = \frac{1}{M} \sum_{i=1}^{M} \delta_{\lambda_i}$ denotes the empirical eigenvalue distribution of $A$, then we have the relation $s_{\mu}(z) = \frac{1}{M} \mathrm{tr}\, Q_A(z)$ (1.23), where $Q_A(z)$ represents the resolvent of $A$ defined by $Q_A(z) = (A - z I_M)^{-1}$. We finally mention the following useful control of the norm of $Q_A$: for each $z \in \mathbb{C}^{+}$, we have $\|Q_A(z)\| \leq \frac{1}{\mathrm{Im}\, z}$ (1.25).
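The relation between the resolvent trace and the Stieltjes transform of the empirical eigenvalue distribution can be checked numerically; the following sketch does so on an arbitrary symmetric test matrix (sizes and the test point $z$ are illustrative).

```python
import numpy as np

# numerical check of (1.23): (1/M) tr (A - z I)^{-1} equals the Stieltjes transform
# of the empirical eigenvalue distribution of A
rng = np.random.default_rng(1)
M = 50
G = rng.standard_normal((M, M))
A = (G + G.T) / 2                                       # real symmetric test matrix
z = 0.3 + 0.7j
Q = np.linalg.inv(A - z * np.eye(M))                    # resolvent Q_A(z)
lhs = np.trace(Q) / M
rhs = np.mean(1.0 / (np.linalg.eigvalsh(A) - z))
print(np.allclose(lhs, rhs))                            # True; cf. also the bound (1.25) on ||Q_A(z)||
```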

Overview of the paper
We first recall in Section 2 useful technical tools: in Paragraph 2.1, the concept of stochastic domination adapted from [13], which allows us to considerably simplify the exposition of the subsequent results; in Paragraph 2.2, some useful properties of the extreme eigenvalues and of the resolvent of large Wishart matrices; two well-known Gaussian concentration inequalities expressed in the stochastic domination framework in Paragraphs 2.3 and 2.4; and the Helffer-Sjöstrand formula in Paragraph 2.5. We establish in Section 3 the stochastic representations (1.13) and (1.14) of $\tilde{C}(\nu)$ and $\hat{C}(\nu)$. In Section 4, we prove for each $\nu$ the concentration of $|\psi_N(f, \nu)|$ defined by (1.11), and indicate how it is possible to estimate the term $r_N(\nu)$ in order to keep the rate of convergence of the statistics $\hat{\psi}_N(f, \nu)$, obtained by replacing $r_N(\nu)$ by $\hat{r}_N(\nu)$ in (1.11), equal to $u_N$. In Section 5, we establish Lipschitz properties of the functions $\nu \to \psi_N(f, \nu)$ and $\nu \to \hat{\psi}_N(f, \nu)$ that allow us to establish the concentration of $\sup_{\nu} |\psi_N(f, \nu)|$ and $\sup_{\nu} |\hat{\psi}_N(f, \nu)|$. We finally provide in Section 6 some numerical simulations that support our results.

Stochastic domination
We now present the concept of stochastic domination introduced in [13]. A nice introduction to this tool can also be found in the lecture notes [2].
Definition 2.1. Let $X = \big(X^{(N)}(u),\, N \in \mathbb{N},\, u \in U^{(N)}\big)$ and $Y = \big(Y^{(N)}(u),\, N \in \mathbb{N},\, u \in U^{(N)}\big)$ be two families of nonnegative random variables, where $U^{(N)}$ is a set that may possibly depend on $N$. We say that $X$ is stochastically dominated by $Y$ if, for all (small) $\epsilon > 0$, there exists some $\gamma > 0$ (which of course depends on $\epsilon$) such that
$$\mathbb{P}\big(X^{(N)}(u) > N^{\epsilon}\, Y^{(N)}(u)\big) \leq \exp(-N^{\gamma}) \quad (2.1)$$
for each $u \in U^{(N)}$ and for each large enough $N$. To simplify the notations, we will very often write $X^{(N)} \prec Y^{(N)}$ or $X \prec Y$ when the context is clear enough.
Moreover, if for some complex valued family $X$ we have $|X| \prec Y$, we also write $X = O_{\prec}(Y)$. Finally, we say that a family of events $\Xi = \Xi^{(N)}(u)$ holds with exponentially high (small) probability if there exist $N_0$ and $\gamma > 0$ such that for $N \geq N_0$, $\mathbb{P}(\Xi^{(N)}(u)) \geq 1 - \exp(-N^{\gamma})$ (respectively $\mathbb{P}(\Xi^{(N)}(u)) \leq \exp(-N^{\gamma})$) for each $u \in U^{(N)}$. Remark 2.1. Suppose that $(X_N)_{N \in \mathbb{N}}$ is a sequence of positive random variables satisfying $X_N \prec a_N N^{\epsilon}$ for any $\epsilon > 0$, for some sequence of positive real numbers $(a_N)_{N \in \mathbb{N}}$. It turns out that this precisely means that $X_N \prec a_N$. Indeed, consider an arbitrary $\epsilon > 0$. By the stochastic domination property of $X_N$, one can take $\epsilon'$ such that $0 < \epsilon' < \epsilon$ and write $\mathbb{P}(X_N > N^{\epsilon} a_N) = \mathbb{P}\big(X_N > N^{\epsilon - \epsilon'} (a_N N^{\epsilon'})\big)$, which goes to zero exponentially since $X_N \prec a_N N^{\epsilon'}$ for the chosen $\epsilon'$. This argument will be used in the proof of Lemma 4.2.
Lemma 2.1. Take four families of nonnegative random variables $X_1$, $X_2$, $Y_1$ and $Y_2$ defined as in Definition 2.1. If $X_1 \prec Y_1$ and $X_2 \prec Y_2$, then $X_1 + X_2 \prec Y_1 + Y_2$ and $X_1 X_2 \prec Y_1 Y_2$. We omit the proof of this lemma.

Remark 2.2.
Note that Definition 2.1 is slightly different from the original one in [13], which states that the left hand side of (2.1) should be bounded by a quantity of order $N^{-D}$ for any finite $D > 0$. In the present paper, all the random variables are Gaussian, and exponential concentration rates can be achieved.

Properties of the eigenvalues and of the resolvent of large Wishart matrices
In this paper, we will on multiple occasions use properties of the eigenvalues of matrices of the form $\frac{X_N X_N^{*}}{B+1}$, where $X_N$ is an $M \times (B+1)$ matrix with i.i.d. $\mathcal{N}_{\mathbb{C}}(0, 1)$ entries.

Concentration of the largest and the smallest eigenvalues
We first recall the concentration results for the largest and smallest eigenvalues of $\frac{X_N X_N^{*}}{B+1}$ established in [14]: for any $\epsilon > 0$, the bounds (2.2) and (2.3) hold, and we introduce the event $\Lambda_{N,\epsilon}$ defined by (2.4). It is clear that, using (2.2) and (2.3), $\Lambda_{N,\epsilon}$ holds with exponentially high probability for any $\epsilon > 0$. This will be of high importance in the following, since it will enable us to work on events of exponentially high probability on which the norm of $\frac{X_N X_N^{*}}{B+1}$ and the norm of its inverse are bounded.
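The localization of the extreme eigenvalues near the Marcenko-Pastur edges $(1 \pm \sqrt{c})^{2}$ is easy to visualize numerically; here is a minimal sketch with illustrative dimensions.

```python
import numpy as np

# extreme eigenvalues of X X*/(B+1), X of size M x (B+1) with i.i.d. N_C(0,1) entries,
# compared with the Marcenko-Pastur edges (1 +/- sqrt(c))^2, c = M/(B+1)
rng = np.random.default_rng(2)
M, B = 400, 1600
c = M / (B + 1)
X = (rng.standard_normal((M, B + 1)) + 1j * rng.standard_normal((M, B + 1))) / np.sqrt(2)
lam = np.linalg.eigvalsh(X @ X.conj().T / (B + 1))
print(lam.max(), (1 + np.sqrt(c)) ** 2)                 # close to the right edge
print(lam.min(), (1 - np.sqrt(c)) ** 2)                 # close to the left edge
```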
Finally, the following (weaker) statement, namely the stochastic domination (2.5), is a simple consequence of equations (2.2) and (2.3), and will sometimes be enough in the following. We finally notice that if we consider a family $\big(X_N(u),\, u \in U^{(N)}\big)$, where $U^{(N)}$ is a certain set possibly depending on $N$, then (2.2) and (2.3) hold for each $u \in U^{(N)}$ because the constant $C$ in (2.2) and (2.3) is universal. This implies that the stochastic domination (2.5) is still satisfied by the family $\big(X_N(u),\, u \in U^{(N)}\big)$. Moreover, the family of events $\Lambda_{N,\epsilon}(u)$, defined by (2.4) when $X_N$ is replaced by $X_N(u)$, still holds with exponentially high probability.

Asymptotic behaviour of the resolvent of $\frac{X_N X_N^{*}}{B+1}$
We next review known results related to the asymptotic behaviour of the resolvent $Q_N(z)$ of the matrix $\frac{X_N X_N^{*}}{B+1}$ that can be deduced from standard Gaussian tools. The Poincaré-Nash inequality (see e.g. [30, Proposition 2.1.6] in the real Gaussian case and Eq. (18) in [16] in the complex Gaussian case) immediately implies that the following Lemma holds.

Lemma 2.2. Consider deterministic $M \times M$ and $(B+1) \times (B+1)$ matrices $A$ and $\tilde{A}$. Then, it holds that
We recall that $C(z)$ represents a generic notation for $P_1(|z|) P_2(\frac{1}{\mathrm{Im}\, z})$, where $P_1$ and $P_2$ are nice polynomials.
The integration by parts formula states that if $h(X, X^{*})$ is a $C^1$ function of the entries of $X$ and $X^{*}$ with polynomially bounded first derivatives, then the identity (2.8) holds. This formula, in conjunction with the Poincaré-Nash inequality, allows us to evaluate easily the asymptotic behaviour of the entries of $\mathbb{E}(Q_N(z))$ (see e.g. [30]). We first notice that the properties of the distribution of the matrix $X_N$ immediately imply that $\mathbb{E}(Q_N(z))$ is reduced to $\beta_N(z) I_M$, where $\beta_N(z)$ coincides with $\mathbb{E}(Q_{m,m}(z))$ for each $m$. Then, it holds that $\beta_N(z) = t_N(z) + \epsilon_N(z)$ (2.9), where $\epsilon_N(z)$ is an error term of order $\frac{1}{M^{2}}$ and $t_N(z)$ is the Stieltjes transform of the Marcenko-Pastur distribution $\mu_{MP}^{(c_N)}$. In other words, $t_N(z)$ is the unique Stieltjes transform satisfying the equation
$$t_N(z) = \frac{1}{-z\big(1 + c_N t_N(z)\big) + (1 - c_N)}. \quad (2.10)$$
It is also convenient to define $\tilde{t}_N(z)$ by
$$\tilde{t}_N(z) = c_N t_N(z) - \frac{1 - c_N}{z}, \quad (2.11)$$
so that $t_N(z)$ is also given by $t_N(z) = -\frac{1}{z(1 + \tilde{t}_N(z))}$.
It is well-known that $\tilde{t}_N(z)$ is the Stieltjes transform of the probability distribution $c_N \mu_{MP}^{(c_N)} + (1 - c_N)\delta_0$.
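The following sketch evaluates $t_N(z)$ by iterating the fixed-point equation (in the form reconstructed as (2.10) above, one of several equivalent writings of the Marcenko-Pastur equation) and compares it with the resolvent trace of a simulated matrix; dimensions and the test point $z$ are illustrative.

```python
import numpy as np

def mp_stieltjes(z, c, n_iter=500):
    """Fixed-point iteration for the Stieltjes transform t(z) of the
    Marcenko-Pastur distribution with ratio c (form (2.10))."""
    t = -1.0 / z
    for _ in range(n_iter):
        t = 1.0 / (-z * (1.0 + c * t) + 1.0 - c)
    return t

# compare with the resolvent trace of a simulated matrix X X*/(B+1)
rng = np.random.default_rng(3)
M, B = 400, 1600
c = M / (B + 1)
X = (rng.standard_normal((M, B + 1)) + 1j * rng.standard_normal((M, B + 1))) / np.sqrt(2)
z = 1.0 + 0.1j
trace_Q = np.mean(1.0 / (np.linalg.eigvalsh(X @ X.conj().T / (B + 1)) - z))
print(mp_stieltjes(z, c), trace_Q)                      # the two values are close
```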

Concentration of functionals of Gaussian entries
It is well-known (see e.g. [34, Th. 2.1.12]) that for any 1-Lipschitz real valued function $f$ defined on $\mathbb{R}^{N}$ and any $N$-dimensional random variable $X \sim \mathcal{N}(0, I_N)$, there exists a universal constant $C$ such that the concentration inequality (2.14) holds. This inequality remains valid when $X \sim \mathcal{N}_{\mathbb{C}}(0, I_N)$: in this context, $f(X)$ is replaced by a real-valued function $f(X, X^{*})$ depending on the entries of $X$ and $X^{*}$. $f(X, X^{*})$ can of course be written as $\tilde{f}(\mathrm{Re}(X), \mathrm{Im}(X))$. We finally mention that $f$, considered as a function of $(X, X^{*})$, and $\tilde{f}$ have Lipschitz constants that are of the same order of magnitude; more precisely, this follows by expressing the gradient of $\tilde{f}$ through the differential operators $\frac{\partial}{\partial z}$ and $\frac{\partial}{\partial \bar{z}}$. Within the stochastic domination framework, the concentration inequality (2.14) implies that, for a family of $\kappa_N$-Lipschitz functions and random vectors $X_N(u) \sim \mathcal{N}(0, I_N)$, $u \in U^{(N)}$, it holds that $|f_N(X_N(u), u) - \mathbb{E}[f_N(X_N(u), u)]| \prec \kappa_N$. The proof is immediate: consider $\epsilon > 0$ and obtain from (2.14) that, for each $u$, $\mathbb{P}\big(|f_N(X_N(u), u) - \mathbb{E}[f_N(X_N(u), u)]| > N^{\epsilon} \kappa_N\big) \leq C \exp(-N^{2\epsilon}/C)$, as expected. This result can easily be extended to the complex case, i.e. when $X_N(u) \sim \mathcal{N}_{\mathbb{C}}(0, I_N)$.
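A quick Monte-Carlo illustration of this concentration phenomenon, using the 1-Lipschitz function $f(x) = \max_i x_i$ (our own choice of test function), is the following.

```python
import numpy as np

# illustration of (2.14): for the 1-Lipschitz function f(x) = max_i x_i, the fluctuations
# of f(X) around its mean stay O(1) even though the mean grows like sqrt(2 log N)
rng = np.random.default_rng(4)
for N in (100, 10_000, 100_000):
    f = rng.standard_normal((100, N)).max(axis=1)       # 100 independent draws of f(X)
    print(N, round(f.mean(), 2), round(f.std(), 3))     # mean grows, spread stays small
```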

Hanson-Wright inequality
The Hanson-Wright inequality [32] is useful to control the deviations of a quadratic form from its expectation. While it is proved in the real case in [32], it can easily be extended to the complex case as follows: let $X = (X_1, \ldots, X_N)^{T}$ be a vector with i.i.d. $\mathcal{N}_{\mathbb{C}}(0,1)$ entries and let $A$ be a deterministic $N \times N$ matrix; then, for each $t > 0$,
$$\mathbb{P}\big(|X^{*} A X - \mathbb{E}(X^{*} A X)| > t\big) \leq 2 \exp\Big(-C \min\Big(\frac{t^{2}}{\|A\|_{F}^{2}}, \frac{t}{\|A\|}\Big)\Big) \quad (2.15)$$
for some universal constant $C > 0$. We now write (2.15) in the stochastic domination framework. Consider a family of independent $\mathcal{N}_{\mathbb{C}}(0, 1)$ random variables $(X_n(u))_{n=1,\ldots,N}$, where $u \in U^{(N)}$, and a sequence of $N \times N$ matrices $A_N(u)$ that possibly depend on $u$.
We can therefore rewrite (2.16) as the following stochastic domination:
$$\big|X_N(u)^{*} A_N(u) X_N(u) - \mathbb{E}\big(X_N(u)^{*} A_N(u) X_N(u)\big)\big| \prec \|A_N(u)\|_{F}. \quad (2.17)$$
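A minimal numerical illustration of the behaviour quantified by (2.15) and (2.17) is the following sketch, in which the matrix $A$ and the sizes are arbitrary choices of ours.

```python
import numpy as np

# Monte-Carlo illustration of Hanson-Wright: for X with i.i.d. N_C(0,1) entries and a fixed
# deterministic A, the quadratic form X*AX concentrates around tr(A) at the scale ||A||_F
rng = np.random.default_rng(5)
N, n_draws = 500, 2000
A = rng.standard_normal((N, N)) / N
X = (rng.standard_normal((n_draws, N)) + 1j * rng.standard_normal((n_draws, N))) / np.sqrt(2)
q = np.sum((X.conj() @ A) * X, axis=1)                  # q[k] = X_k^* A X_k
print(abs(q.mean() - np.trace(A)))                      # expectation of the quadratic form is tr(A)
print(q.std(), np.linalg.norm(A, 'fro'))                # spread is of the order of ||A||_F
```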

Helffer-Sjöstrand formula
If $\mu$ is a probability measure, the Helffer-Sjöstrand formula can be seen as an alternative to the Stieltjes inversion formula that allows us to express $\int f \, d\mu$ in terms of the Stieltjes transform $s_{\mu}(z)$ of $\mu$ (see (1.21)) when $f$ is a regular enough compactly supported function. In order to introduce this tool, we consider a class $C^{k+1}$ compactly supported function $f$ for a certain integer $k$, and denote by $\Phi_k(f)$ the function defined on $\mathbb{C}$ by
$$\Phi_k(f)(x + iy) = \sum_{l=0}^{k} \frac{f^{(l)}(x)}{l!} (iy)^{l}\, \rho(y),$$
where $\rho : \mathbb{R} \to \mathbb{R}^{+}$ is smooth, compactly supported, with value $1$ in a neighbourhood of $0$. The function $\Phi_k(f)$ coincides with $f$ on the real line and extends it to the complex plane. Let $\bar{\partial} = \partial_x + i\partial_y$. It is well-known (a proof of this result can be found in [9] or [18]) that
$$\bar{\partial} \Phi_k(f)(x + iy) = \frac{(iy)^{k}}{k!} f^{(k+1)}(x)$$
if $y$ belongs to the neighbourhood of $0$ in which $\rho$ is equal to $1$. The Helffer-Sjöstrand formula can be written as
$$\int_{\mathbb{R}} f \, d\mu = \frac{2}{\pi}\, \mathrm{Re} \int_{\mathbb{C}^{+}} \bar{\partial} \Phi_k(f)(z)\, s_{\mu}(z)\, dx\, dy. \quad (2.19)$$
In order to understand why the integral on the right hand side of (2.19) is well defined, we take, to fix the ideas, $\rho \in C^{\infty}$ such that $\rho(y) = 1$ for $|y| \leq 1$ and $\rho(y) = 0$ for $|y| > 2$, and denote by $[a_1, a_2]$ an interval containing the support of $f$. Then, it appears that the integral on $\mathbb{C}^{+}$ is in fact over the compact set $\{x + iy : a_1 \leq x \leq a_2,\, 0 \leq y \leq 2\}$. Therefore, the right hand side of (2.19) is well defined.
We finally mention that the Helffer-Sjöstrand formula remains valid for any compactly supported distribution $D$ (see e.g. [26], Section 9). The Stieltjes transform of $D$, denoted by $s_D(z)$, is defined for each $z \in \mathbb{C}^{+}$ as the action of $D$ on the function $\lambda \to \frac{1}{\lambda - z}$, and satisfies a bound of the form $|s_D(z)| \leq C\big(1 + \frac{1}{(\mathrm{Im}\, z)^{n_0}}\big)$ for each $z \in \mathbb{C}^{+}$, where $n_0$ is related to the order of the distribution. We refer the reader to [5] (Theorem 4.3) and the references therein for more details on Stieltjes transforms of distributions. Then, if $f$ is a $C^{\infty}$ function supported by $[a_1, a_2]$, $\langle D, f \rangle$ is given by (2.20) for $k \geq n_0$. We also recall that an alternative expression for $\langle D, f \rangle$ is given by the Stieltjes inversion formula, also valid for distributions, i.e.
$$\langle D, f \rangle = \lim_{\epsilon \to 0^{+}} \frac{1}{\pi} \int_{\mathbb{R}} f(x)\, \mathrm{Im}\big(s_D(x + i\epsilon)\big)\, dx. \quad (2.21)$$
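The Stieltjes inversion formula can be checked numerically on the Marcenko-Pastur law itself; the closed-form expression of $s_{\mu_{MP}^{(c)}}$ used below, and in particular the square-root branch selection, are standard but are our own implementation choices.

```python
import numpy as np

# numerical check of Stieltjes inversion for the Marcenko-Pastur law:
# int f dmu is recovered from (1/pi) int f(x) Im s_mu(x + i*eps) dx for small eps > 0
c, eps = 0.5, 1e-4
lam_minus, lam_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2

def s_mp(z):
    """Stieltjes transform of mu_MP^(c); square-root branch chosen so that Im s >= 0 on C+."""
    w = np.sqrt((z - 1 - c) ** 2 - 4 * c)
    w = np.where(w.imag < 0, -w, w)
    return (1 - c - z + w) / (2 * c * z)

x = np.linspace(lam_minus - 0.5, lam_plus + 0.5, 200_001)
dx = x[1] - x[0]
f = (x - 1) ** 2                                        # smooth test function f(lambda) = (lambda-1)^2
approx = np.sum(f * s_mp(x + 1j * eps).imag) * dx / np.pi
print(approx, c)                                        # int (lambda-1)^2 dmu_MP^(c) equals c
```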

Stochastic representations of $\tilde{C}(\nu)$ and $\hat{C}(\nu)$
The first step is to show that $\tilde{C}(\nu)$ and $\hat{C}(\nu)$ can be approximated by the sample covariance matrix of a sequence of i.i.d. Gaussian random vectors, and to control the order of magnitude of the corresponding errors. This is the objective of the following result.
Theorem 3.1. For each $\nu \in [0, 1]$, there exist an $M \times (B+1)$ random matrix $X_N(\nu)$ with i.i.d. $\mathcal{N}_{\mathbb{C}}(0,1)$ entries and two matrices $(\tilde{\Delta}_N(\nu), \Delta_N(\nu))$ such that (3.1) and (3.2) hold. In particular, Theorem 3.1 allows us to make precise the location of the eigenvalues of $\tilde{C}_N(\nu)$ and $\hat{C}_N(\nu)$. In order to formulate the corresponding result, we define some notations: we introduce the events $\Lambda^{\tilde{C}}_{N,\epsilon}(\nu)$ and $\Lambda^{\hat{C}}_{N,\epsilon}(\nu)$ on which the spectra of $\tilde{C}_N(\nu)$ and $\hat{C}_N(\nu)$ are localized near the support of the Marcenko-Pastur distribution. Then, we establish the following Corollary: the families of events $\Lambda^{\tilde{C}}_{N,\epsilon}(\nu)$, $N \geq 1$, $\nu \in [0, 1]$ and $\Lambda^{\hat{C}}_{N,\epsilon}(\nu)$, $N \geq 1$, $\nu \in [0, 1]$ hold with exponentially high probability.

Step 1: stochastic representation of $\tilde{C}(\nu)$
In order to establish (3.1), we prove the following Proposition.
where the family of random variables involved satisfies the stochastic dominations stated in (3.5) and (3.6).
Proof. Denote by $\Sigma$ the $M \times (B+1)$ random matrix defined by (3.7), where we recall that the normalized Fourier transform $\xi_y$ is defined in (1.2), so that $\hat{S}$ defined in (1.1) is equal to $\Sigma \Sigma^{*}/(B+1)$. Denote by $\omega_m$ the $m$-th row of $\Sigma$; in other words, $\omega_m$ coincides with the $(B+1)$-dimensional complex Gaussian row vector $\big(\xi_{y_m}(\nu + \frac{b}{N})\big)_{b=-B/2,\ldots,B/2}$, whose covariance matrix admits an expansion whose error is uniform over $m \geq 1$ and $\nu \in [0, 1]$. Therefore, one can claim that there exist some Hermitian matrix $\Upsilon_m(\nu)$ and some nice constant $C$ such that (3.8) holds. Moreover, the regularity of the mapping $\nu \to s_m(\nu)$ specified in Assumption 1.4 implies that there exist quantities $\epsilon_m$ such that the corresponding expansion holds. We then define the matrix $\Phi_m$, where we recall that $v_N$ is defined by (1.5). The spectral norm of $\Phi_m$ can be roughly bounded by the inequality (3.9), and it is easily checked that the Frobenius norm of $\frac{\Phi_m}{\sqrt{B+1}}$ satisfies (3.10). Using the Gaussianity of the vector $\omega_m$ and the expression (3.8), we obtain that $\omega_m$ can be represented in terms of an $\mathcal{N}_{\mathbb{C}}(0, I_{B+1})$ distributed row vector $x_m$, where $x_{m_1}$ and $x_{m_2}$ are independent for $m_1 \neq m_2$; this comes from the mutual independence of the time series $\big((y_{m,n})_{n \in \mathbb{Z}}\big)_{m=1,\ldots,M}$. We denote by $X$ and $\Gamma$ the $M \times (B+1)$ matrices with rows $(x_m)_{m=1,\ldots,M}$ and $(x_m \Psi_m)_{m=1,\ldots,M}$ respectively. Then the representation (3.14) of $\Sigma$ holds, where we recall that $\Sigma$ is defined by (3.7). We recall the definition of the matrix $\tilde{C}$ given by
$$\tilde{C} = \mathrm{diag}(S(\nu))^{-1/2}\, \frac{\Sigma \Sigma^{*}}{B+1}\, \mathrm{diag}(S(\nu))^{-1/2}.$$
The representation (3.14) implies that $\tilde{C}$ can also be written as $\tilde{C} = \frac{1}{B+1}(X + \Gamma)(X + \Gamma)^{*}$; equivalently, for each $m_1, m_2$, the entry $(\tilde{C})_{m_1, m_2}$ is given by the corresponding expression. This completes the proof of (3.5). It remains to show (3.6). We denote by $Z$ the $M \times M$ matrix $Z = \frac{1}{B+1} \Gamma \Gamma^{*}$. Given the bound satisfied by $Z$, it is enough to prove the two facts (3.17) and (3.18). We start with (3.17); it follows directly from the definition of $\Gamma$. It remains to prove (3.18). We use a classical $\epsilon$-net argument that allows us to deduce the behaviour of $\|Z - \mathbb{E}[Z]\|$ from the behaviour of any recentered quadratic form $g^{*} Z g - \mathbb{E}[g^{*} Z g]$, where $g \in \mathbb{C}^{M}$ is a deterministic unit norm vector. We thus first concentrate $g^{*} Z g - \mathbb{E}[g^{*} Z g]$ using the Hanson-Wright inequality (2.17). For this, we need to express $g^{*} Z g$ as a quadratic form of a certain complex Gaussian random vector with i.i.d. entries. We denote by $z$ the corresponding $M$-dimensional random vector; $z$ can be written as $z = G^{1/2} w$ for some $w \sim \mathcal{N}_{\mathbb{C}}(0, I_M)$ random vector. As a consequence, the quadratic form $g^{*} Z g - \mathbb{E}[g^{*} Z g]$ can be written as a recentered quadratic form in $w$, to which the Hanson-Wright inequality (2.17) can be applied. The substitution of (3.20) into (3.19) gives the control (3.21) of $g^{*} Z g - \mathbb{E}[g^{*} Z g]$. Consider $\epsilon > 0$ and an $\epsilon$-net $\mathcal{N}_{\epsilon}$ of $\mathbb{C}^{M}$, that is, a set of unit norm vectors $\{h_k : k = 1, \ldots, K\}$ of $\mathbb{C}^{M}$ such that for each unit norm vector $u \in \mathbb{C}^{M}$, there exists a vector $h \in \mathcal{N}_{\epsilon}$ for which $\|u - h\| \leq \epsilon$. It is well known that the cardinality of $\mathcal{N}_{\epsilon}$ is bounded by $C_0 \big(\frac{1}{\epsilon}\big)^{2M}$, where $C_0$ is a universal constant. Then, denote by $g_s$ a (random) unit norm vector such that $|g_s^{*} Z g_s - \mathbb{E}[g_s^{*} Z g_s]| = \|Z - \mathbb{E}Z\|$, and define $h_s \in \mathcal{N}_{\epsilon}$ as the closest vector to $g_s$. The triangle inequality then relates $\|Z - \mathbb{E}Z\|$ to $\max_{h \in \mathcal{N}_{\epsilon}} |h^{*}(Z - \mathbb{E}Z)h|$, and for each $t > 0$ the union bound can be applied over $\mathcal{N}_{\epsilon}$. Here, we would like to use equation (3.21). By the definition of $\prec$, (3.21) is valid uniformly over any set of vectors with cardinality polynomial in $N$. Here, however, the cardinality of the set $\mathcal{N}_{\epsilon}$ is a $O(\epsilon^{-2M})$ term, and is therefore exponential in $M$.
As a consequence, we have to accept losing some speed when going from the stochastic domination of $|g^{*}(Z - \mathbb{E}Z)g|$ for a fixed $g$ to the same stochastic domination holding uniformly over $\mathcal{N}_{\epsilon}$. More specifically, we write (3.21) again, but without the notation $\prec$, in order to understand precisely how a change in speed affects the probability. Take $(t_N)$ a sequence of positive numbers such that $t_N \geq B^{2}/N^{2}$. Using the estimates (3.20) of $\|G\|$ and $\|G\|_{F}^{2}$, and the fact that $\min(a_1, a_2) > \min(b_1, b_2)$ when $a_1 > b_1$ and $a_2 > b_2$, we obtain that there exists some nice constant $C > 0$ such that the corresponding bound holds. The Hanson-Wright inequality (2.15) then provides a deviation bound for some nice constant $C$ that depends on $C_1$. Finally, the union bound over $\mathcal{N}_{\epsilon}$ gives:

If we take $t_N = N^{\epsilon}(B^{2}/N^{2})$, then there exists $\gamma > 0$ such that the resulting bound holds for each $N$ large enough. (3.22) thus implies (3.18). This completes the proof of (3.6), and of Proposition 3.1.
Corollary 3.2 is a rewriting of Proposition 3.1 in a more concise way; the result is immediate using the decomposition of $\tilde{\Delta}$ from (3.24). We now take advantage of Corollary 3.2 to establish the first part of Corollary 3.1 and to analyse the location of the eigenvalues of the matrix $\hat{S}$. We denote by $D$ and $\hat{D}$ the matrices $D = D(\nu) := \mathrm{diag}(S(\nu))$ and $\hat{D} = \hat{D}(\nu) := \mathrm{diag}(\hat{S}(\nu))$, and conclude that the corresponding events hold with exponentially high probability.

Step 2: estimates for $\hat{s}_m(\nu)$
We establish the following estimates. Roughly speaking, they ensure that, with exponentially high probability, $\hat{s}_m$ stays bounded and away from zero. They also imply the following (weaker) statement, which will still be enough for some proofs and reduces the complexity of the arguments.
Proof. See Appendix A.3

Step 3: stochastic representation of $\hat{C}(\nu)$
We are now in a position to prove the result concerning $\hat{C}$ of Theorem 3.1 and of Corollary 3.1.
Proof. We first have to control the operator norm of $\Delta = \hat{C} - \frac{XX^{*}}{B+1} = \Theta + \tilde{\Delta}$. The operator norm of $\tilde{\Delta}$ has already been proved in Corollary 3.2 to satisfy $\|\tilde{\Delta}\| \prec \frac{B}{N}$. Moreover, recall that $\Theta$ can be written as a function of $\hat{D}^{-1/2} - D^{-1/2}$ in (3.30), so that one can use Lemma 3.2 and Lemma 3.3 to dominate each term and obtain the corresponding bound. Summing the estimate of $\Theta$ and that of $\tilde{\Delta}$, one gets the desired result.
As a consequence, we state here Corollary 3.4 about the localization of the eigenvalues of $\hat{C}(\nu)$.
Proof. We simply write the analogous decomposition for $\hat{C}(\nu)$ and use the same arguments as in the proof of Corollary 3.3.

Stochastic domination of the family $\psi_N(f, \nu)$, $N \geq 1$, $\nu \in [0, 1]$
We first have to define the distribution $D_N$ introduced in the definition (1.11) of $\psi_N(f, \nu)$. For this, we consider the function $p_N(z)$ defined in terms of $t_N$ and $\tilde{t}_N$, which we recall are defined by (2.10) and (2.11). Then (see Lemma 9.2 in [26]), $p_N$ is the Stieltjes transform of a distribution whose support is contained in $\mathrm{Supp}\, \mu_{MP}$. This distribution is the $D_N$ introduced in (1.11). In the following, we consider LSS for functions $f$ satisfying the following assumptions. Under Assumptions 1.1, 1.2, 1.3 and 1.4, the family $|\psi_N(f, \nu)|$, $N \geq 1$, $\nu \in [0, 1]$ satisfies (4.2). Before starting the proof of Theorem 4.1, we first mention that it is sufficient to establish (4.2) when $f$ is compactly supported by a neighbourhood of $\mathrm{Supp}\, \mu_{MP}^{(c)}$. To justify this claim, we consider $\kappa > 0$ and define $\chi : \mathbb{R} \to \mathbb{R}$ as a $C^{\infty}$ compactly supported function equal to $1$ on a neighbourhood of $\mathrm{Supp}\, \mu_{MP}^{(c)}$, and set $\tilde{f} = f\chi$; the last inequality of the corresponding chain follows from the observation that $\frac{1}{M} \mathrm{tr}\, f(\hat{C}) = \frac{1}{M} \mathrm{tr}\, \tilde{f}(\hat{C})$ on $\Lambda^{\hat{C}}_{\kappa}(\nu)$. Moreover, the family of events $\Lambda^{\hat{C}}_{\kappa}(\nu)$ holds with exponentially high probability, which implies that $\mathbb{P}\big((\Lambda^{\hat{C}}_{\kappa}(\nu))^{c}\big)$ converges towards $0$ exponentially fast. Therefore, $|\psi_N(\tilde{f}, \nu)| \prec u_N$ implies (4.2) as expected. From now on, we thus assume that the function $f$ is supported by a neighbourhood of $\mathrm{Supp}\, \mu_{MP}^{(c)}$. In order to establish (4.2), we evaluate the four terms of the right hand side of (1.15).

Step 1: evaluation of $\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\, f\big(\frac{XX^{*}}{B+1}\big)\big] - \int f \, d\mu_{MP}^{(c_N)}$
We evaluate this term using the Helffer-Sjöstrand formula. We keep the notations of Paragraphs 2.5 and 2.2: we assume that the support of $f$ is included in $[a_1, a_2]$ with $a_1 = (1 - \sqrt{c_N})^{2} - 2\kappa$ and $a_2 = (1 + \sqrt{c_N})^{2} + 2\kappa$. Moreover, the resolvent of the matrix $\frac{X_N X_N^{*}}{B+1}$ is denoted by $Q_N(z)$ (we omit to mention that the matrices depend on $\nu$), and $\beta_N(z)$ represents $\mathbb{E}\big((Q_N(z))_{mm}\big)$ for each $m$. We also denote by $\epsilon_N(z)$ the error term defined by (2.9), which satisfies $|\epsilon_N(z)| \leq \frac{1}{M^{2}} P_1(|z|) P_2(\frac{1}{\mathrm{Im}\, z})$ on $\mathbb{C}^{+}$ for some nice polynomials $P_1$ and $P_2$. Then, for $k \geq \deg(P_2)$, the Helffer-Sjöstrand representation applies, with the integration restricted to the compact domain $D$ defined as in Paragraph 2.5. The integral $\int_{D} |\bar{\partial} \Phi_k(f)(z)|\, P_1(|z|) P_2(\frac{1}{\mathrm{Im}\, z})\, dx\, dy$ is finite, and by (2.9), the bound $\big|\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\, f\big(\frac{XX^{*}}{B+1}\big)\big] - \int f \, d\mu_{MP}^{(c_N)}\big| \leq \frac{C}{M^{2}}$ holds for some nice constant $C$. We have therefore established the following result.

Step 2: evaluation of $\frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu)) - \mathbb{E}\big[\frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu))\big]$
In order to evaluate the above term, we use the Gaussian concentration inequality introduced in Paragraph 2.3. We recall that $\tilde{C}$ can be interpreted as a function of $(X, X^{*})$ (see (3.16)). Therefore, $\frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu))$ can be written as $g(X, X^{*})$ for some real valued function $g$. We establish in the following that $g$ is $O(\frac{1}{B})$-Lipschitz, which, in turn, will imply (4.5). For this, we evaluate the squared norm of the gradient of $g$ in (4.6). Using classical identities for the differentiation of Hermitian matrices, we obtain that

Straightforward calculations lead to the corresponding expression. Using $\sup_i \|I + \Phi_i\| \leq C$ for some nice constant $C$, as well as $\tilde{C} = \frac{1}{B+1}(X + \Gamma)(X + \Gamma)^{*}$, we immediately obtain the required bound. As $f \in C^{\infty}$ is compactly supported, the function $\lambda \to \lambda (f'(\lambda))^{2}$ is bounded by some constant, and there exists a nice constant $C$ such that the gradient bound holds. This proves that $g$ is $O(\frac{1}{B})$-Lipschitz. Paragraph 2.3 thus leads to (4.5).

Step 3: evaluation of $\frac{1}{M} \mathrm{Tr}\, f(\hat{C}(\nu)) - \frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu))$
The goal of this paragraph is to establish the following Proposition.

There exists a compactly supported distribution $\tilde{D}_N$, supported by a neighbourhood of $\mathrm{Supp}\, \mu_{MP}^{(c)}$, with Stieltjes transform $\tilde{p}_N$ given by (4.7). Then, if we denote $\langle \tilde{D}_N, f \rangle$ by $\tilde{\phi}_N(f)$, we have that $\tilde{\phi}_N(f)$ is of order $v_N$, and its subtraction from $\frac{1}{M} \mathrm{Tr}\, f(\hat{C}(\nu)) - \frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu))$ allows us to retrieve a term stochastically dominated by $u_N$.

Remark 4.2. We notice that (3.35) leads immediately to (4.9), an approximation which is considerably more pessimistic than (4.8). As seen below, the derivation of (4.8) is rather demanding and is based on subtle effects. In order to understand why (4.9) can be improved, we consider the simple case $f(\lambda) = \log \lambda$. We thus have $\frac{1}{M} \mathrm{Tr}\, f(\hat{C}(\nu)) - \frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu)) = -\frac{1}{M} \sum_{m=1}^{M} \log \frac{\hat{s}_m(\nu)}{s_m(\nu)}$, which depends only on the estimators $(\hat{s}_m(\nu))_{m=1,\ldots,M}$. We just provide a brief analysis of the above term. For this, we first remark that it is possible to study each of its terms, and the required concentration then follows from the Hanson-Wright inequality. Putting all the pieces together, and comparing the result with (4.8), we deduce that $\langle \tilde{D}_N, f \rangle = -1$. We now check this formula directly. For this, we notice that the function $z \to \log z$ is holomorphic inside a neighbourhood of the interval $[a_1, a_2]$. We consider the expression (2.21) of $\langle \tilde{D}_N, f \rangle$ and remark that if $(\partial R_{\epsilon})^{-}$ denotes the negatively oriented contour around $[a_1, a_2]$, then, by (2.21), $\langle \tilde{D}_N, f \rangle$ can also be written as a contour integral.

Using the expression of $\tilde{p}_N(z)$ and the integration by parts trick, we get that
Taking the limit $\epsilon \to 0$, and using the Stieltjes inversion formula for the Marcenko-Pastur distribution $\mu_{MP}^{(c_N)}$, we finally obtain that $\langle \tilde{D}_N, f \rangle = -1$, which is the expected result.
Proof. We now establish (4.8). In order to simplify the notations, we put The Helffer-Sjöstrand formula implies that

Reduction to the study of ζ
We define $\zeta$ by the integral expression below, where we recall that the row vectors $(x_m)_{m=1,\ldots,M}$ are the rows of the i.i.d. matrix $X$. We establish in this paragraph the estimate (4.13); (4.8) will then follow directly from (4.13).
Plugging in the integral expression of $\zeta$, and using the expression (4.7), we get the corresponding identity. We recall the definition of $\Theta := \hat{C} - \tilde{C}$ from (3.29). We will proceed in three steps which, in turn, will imply (4.13).
Step 1. Using the well-known identity relating the derivative of the resolvent to its square, we claim that it is possible to approximate $\mathrm{tr}\, \hat{Q}\Theta\hat{Q}$ by $\mathrm{tr}\, Q\Theta Q$. Indeed, we can decompose the difference into the terms $T_1$, $T_2$ and $T_3$. The following rough bounds are enough to control $T_1$ (we use (1.25) to control the norm of the resolvents). Concerning $T_2$ and $T_3$, we write similarly that $\hat{Q} - Q = -\hat{Q}\Delta Q$, and obtain the analogous bounds. Plugging these estimations into the left hand side of (4.14), we obtain the required control. Moreover, the concentration results (3.35) for $\Theta$ and (3.26) for $\Delta$ from Proposition 3.1 imply the final bound. This finally establishes (4.14).
Therefore, using (4.23), the left hand side of (4.18) is recognised in the right hand side of (4.24). We can finally prove (4.15) by following the same idea as in Step 1. This proves (4.15) and ends Step 2.
Step 3. By definition of the resolvent, the identity $\big(\frac{XX^{*}}{B+1} - zI_M\big) Q(z) = I_M$ holds, which leads to the so-called resolvent identity (4.25). Using (4.25) as well as the identity $Q'(z) = Q^{2}(z)$, one can write the corresponding expansion, where $\theta_m$ is some random quantity lying between $\hat{s}_m$ and $s_m$. Therefore, (4.26) becomes a bound that holds on $\Lambda^{\hat{D}}(\nu)$, where we recall that $C(z)$ can be written as $P_1(|z|) P_2(\frac{1}{\mathrm{Im}\, z})$ for some nice polynomials $P_1$ and $P_2$. Following again the same argument as in Step 1, we obtain that

We have thus shown the corresponding estimate. We denote by $\eta_N(z)$ the term defined by the expression above, and define $\delta_N$ accordingly. In order to establish (4.16), it is sufficient to prove that $|\delta_N| \prec u_N$. For this, we first use the relation between $\hat{s}_m$ and $s_m$, so that $\eta_N(z)$ can also be written in a more convenient form. We express $\eta_N(z)$ as $\eta_N(z) = \eta_{1,N}(z) + \eta_{2,N}(z) + \eta_{3,N}(z)$, where the $(\eta_{i,N})_{i=1,2,3}$ are defined by the corresponding expressions, and we denote by $\delta_{i,N}$ the contribution of $\eta_{i,N}$, $i = 1, 2, 3$, to $\delta_N$. We recall the definition (1.20) of $((zQ)_{mm})^{\circ}$. In order to evaluate $\delta_{1,N}$, we remark that $|(zQ)_{mm}'| = |Q_{mm} + z(Q^{2})_{mm}| \leq C(z)$. Therefore, for $k$ large enough, $\delta_{1,N}$ satisfies the corresponding bound; the Hanson-Wright inequality, as well as the bound (3.10) on the Frobenius norm of $\Phi_m$, imply that $|\delta_{1,N}| \prec u_N$. We now evaluate $\delta_{2,N}$. For this, we notice that the results reviewed in Paragraph 2.2.2 imply that $\mathbb{E}\big[(zQ)_{mm}'\big] = (z\beta_N(z))' = (zt_N(z))' + (z\epsilon_N(z))'$, where $|(z\epsilon_N(z))'| \leq \frac{C(z)}{M^{2}}$. Therefore, using (3.9), we obtain that $|\delta_{2,N}| \prec u_N$. In order to address $\delta_{3,N}$, we interpret $\delta_{3,N}$ as a function $g$ of $(X, X^{*})$, and use the Gaussian concentration inequality presented in Paragraph 2.3. In particular, we verify the required bound on the gradient of $g$. As $\mathbb{E}(\delta_{3,N}) = 0$, this leads immediately to $|\delta_{3,N}| \prec u_N$. To check the gradient bound, we express $(zQ)_{mm}'$ as $(zQ)_{mm}' = Q_{mm} + z(Q^{2})_{mm}$ and use the Jensen inequality; summing over $i, j$ leads to the expected evaluation of (4.29) and to $|\delta_{3,N}| \prec u_N$. This, in turn, completes the proof of (4.16) and of (4.13). Up to Lemma 4.2 and Lemma 4.4, Theorem 4.1 is thus proved.

Proof of Lemma 4.2 and Lemma 4.4
We now establish Lemma 4.2 and Lemma 4.4.
Recall that $\|x_m\|_2^{2}$ is, up to normalization, a $\chi^{2}_{2(B+1)}$ random variable. Therefore, the corresponding concentration is clear. Knowing this, the idea is to show the desired domination conditioned on the events $A_{m,\epsilon}(\nu)$, as well as on $A_{\epsilon}(\nu) = \cap_{m=1}^{M} A_{m,\epsilon}(\nu)$. It is clear that the family of events $A_{m,\epsilon}(\nu)$, $m = 1, \ldots, M$, $\nu \in [0, 1]$ holds with exponentially high probability, and that the same property holds for the family $A_{\epsilon}(\nu)$, $\nu \in [0, 1]$. We claim that there exists a family of $C^{\infty}$ functions $(g_{B,\epsilon})_{B \geq 1}$ satisfying, for each $B$, the requirements (4.32), where $C$ is a nice constant. Indeed, consider $h \in C^{\infty}$ satisfying $|h(t)| \leq 2|t|$ for each $t$, together with the additional conditions below. Then, it is easy to check that the family $(g_{B,\epsilon})_{B \geq 1}$ defined from $h$ satisfies the requirements (4.32).

We have therefore established the required representation. It thus remains to prove that $\zeta_{2,\epsilon} - \mathbb{E}(\zeta_{2,\epsilon})$ satisfies the required stochastic domination; this is true by Lemma 4.3 below. The analogous relation for $\zeta_1 - \mathbb{E}(\zeta_1)$ is proved similarly. This completes the proof of Lemma 4.2.
Proof. In the following, we evaluate the squared norm of the gradient of $\tilde{\zeta}_{2,\epsilon}$ with respect to the variables $X_{i,j}$, $X_{i,j}^{*}$, and we only compute $\sum_{i,j} \big|\frac{\partial \tilde{\zeta}_{2,\epsilon}}{\partial X_{ij}}\big|^{2}$, since the sum involving the derivatives with respect to $X_{ij}^{*}$ is of the same order of magnitude.
We recall the relevant identities; moreover, it is clear that the corresponding bounds hold. Collecting the derivatives (4.33) and (4.34), we get after some algebra the expression of the gradient. It remains to control $\sum_{i,j} \big|\frac{\partial \tilde{\zeta}_{2,\epsilon}}{\partial X_{ij}}\big|^{2}$. From the integral representation of $\tilde{\zeta}_{2,\epsilon}$, the derivative with respect to $X_{ij}$ is applied only to the integrand. Plugging in the derivative computed in (4.35), and using the bounds on $g_{B,\epsilon}$ and $g_{B,\epsilon}'$ from the inequalities (4.32), the observation that $g_{B,\epsilon}'$ vanishes outside a small neighbourhood of $1$, and the fact that $|z|$ is bounded on $D$, one can write the required bound. It remains to sum over $i, j$.
Collecting the terms in $T^{(1)}_{ij}$, $T^{(2)}_{ij}$ and $T^{(3)}_{ij}$, and since $M/(B+1) = O(1)$ by Assumption 1.3, we can write the resulting bound. As $\|Q\|^{4} + \|Q\|^{5} + |z|\, \|Q\|^{6} \leq C(z)$, we obtain the expected estimate for $k$ large enough.
It remains to study E[ζ], and establish the following Lemma.
Proof. As in the proof of Lemma 4.2, we only consider the dominant term. Applying the Cauchy-Schwarz inequality, and since the relevant expectation is bounded by some nice constant $C$, using (4.37) in (4.36) we obtain the required estimate for $k$ large enough. This completes the proof of Lemma 4.4.

Remark 4.3.
We notice that, instead of using (1.15), an alternative approach to the study of $\frac{1}{M} \mathrm{Tr}\, f(\hat{C}(\nu)) - \int f \, d\mu_{MP}^{(c_N)}$ could have been based on the decomposition (4.38).
The first term of the r.h.s. of (4.38) can be addressed using the Gaussian concentration inequality. However, the calculations are more complicated than the evaluation of $\frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu)) - \mathbb{E}\big[\frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu))\big]$ because, considered as a function of $(X, X^{*})$, $\frac{1}{M} \mathrm{Tr}\, f(\hat{C}(\nu))$ is not a Lipschitz function. Using techniques similar to those developed to evaluate $\zeta - \mathbb{E}(\zeta)$ (see Lemma 4.2), it could however be shown that (4.39) holds. In order to evaluate the second term of the r.h.s. of (4.38), one should prove (4.40) and $\mathbb{E}(\zeta) = O(\frac{1}{B})$. The proof of (4.40) does not appear simpler than the proof of (4.13): the three steps that allowed us to establish (4.13) should still be used, except that the stochastic domination properties should be replaced by properties of the mathematical expectation of the various terms. However, proving stochastic domination appears simpler than showing the desired properties of the above mathematical expectations. In sum, while the use of decomposition (4.38) allows us to avoid Lemma 4.2, the justification of (4.39) requires tools that are similar to those of Lemma 4.2, and the proof of (4.40) tends to be more complicated than the proof of (4.13). This explains why we have chosen to use decomposition (1.15) rather than (4.38).

Step 4: evaluation of $\mathbb{E}\big[\frac{1}{M} \mathrm{Tr}\, f(\tilde{C}(\nu))\big] - \mathbb{E}\big[\frac{1}{M} \mathrm{Tr}\, f\big(\frac{XX^{*}}{B+1}\big)\big]$
The Helffer-Sjöstrand formula implies that it is enough to study the corresponding resolvents. Therefore, we are back to evaluating $\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\,(\tilde{Q}_N(z) - Q_N(z))\big]$.
In order to simplify the exposition of the results of this paragraph, we introduce the following notation. If $(h_N(z))_{N \geq 1}$ is a sequence of complex-valued functions defined on $\mathbb{C}^{+}$ and if $(w_N)_{N \geq 1}$ is a sequence of positive real numbers, the notation $h_N(z) = O_z(w_N)$ means that there exist two nice polynomials $P_1$ and $P_2$ such that $|h_N(z)| \leq w_N P_1(|z|) P_2(\frac{1}{\mathrm{Im}\, z})$ for each $z \in \mathbb{C}^{+}$.
In this paragraph, we establish the following Proposition.
can be written as in (4.41). The Helffer-Sjöstrand formula thus leads to the following Corollary: the corresponding expectation is given by (4.42), up to an error term depending on $\mu_{MP}$. In particular, establishing (4.41) (and thus (4.42)) will complete the proof of Theorem 4.1.
Proof. The proof of (4.41) is based on the Gaussian tools reviewed in Paragraph 2.2.2 and requires long and very tedious calculations. Therefore, we just provide a sketch of the proof. In particular, we justify that the term under study is a $O_z\big((\frac{B}{N})^{2}\big)$ term, but do not establish its exact expression (4.41).
The starting point of the proof is to express $\tilde{Q} - Q$ through the resolvent identity. Therefore, $\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\,(\tilde{Q}_N(z) - Q_N(z))\big]$ can be written as in (4.43). It is clear that the moduli of the second and third terms of the right hand side of (4.43) are controlled by $C(z)\,\mathbb{E}(\|\tilde{\Delta}\|^{2})$ and $C(z)\,\mathbb{E}(\|\tilde{\Delta}\|^{3})$ respectively. We now state the following useful Lemma, proved in the Appendix, which implies that these terms are $O_z\big((\frac{B}{N})^{2}\big)$ (see (4.44)). For this, we first express $\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\, Q^{2} \tilde{\Delta}\big]$ as a sum of three terms. The third term of the right hand side is clearly $O_z\big((\frac{B}{N})^{2}\big)$. We thus need to check that the first two terms are also $O_z\big((\frac{B}{N})^{2}\big)$. We just verify this property for the first term. For this, we evaluate $\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\, Q \frac{\Gamma X^{*}}{B+1}\big]$ using the Gaussian tools, and take the derivative w.r.t. $z$ to obtain the expression of $\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\, Q^{2} \frac{\Gamma X^{*}}{B+1}\big]$.
In order to simplify the notations, we denote by $W$ the matrix $W = \frac{X}{\sqrt{B+1}}$, and denote by $w_1, \ldots, w_M$ its $M$ rows. In particular, the row $m$ of the matrix $\frac{\Gamma}{\sqrt{B+1}}$ coincides with $w_m \Psi_m$, where we recall that the matrix $\Psi_m$ is defined by (3.12). If $(e_1, \ldots, e_M)$ represents the canonical basis of $\mathbb{C}^{M}$, the term $\mathbb{E}\big[\frac{1}{M} \mathrm{tr}\, Q \frac{\Gamma X^{*}}{B+1}\big]$ can be written as $\frac{1}{M} \sum_{m=1}^{M} \mathbb{E}(w_m \Psi_m W^{*} Q e_m)$. We now state the following Lemma, whose proof is given in the Appendix. We recall that $\beta_N(z) = \mathbb{E}\big((Q_N(z))_{mm}\big)$ for each $m$.
Lemma 4.6. If $A$ represents a $(B+1) \times (B+1)$ matrix, the equality (4.45) holds. Using (1.23) in the case $s_{\mu}(z) = \beta_N(z)$, as well as (2.9), we easily obtain the corresponding approximation. We now use (4.45) for $A = \Psi_m$, and differentiate (4.45) for $A = \Psi_m$ w.r.t. $z$. Using the Schwarz inequality, the inequalities (2.6) and (2.7), and (3.13), we immediately obtain the required bounds. Using (3.9), we thus obtain the desired conclusion.

Estimation of $r_N(\nu)$
The term $\sup_{\nu} |\psi_N(f, \nu)|$ depends on the unknown true spectral densities $(s_m)_{m=1,\ldots,M}$ through the term $r_N(\nu)$ defined by (1.8). In order to be able to use Theorem 4.1 in practice, it appears necessary to estimate $r_N(\nu)$ by an accurate enough estimate $\hat{r}_N(\nu)$, and to replace $\psi_N(f, \nu)$ by the statistics $\hat{\psi}_N(f, \nu)$; $\hat{r}_N(\nu)$ has to be chosen in such a way that $|\hat{\psi}_N(f, \nu)| \prec u_N$ still holds. The estimate is built from lag window estimators $\hat{s}_{m,L}(\nu)$ of the spectral densities. It is easy to check that $\|\Omega(\nu)\|_{F} = O\big(\frac{L^{3/2}}{N^{1/2}}\big)$, and the Hanson-Wright inequality leads immediately to $|\hat{s}_{m,L}(\nu) - \mathbb{E}(\hat{s}_{m,L}(\nu))| \prec \frac{L^{3/2}}{N^{1/2}}$. Moreover, it is easy to check that (1.18) implies that
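For concreteness, a minimal sketch of a scalar lag window spectral estimator is given below; it uses a rectangular lag window and the convention $s_m(\nu) = \sum_u r_{m,u} e^{-2i\pi u\nu}$, whereas the estimator actually used in the paper may involve a different window (the function name is ours).

```python
import numpy as np

def lag_window_estimate(y_m, nu, L):
    """Lag-window spectral density estimate for a scalar series y_m = (y_{m,1},...,y_{m,N}):
    empirical autocovariances up to lag L, Fourier-transformed at frequency nu."""
    N = len(y_m)
    s = 0.0
    for u in range(-L, L + 1):
        if u >= 0:
            r_u = np.vdot(y_m[:N - u], y_m[u:]) / N     # hat r_{m,u} = (1/N) sum_n y_{n+u} conj(y_n)
        else:
            r_u = np.conj(np.vdot(y_m[:N + u], y_m[-u:]) / N)
        s += r_u * np.exp(-2j * np.pi * u * nu)
    return s.real

# usage on a complex white-noise series: the estimate is close to the flat spectral density 1
rng = np.random.default_rng(6)
y1 = (rng.standard_normal(10_000) + 1j * rng.standard_normal(10_000)) / np.sqrt(2)
print(lag_window_estimate(y1, nu=0.3, L=20))
```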

Lipschitz properties
The goal of this paragraph is to prove the following Proposition.
In the following, we just establish (5.2). For this, we evaluate separately the Lipschitz constants of $\nu \to \frac{1}{M} \mathrm{tr}\, f(\hat{C}(\nu))$ and of $\nu \to \hat{r}_N(\nu)$.

Lipschitz constant of $\nu \to \frac{1}{M} \mathrm{tr}\, f(\hat{C}(\nu))$
To show that $\nu \to \frac{1}{M} \mathrm{tr}\, f(\hat{C}(\nu))$ is $MN^{3/2}$-Lipschitz with overwhelming probability, we need to establish a number of intermediate properties. We also claim that $\sup_{\nu} |\xi_{y_m}(\nu)| \prec 1$, see (5.6). In order to verify (5.6), we first observe that for any $n \geq 1$, we have the control (5.7). We consider a frequency $\nu^{*} \in [0, 1]$ (depending on $m$) where $|\xi_{y_m}(\nu)|$ is maximum, and thus have to establish that for each $\epsilon > 0$, there exists $\gamma > 0$ depending only on $\epsilon$ such that the corresponding bound holds for each $N$ larger than a certain integer $N_0(\epsilon)$. We introduce the discrete set $V^{p}_N$ defined by (5.8), whose cardinality is $|V^{p}_N| = N^{p}$. We notice that (5.5), in conjunction with the union bound, implies that $\sup_{\nu_p \in V^{p}_N} |\xi_{y_m}(\nu_p)| \prec 1$. We denote by $\nu^{*,p}$ the element of $V^{p}_N$ for which $|\nu^{*} - \nu_p|$ is minimum, and notice that $|\nu^{*} - \nu^{*,p}| \leq \frac{1}{N^{p}}$. Then, we have the inequality (5.9). As $\sup_{\nu_p \in V^{p}_N} |\xi_{y_m}(\nu_p)| \prec 1$, the second term of the right hand side of (5.9) converges exponentially towards $0$. In order to evaluate the first term of the r.h.s. of (5.9), we use (5.7), and obtain the corresponding bound. We choose $p$ so that $p - 1 > 3/2$, and use (5.4) to conclude that $\mathbb{P}\big(|\xi_{y_m}(\nu^{*}) - \xi_{y_m}(\nu^{*,p})| > \frac{N^{\epsilon}}{2}\big)$ converges towards $0$ exponentially. This establishes (5.6).
In order to complete the proof of Proposition 5.2, we consider an individual entry $\hat{s}_{ij}(\nu)$ of $\hat{S}(\nu)$ for $i, j \leq M$, and write the corresponding decomposition. Using the estimates (5.6) and (5.7), we get the required Lipschitz bound. Combining the eigenvalue localisation result from Corollary 3.3 and the Lipschitz behaviour of $\hat{S}$ from Proposition 5.2, the following statement holds: the events $\Lambda^{\hat{S}}$ and $\Lambda^{\hat{D}}$ hold with exponentially high probability.
has the same property. To justify this claim, we remark that Proposition 5.2 implies that, for each $\kappa > 0$, the corresponding probability converges to $0$ exponentially fast. As the corresponding inclusion of events holds, we get exponential decay as well. For $p$ large enough, $N^{\kappa} \frac{1}{N^{p}} MN^{3/2}$ will finally become smaller than $\epsilon/2$. This proves that the complementary event holds with exponentially small probability.
The same argument can be used to control $\Lambda^{\hat{D}}$. This completes the proof of Corollary 5.1.
We deduce immediately from Corollary 5.1 the following result that can be seen as a refinement of (3.28) and of Lemma 3.1.

Corollary 5.2. It holds that
A useful consequence of this is the following Corollary, which states that the Lipschitz result holds for $\hat{C}(\nu)$.

Corollary 5.3. It holds that
Proof. For more clarity in the following argument, denote $\nu_1 = \nu$ and $\nu_2 = \nu + \delta$. Recall that $\hat{D} = \mathrm{diag}\, \hat{S}$. Using the definition of $\hat{C}$ from equation (1.3), we write $\hat{C}(\nu_1) - \hat{C}(\nu_2)$ as a sum of differences, and, applying the operator norm, we get the corresponding bound by the triangle inequality. It is easy to check that the required intermediate estimate holds. Therefore, Proposition 5.2 and Corollary 5.2 immediately imply (5.12).
Finally, we can state for the spectrum of $\hat{C}$ the same kind of result as in Corollary 5.1: the event $\Lambda^{\hat{C}}$ holds with exponentially high probability.
Proof. The proof is similar to the proof of Corollary 5.1 and is thus omitted.
We finally use the above results to prove that $\nu \to \frac{1}{M} \mathrm{tr}\, f(\hat{C}(\nu))$ is $MN^{3/2}$-Lipschitz with overwhelming probability. For this, we establish the following Proposition.

We are now in a position to establish the main result of this paper.
We denote by $\nu^{*} \in [0, 1]$ an element where the supremum is achieved, and consider $\nu^{*}_{p}$ the closest element of $V^{p}_N$ to $\nu^{*}$, where we recall that $V^{p}_N$ is defined by (5.8). Therefore, one can write the corresponding decomposition. (4.52) implies that $\mathbb{P}\big(|\hat{\psi}_N(f, \nu^{*}_{p})| > \frac{1}{2} N^{\epsilon} u_N\big)$ converges exponentially towards $0$. It thus remains to study $\mathbb{P}\big(|\hat{\psi}_N(f, \nu^{*}) - \hat{\psi}_N(f, \nu^{*}_{p})| > \frac{1}{2} N^{\epsilon} u_N\big)$. For this, we of course use (5.2), Corollary 5.3, and write

If we choose $p$ large enough, $MN^{3/2}$ satisfies $\frac{MN^{3/2}}{N^{p}} \ll u_N$, and the remaining probability term converges towards $0$ exponentially, as expected. This completes the proof of (5.16).

Numerical simulations
In this section, we examine the impact of the correction quantity $r_N(\nu)\phi_N(f)v_N$ when $\alpha > \frac{2}{3}$, and see how it improves the estimation of the LSS $\frac{1}{M} \mathrm{tr}\, f(\hat{C}(\nu))$. More precisely, we start by examining the behaviour of the LSS and the impact of the correction term under $H_0$. We recall that $\phi_N(f)$ is the deterministic term defined as the action of $f$ on the compactly supported distribution $D_N$, whose Stieltjes transform is $p_N$. Motivated by [28], we consider $f(\lambda) = (\lambda - 1)^{2}$, for which it can be verified with a bit of algebra and residue calculus that $\int f \, d\mu_{MP}^{(c_N)} = c_N$ and $\phi_N(f) = c_N$. Take $y_n$ generated by the following simple model:
$$y_n = A y_{n-1} + \epsilon_n \quad (6.1)$$
where $(\epsilon_n)_{n \in \mathbb{Z}}$ is an independent sequence of $\mathcal{N}_{\mathbb{C}}(0, I_M)$ distributed random vectors, and where $A$ is the diagonal matrix defined by $A = \theta I_M$ for $\theta \in \mathbb{C}$ such that $|\theta| < 1$. Under (6.1), the time series are independent AR(1) processes. Figure 1 represents, on the left, the values of the LSS associated to $f(\lambda) = (\lambda - 1)^{2}$ for each $\nu \in (0, 1)$ when $(N, B, M) = (10119, 1600, 800)$ (so that $c_N \approx 0.5$), together with the corrected statistics $\psi_N(f, \nu)$ and $\hat{\psi}_N(f, \nu)$. We again observe that the majority of the deviation from zero of the LSS is corrected by the $O\big((\frac{B}{N})^{2}\big)$ terms. Around $\nu = \pm 0.1$, the precision of the correction seems to degrade. This can be understood since $\nu = \pm 0.1$ corresponds to peaks in $s_m$, which leads to greater estimation errors for $\hat{s}_m$ at this frequency than at the other ones.
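A scaled-down sketch of this experiment (much smaller $(N, B, M)$ than the paper's $(10119, 1600, 800)$, a single frequency, and our own variable names) is the following; under $H_0$ the LSS $\frac{1}{M}\mathrm{tr}\,(\hat{C}(\nu) - I_M)^{2}$ should be close to $c_N$, up to the $O\big((\frac{B}{N})^{2}\big)$ correction discussed above.

```python
import numpy as np

# toy reproduction of the simulation: M independent AR(1) series, LSS f(lambda) = (lambda-1)^2,
# compared with int f dmu_MP^(c_N) = c_N
rng = np.random.default_rng(7)
N, B, M, theta = 4000, 200, 100, 0.5
c_N = M / (B + 1)
eps = (rng.standard_normal((M, N + 200)) + 1j * rng.standard_normal((M, N + 200))) / np.sqrt(2)
y = np.zeros_like(eps)
for n in range(1, eps.shape[1]):
    y[:, n] = theta * y[:, n - 1] + eps[:, n]           # y_n = A y_{n-1} + eps_n with A = theta I_M
y = y[:, 200:]                                          # drop burn-in samples

nu = 0.2
cols = [np.exp(-2j * np.pi * np.arange(N) * (nu + b / N)) / np.sqrt(N)
        for b in range(-B // 2, B // 2 + 1)]
Sigma = y @ np.array(cols).T                            # columns xi_y(nu + b/N)
S_hat = Sigma @ Sigma.conj().T / (B + 1)                # smoothed periodogram (1.1)
d = 1.0 / np.sqrt(np.real(np.diag(S_hat)))
C_hat = S_hat * np.outer(d, d)                          # estimated coherency matrix (1.3)
lss = np.mean((np.linalg.eigvalsh(C_hat) - 1) ** 2)     # (1/M) tr (C_hat - I)^2
print(lss, c_N)                                         # close to c_N under H_0
```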
We now check the speed of convergence towards zero derived in Theorem 5.1, and more precisely that the corresponding estimates hold true. We compare the uncorrected quantity $\sup_{\nu \in F_N} \big|\frac{1}{M} \mathrm{tr}\, f(\hat{C}(\nu)) - \int_{\mathbb{R}} f \, d\mu_{MP}^{(c_N)}\big|$ against its improved versions $\sup_{\nu \in F_N} |\psi_N(f, \nu)|$ and $\sup_{\nu \in F_N} |\hat{\psi}_N(f, \nu)|$. We see that the oracle corrected statistics $\psi_N(f, \nu)$ is more concentrated around $0$, and that its estimated counterpart $\hat{\psi}_N(f, \nu)$ is close to $\psi_N(f, \nu)$ but exhibits more spread, due to the additional estimation step of $\hat{s}_m(\nu)$.

We start by proving that the first set on the right hand side of (A.8) holds with exponentially small probability, i.e. for any $\epsilon > 0$, there exists $\gamma > 0$ such that the corresponding bound holds, with the Frobenius norm involved controlled by a nice constant $C$. Therefore, (A.12) leads immediately to (2.17).

A.3. Proof of Lemma 3.3
Proof. These estimates can be proved in a compact way by using the calculus rules of the stochastic domination framework introduced in Definition 2.1 and proved in Lemma 2.1, together with Lemma 3.2 and Lemma A.4 (see below). The second inequality is proved similarly.

A.4. Proof of Lemma A.5
Lemma A.5. The set of random variable (

A.5. Proof of Lemma 4.5
We express $\tilde{\Delta}$ as $\tilde{\Delta} = \frac{X\Gamma^{*}}{B+1} + \frac{\Gamma X^{*}}{B+1} + \frac{\Gamma \Gamma^{*}}{B+1}$. Therefore, using the Schwarz inequality, we obtain the corresponding bound. It is well-known that $\mathbb{E}\big\|\frac{XX^{*}}{B+1}\big\|^{k} \leq C$ for some nice constant depending on $k$. Therefore, we establish a property which will imply that $\mathbb{E}\|\tilde{\Delta}\|^{k} \leq C\big(\frac{B}{N}\big)^{k}$. For this, we put $Z = \frac{\Gamma \Gamma^{*}}{B+1}$. As

A.6. Proof of Lemma 4.6
We denote by $\eta_m$ the term of interest, i.e. $\eta_m = \mathbb{E}(w_m A W^{*} Q e_m)$. It can be expanded entrywise, and the integration by parts formula (2.8)