The spectrogram: A threshold-based inferential tool for extremes of stochastic processes

: We extend the statistical toolbox for inferring the extreme value behavior of stochastic processes by deﬁning the spectrogram. It is based on a pseudo-polar representation of bivariate data and represents a collection of spectral measures that characterize the bivariate extremal dependence structure in terms of a univariate distribution. Based on threshold exceedances in the original event data, estimation of the spectrogram is ﬂexible and eﬃcient. The virtues of the spectrogram for exploratory studies and for parametric inference are highlighted. We propose a variance reduction technique that can be applied to the estimation of summary statistics like the extremogram. Parametric inference based on distance measures between the empirical and the model spectrogram, as for instance pair-wise likelihoods or least squares distances, is developed. Simulated data illustrate gains in parametric estimation eﬃciency compared to a standard estimation approach. An application to precipitation data collected in the C´evennes region in France shows the practical utility of the introduced notions.


Introduction
The statistical analysis of extreme outcomes of stochastic processes plays an important role in many disciplines, including environmental and climatological research, finance and actuarial science.Extreme events in such processes can stretch across space and time and can inflict considerable damage.Extreme value analysis provides a coherent methodology to stochastically model extremes and to quantify extreme risks.
The last decade has seen a boom in the development of workable models and statistical inference for spatial and spatio-temporal extremes in the asymptotically dependent case; see Davison, Padoan and Ribatet [17], Davison and Gholamrezaee [15] who provide an overview of the field of geostatistics of extremes.The commonly used models are based on max-stable processes [19].Max-stability arises naturally as an asymptotic property for maxima and has its counterpart for threshold exceedances in the peaks-over-threshold stability of generalized Pareto processes [26,20,2].Estimation approaches based on threshold exceedances [7, 28, 3, 48, for instance] currently receive increasing attention due to their potential for more efficient and more flexible estimation in comparison to an often used composite likelihood approach with block maxima [34].
In this paper, we exploit peaks-over-threshold stability in the bivariate distributions: in adequately defined pseudo-polar coordinate systems, it corresponds to limiting independence between the angular and the radial component as the radial level increases towards infinity.The spectral measure describes the distribution of angles and characterizes the bivariate extremal dependence.It presents the advantage of being a univariate measure, which simplifies statistical procedures.Throughout the following presentation, we employ the term "space-time" when referring to time, space or space augmented by the time dimension.
Based on the assumption of stationarity, we define the spectrogram as an inferential tool for pairwise behavior with respect to the lag in space-time.It is constituted from the spectral measures for point pairs in space-time and provides a generalized and unified view on dependence characterizations like the extremal coefficient function [42] or the tail correlation function, a variant of the extremogram [14].We propose novel and flexible estimation procedures for such summary functionals and parametric extreme value models.
The remainder of this paper is organized as follows: At the beginning of Section 2, we first present some necessary background on extreme value theory and max-stable processes.Then, it is adapted to our purposes by defining the notions of radial aggregation and spectral measures closely related to well-known summaries and characterizations of extreme value dependence.This approach generalizes the usual norm-based definition of radial aggregations in the literature.The spectrogram and its inferential toolbox are subject of the central Section 3, which features novel theoretical results for a variance reduction technique in empirical estimation and for parametric inference in model-based approaches.We substantiate the utility of the introduced notions: first numerically, by means of a simulation study in Section 4 that shows to what extent our parametric estimation approach can improve estimation efficiency in comparison to a standard approach, then in the context of an application to daily precipitation data over the French Cévennes region in Section 5.The concluding remarks in Section 6 address some remaining issues and perspectives.A recall of useful dependence structures for spatial extremes is deferred to the appendix.All computations in this paper were carried out with the statistical software library R [47].

Domain of attraction
The fundamental limit theorem of extreme value theory states that convergence of a process of rescaled pointwise maxima leads to a max-stable limit process [19]: for n independent copies {X i (s), s ∈ D} of a process X defined on a nonempty measurable set D ⊂ R d , we assume that max i=1,...,n where a n (s) > 0 and b n (s) are adequately chosen normalizing sequences and Z = {Z(s)} is a non-degenerate limit process.Then Z is max-stable, and we say that X is in the domain of attraction of Z.For continuous stochastic processes, the domain of attraction condition (2.1) has recently been related to exceedances over a high threshold.The limiting Generalized Pareto (GP) processes have been studied by Ferreira and de Haan [26] and Dombry and Ribatet [20]; an adaptation to the flexible extremal-t dependence model was developed in [48].
When continuity is assured on a compact domain D, each max-stable process has its Pareto equivalent and vice versa.
A general constructive approach to generate max-stable models goes back to the seminal paper of de Haan [19]: Let {V i } be the points of a Poisson process on (0, ∞) with intensity v −2 dv, and let independently Q i (s) for s ∈ D and i = 1, 2, . . .be independent replicates of a random function Q with EQ + (s) = 1.Then Z(s) = max i=1,2,...
defines a max-stable field on D with unit Fréchet margins.This construction was interpreted by Smith [44] to introduce a class of models sometimes referred to as storm profile processes: a uniform Poisson process )ds 2 = 1 for all s 1 , which describes the deterministic shape of storms.The Gaussian extreme value process is obtained for the Gaussian density f .Random storm shapes Q were proposed by Schlather [41] and define the class of extremal Gaussian processes if Q is a Gaussian random field Q gauss .A more general class are the extremal t processes which arise from Q = c η (Q gauss + ) η with η > 0 and a normalizing constant c η [32].The Brown-Resnick process [6,29] arises from Q = exp( Q − 0.5 var( Q)) with Q defined as a fractional Brownian motion [31].We refer to Appendix A for the bivariate expressions and spectral measures of these models.
Max-stable processes are characterized by finite-dimensional max-stable distributions, also known as multivariate extreme value distributions (MEVD) G. Max-stability means that vector sequences α n > 0 and β n exist satisfying G n (α n z + β n ) = G(z).In particular, univariate marginal distributions G j of a max-stable process are generalized extreme value distributions (GEV): with three parameters: ξ j for shape, ν j for position and σ j > 0 for scale.The case It is convenient to assume univariate marginal distributions given by the unit Fréchet distribution G * j (z) = exp(−1/z) χ (0,∞) (z).The transformation −1/ log G j (Z j ) can be applied to all margins Z j of Z, which yields a multivariate extreme value distribution G * with unit Fréchet marginal distributions G * j .Balkema and Resnick [4] showed that any MEVD G can be represented through an exponent measure µ as G(z) = exp(−µ(−∞, z] C ).For G * , the exponent measure µ * can be defined on the punctured cone E = [0, ∞) \ {0} ⊂ R m and satisfies the following property of homogeneity of order −1: for Borel sets ).We can standardize the margins X j of X to assure convergence of pointwise maxima towards G * .Therefore, we use a probability integral transform to transform X j to a nonnegative variable X * j with standard Pareto tail behavior such that x P (X * j > x) → 1 as x tends to infinity.Then (2.1) is true for X * with a n = n, b n = 0 and limit distribution G * .There is no unique choice for the * -transformation; common choices in the literature are to use either , leading respectively to unit Fréchet or standard Pareto target distributions when F j is continuous.
The convergence and estimation of univariate extreme value behavior has been extensively studied, see the introductory presentation in Coles [11], for instance.After standardizing the marginal distributions with the * -transformation, it is convenient to focus on the dependence structure.The representation through the exponent measure µ * allows us to formulate convergence not in terms of maxima, but in a more general way based on the original data through the concept of multivariate regular variation, see Chapter 6 in Resnick [37].The following vague convergence property is satisfied on E: Vague convergence is equivalent to the convergence µ * n (K) → µ * (K) for all sets K relatively compact in E for which µ * (∂K) = 0.

Pseudo-polar representations and spectral measures
We now introduce the notion of radial aggregation and the resulting spectral measures.A sizeable amount of literature on spectral measures exists; we refer the reader to Section 8.2.3 in [5] for a presentation of the usual approach with the aggregation of a vector defined as a norm of the vector.In the following, this framework is slightly generalized to 1-homogeneous functions, allowing us to apply useful aggregation functions that are not norms.

Radial aggregation
The homogeneity (2.4) allows us to factorize µ * in pseudo-polar coordinate systems with a radial and an angular component.If the radial component is typically chosen as a norm of a vector x ∈ E in the literature, other choices will be meaningful for our purposes.Therefore, we propose the more general concept of an aggregation function, which aggregates a vector x ∈ E into a scalar value.

Definition 2.1 (Aggregation function). A non-null and measurable function
is called aggregation function if it is continuous and verifies the following 1homogeneity condition: rad(tx) = t rad(x), t > 0.
An aggregation function defines the radius r = rad(x) of x ∈ E.
Due to the continuity and the homogeneity of rad, the unit sphere S rad = {x ∈ [0, ∞) | rad(x) = 1} associated to rad is relatively compact in the space E. To introduce a pseudo-polar transformation T , we choose a norm • on R m , called angular norm, and write S = {x ∈ [0, ∞) | x = 1} for the intersection of the first quadrant and the unit sphere with respect to • .We define The mth coordinate of the angle w is determined by the remaining ones and can thus be suppressed.We can now represent the limit measure µ * and the vague convergence (2.5) in pseudo-polar coordinates.In general, difficulties in proving convergence with respect to pseudo-polar coordinates arise from the fact that rad and T are not defined for the values at infinity, whereas the vague convergence in (2.5) is stated with respect to the "compactified" space [0, ∞] punctured at the origin (cf. the proof of Theorem 6.1 in Resnick [37], for instance).The spectral measure ρ rad associated with T and µ * is defined on S by ρ rad (B) = µ * ({x ∈ E | rad(x) ≥ 1, x/ x ∈ B}) for measurable sets B ⊂ S and gives rise to a product representation of µ * : on (0, ∞) × {w ∈ S : rad(w) > 0}, with an invariant radial component showing a standard Pareto tail and the spectral measure ρ rad as the angular component that captures the extremal dependence structure.In the case where ρ rad (S) > 0, the normalized measure ρ rad (•)/ρ rad (S) is called spectral distribution.When there is no risk of confusion, we will use the simpler notation ρ instead of ρ rad .
To simplify the proof of Proposition 2.1, we first provide a useful lemma.
and denote by ∂A r1,r2 its boundary with respect to the space E. Then Proof of Lemma 2.1.First, we remark that µ * (rS rad ) = 0 for 0 < r < ∞.Otherwise, the homogeneity property of µ * would lead to a contradiction concerning the Radon property of µ * .For r 2 < ∞, the set Due to the continuity of rad at 0 and rad(0) = 0 we have 0 ∈ ∂A r1,∞ .From Proposition 6.1 of [37], we conclude that A r1,∞ is relatively compact in E. Since ∂A r1,∞ = r 1 S rad , we observe µ * (∂A r1,∞ ) = 0, which finishes the proof.
Proof of Proposition 2.1.Using the technical properties concerning the region of radial exceedances detailed in Lemma 2.1, the proof is straightforward.The convergences (2.9) and (2.10) are obtained from the convergence (2.5) by using certain relatively compact sets.
With (R, W) = T (X * ), the inferential approaches worked out in the following rely upon the approximation for a high radial threshold r.

Properties of the spectral measure
The choice of angular norm is a choice of convenience since convergence results do not depend on it.In the following, we focus on the angular L 1 -norm x = m j=1 x j on E. In the bivariate case, the first angle component w = w 1 designates an angle (w 1 , 1−w 1 ), so we can identify S with [0, 1].If rad(w) > 0 for all w ∈ S, then µ * is uniquely determined by the spectral measure ρ which is only subject to the moment constraints S w j rad(w) ρ(dw) = 1, j = 1, . . ., m. (2.12) Denote by e j the canonical vector with j-th component 1 and 0 elsewhere.If rad(•) = • rad is a norm such that e j rad = 1 for j = 1, . . ., m, then 1 ≤ ρ(S) ≤ m.With extremal independence, we observe ρ({e j }) = 1; with full extremal dependence, we observe ρ(S) = ρ({1/ 1 }) = 1 rad .We can switch between two spectral measures ρ 1 and ρ 2 associated to aggregation functions rad 1 and rad 2 respectively, given that {w ∈ S : rad 2 (w) > 0} ⊂ {w ∈ S : rad 1 (w) > 0}, via the transformation formula which holds on {w ∈ S : rad 1 (w) > 0}.In Section 3.1, we discuss a variance reduction technique based on (2.13).

Empirical estimation of spectral measures
Given data vectors x i , i = 1, . . ., n, we can define an empirical spectral measure based on the tail approximation (2.11).The set of atoms of the empirical measure is given by the angles that are associated to radii exceeding a high radial threshold r 0 (see Beirlant et al. [5], Section 9.4.1).In practice, we first have to apply a probability integral transform for standardizing x i to x * i .This requires knowledge or estimation of marginal data distributions F j and the choice of a marginal target distribution with support in [0, ∞) and tail asymptotics according to the standard Pareto distribution.We can estimate marginal distributions either nonparametrically via the empirical distribution function [22,23] or via some other adequate marginal distribution, for instance with its tail behavior estimated according to the parametric tail model (2.3) from univariate extreme value theory [21].The commonly used target is the unit Fréchet distribution.Then we fix a high radial threshold r 0 > 0 and define the empirical spectral measure When the number of radial exceedances ) is at least 1, we can directly estimate the empirical spectral distribution ρ p,n by replacing the weight r 0 n −1 in (2.14) by N −1 .In some cases, the enforcement of the moment constraints (2.12) upon the empirical spectral measure may be desirable for reasons of theoretical coherence.Therefore, we can wiggle atom weights by applying either an empirical likelihood procedure [23] or a Euclidean likelihood procedure [18].The choice of r 0 is important, yet rarely straightforward.It should take into account bias and variance according to convergence in the domain of attraction setting.In practice, it is based on exploratory analyses, see also the remarks in Section 3.1 and in the application in Section 5.
The convergence rate in data towards the asymptotic distribution may suggest an aggregation function rad to use for the empirical estimator.Aggregations like the maximum, which retain many of the extreme observations outside the joint bivariate tail, might be advised against in case of slow convergence.On the other hand, aggregations like the minimum use only a limited amount of information about extreme value behavior by focusing on observations falling into the joint tail.Convenient choices are the mean rad(x 1 , x 2 ) = 0.5(x 1 + x 2 ) or one of the marginal components such that rad(x 1 , x 2 ) = x 1 or rad(x 1 , x 2 ) = x 2 , leading to ρ(S) = 1 being invariant to the dependence structure; in such cases, the number of exceedances N carries no information about ρ rad .

Related bivariate dependence measures
Alternative characterizations of bivariate extremal dependence are obtained as transformations of the spectral measure.Since the exponent function , is a commonly used tool to quantify the extremal dependence.It satisfies V (1, 1) ∈ [1, 2] with 1 corresponding to full asymptotic dependence and 2 corresponding to asymptotic independence.It is equal to ρ rad (S) with rad = max.The two following approaches are also often used.

Pickands dependence function
Pickands [35] showed that a bivariate distribution G * is a MEVD if and only if there exists a convex function A(•) such that with . The lower and upper bounds correspond to complete dependence and independence, respectively.The Pickands dependence function A(•) can be related to the spectral measure ρ rad [5]: (2.16)

Extremogram function
When (X 1 , X 2 ) is in a maximum domain of attraction, the (upper) tail coefficient λ, defined as the conditional limit lim u→∞ pr( , is often used to measure the extremal dependence.It is related to the extremal coefficient, V (1, 1) = 2 − λ.This coefficient λ can be defined as µ * ([1, ∞]) and is equal to ρ rad (S) for rad = min.
For stationary stochastic processes, functional dependence summaries based on the extremal coefficient or the tail coefficient can be considered.Given a space-time lag ∆s and the aggregation function rad = max, the extremal coefficient function is ∆s → V (1, 1; ∆s) = ρ max (S; ∆s).For the tail coefficient, we obtain the tail correlation function (see Strokorb, Ballani and Schlather [46,45], Fiebig, Strokorb and Schlather [27] for some theoretical characterizations) ∆s → ρ min (S; ∆s), (2.17) which is a variant of the so-called spatial extremogram [10] originally defined in the time series context [14].The extremogram (2.17) has a simple interpretation as a correlation function.More generally, [14] have coined the term "extremogram" for not necessarily standardized data and expressions like µ(A × B) or µ(A × B)/µ(A) with limit measures µ that are (−α)-homogeneous with some α > 0, where A and B are Borel sets that must be bounded away from 0. In our space-time setting with a univariate random field and marginally standardized data, it is most useful to specify as in the definition (2.17) that we will use in what follows.

The spectrogram
In this section, the extremal dependence structure of a stationarity process in a maximum domain of attraction is considered, leading to invariance with respect to space-time shifts.We apply the angular L 1 -norm such that we can identify Given a space-time lag ∆s and an aggregation function rad, we denote by ρ rad ( • ; ∆s) the bivariate spectral measure of the limit process in (2.1).Examples of isotropic spatial spectrograms are shown in Figure 1 for the mean aggregation rad(x 1 , x 2 ) = 0.5(x 1 +x 2 ).For small distances, densities concentrate close to 0.5 since x 1 and x 2 tend to be close.At distance 0, there is discrete mass 1 in 0.5.If there is asymptotic independence when distances increase to infinity, a very high value arises only in either x 1 or x 2 , whereas the other component remains relatively small.Then, the mass of the spectral measures concentrates closer and closer to 0 or 1.The extremal Gaussian process stands apart since it is not mixing and always exhibits long-range dependence.Moreover, in addition to the density displayed in Fig. 1, discrete mass arises in the spectral measure for the angles 0 and 1: ρ({0}; ∆s) = 0.25 (1−Cor(∆s)), where Cor is the correlation function of the Gaussian field Q in the construction (2.2), cf.Appendix A.

Empirical estimators and variance reduction
In the sequel, we will shortly write ρ instead of ρ rad when there is no confusion.Given a sample of observations of a space-time process, an empirical spectrogram consists of a collection of empirical spectral measures for a selection of lags in space-time obtained according to the estimators presented in Section 2.2.3.Typically, we assume temporal stationarity of marginal distributions, which may require some pretransformation.Using a mixing assumption [28], we can then apply the empirical transform to standardize margins according to the * -transformation.Moreover, we can estimate univariate marginal tail parameters with procedures for dependent samples as for instance the composite likelihood of univariate margins.
Statements on the consistency and asymptotic normality of cumulative empirical spectral measures in the bivariate case have been deduced in the literature mentioned in Section 2.2.3 [21,22,23,18], but they depend on the nature of marginal data transformations and the rate of convergence towards the asymptotic dependence structure.When margins are standardized based on the marginal empirical distribution functions, consistency is obtained when (2.5) holds and both r 0 and r −1 0 n tend to infinity.Strong consistency further necessitates that r −1 0 n/(log log n) tends to infinity [22,Theorem 1].For asymptotic normality, more involved conditions are necessary to control the second-order behavior in convergence (2.5) [22,Theorem 2].
For an assessment of some important aspects of estimation variance, we here propose to work in a simplified setting where marginal distributions are assumed to be standardized and the limiting independence of the angular and the radial component is satisfied at finite levels.In practice, the standardization of margins based on some estimator of marginal distribution functions could induce additional bias and variance.In practical modeling, we must fix r 0 to define radial exceedance observations.When we do not have equality in the tail approximation (2.11) for a finite threshold r 0 , we must consider how bias and variance depend on r 0 .In the following, we look at variance from a more theoretical stance.Providing a general theory for bias terms is more involved since it would need assumptions on the second-order behavior in convergence (2.10).In practice, it is useful to check if estimated dependence structures change in a systematic way when we move r 0 to more or less extreme levels.Denote by (X * i (s 0 ), X * i (s 0 + ∆s)) T (i = 1, . . ., n) the normalized sample for a space-time lag ∆s obtained from the marginal standardization of data.Assume that the corresponding pseudo-polar sample (R i , W i ) with R i = rad(X * i (s 0 ), X * i (s 0 + ∆s)) and W i = X * i (s 0 )/(X * i (s 0 ) + X * i (s 0 + ∆s)) contains 0 ≤ N ≤ n radial exceedances, indexed for N ≥ 1 without loss of generality by 1, . . ., N such that R i ≥ r 0 (i = 1, . . ., N ).We assign the dummy value W i = −1 to the censored observations below the radial threshold with R i < r 0 , i = N + 1, . . ., n.We make two assumptions: (A1) The angles {W i , i = 1, . . ., n} are independent and identically distributed according to a density f (w; ∆s).
(A2) The homogeneity property (2.8) holds exactly above the fixed radial threshold r 0 > 0 such that Hence we can write with f ( • ; ∆s) the density of ρ( • ; ∆s) and r −1 0 ρ(S; ∆s) the exceedance probability for the threshold r 0 .Then N ∼ Bin(n, r −1 0 ρ(S; ∆s)).For convenience of notation, we suppress the index ∆s in the following.
Proof.The properties (3.5) of the standard estimator follow from those of the binomial distribution of N (w).The transformation estimator (3.4) is directly related to the idea of importance sampling: based on the proposal distribution with density f , we estimate the moment ρ ′ (w) = The unbiasedness of ρ′ and the variance formula (3.6) follow from the standard results of the related theory, cf.Section 3.3 of [38].
In comparison to the variance of the standard estimator ρ′ of ρ ′ in (3.5), the variance of ρ′ (B) for a measurable set The latter case allows for variance reduction.For instance, we can estimate the extremogram associated to rad ′ = min by transforming empirical estimators obtained for the mean rad(x 1 , x 2 ) = 0.5(x 1 + x 2 ) or for some other appropriate aggregation.Moreover, the well-known Capéraà-Fougères estimator of Pickands' dependence function A in (2.16) can be represented as a transformation estimator ρ′ .
When the spectral measure ρ is equal to the spectral distribution ρ p , i.e. ρ([0, 1]) = 1, the distribution of N is independent of the dependence structure in ρ.This is the case for the mean aggregation and the marginal exceedance aggregation.It is then convenient to condition on the value of N and work with the density f (w) = f (w) for W i (i = 1, . . ., N ) and replace n −1 r 0 by N −1 in the direct estimator (3.3) and the subsequent calculations.This yields ρ(w) = N −1 N (w) with N (w) ∼ Bin(N, ρ(w)).The properties of the binomial distribution yield Eρ(w) = ρ(w) and Var(ρ(w)) = N −1 (ρ(w) − ρ(w) 2 ).The corresponding variance of the transformation estimator (3.6) becomes Since actually N ∼ Bin(n, r −1 0 ) under the standing assumptions, we get Example 3.2 (Extremogram estimation).Estimation of the extremogram defined in (2.17) requires estimates of ρ min (S).The estimator given by the standard empirical spectral measure ρmin has variance n −1 r 0 ρ min (S)(1 − r −1 0 ρ min (S)).Similar to replacing nr −1 0 by the binomial variable N when ρ(S) = 1, we can improve this estimator by replacing nr −1 0 by the binomial variable N marg , the number of marginal exceedances above r 0 such that N marg = n i=1 χ (r0,∞) (X * (s 0 )) and EN marg = nr −1 0 .The resulting new estimator, written ρ(S), corresponds to a slightly adapted version of the empirical extremogram as proposed by Davis and Mikosch [14], Section 3.3.Conditional on N marg , its variance under the standing assumptions is Var(ρ(w)) = N −1 marg ρ min (S)(1 − ρ min (S)).Using the transformation approach, we can define an interesting alternative based on rad(x 1 , x 2 ) = 0.5(x 1 + x 2 ) and rad ′ (x 1 , x 2 ) = min(x 1 , x 2 ).Then, conditional on N , the variance of the transformation estimator is (3.8) We remark that 2 min(v, 1 − v) < 1 almost everywhere, such that the integral in (3.8) is smaller than ρ min when ρ min ≡ 0 has a continuous density.

Parametric minimum composite distance estimation
Current standard methods for inferring models in geostatistics of extremes focus on the information from the observed point pairs.In many cases, either analytical expressions of the likelihood cannot be calculated for higher dimensions or the application of multivariate estimation methods is impeded by the curse of dimensionality.We propose to perform minimum distance estimation of parameters by minimizing a distance measure between the empirical spectrogram and a family of theoretical spectrograms according to a parametric dependence structure of a max-stable process.Such distance measures can also serve for model selection and goodness-of-fit procedures, see the application in Section 5 for an example.

Definition 3.3 (Composite distance)
. For ∆ k s, k = 1, . . ., k 0 , a collection of observed lags with n k ≥ 1 observed data pairs respectively, we use some distance metric dist(ρ k,n k , ρ k,θ ) between the empirical spectral measure and a and parametric family of spectral measures to define a composite distance The vector of unknown parameters θ 0 characterizes the true asymptotic extreme value dependence structure in the data.Appendix A recalls some common extreme value dependence models.Minimizing D(θ) yields the parameter estimate θ.Given a fixed radial threshold r 0,k leading to 0 ≤ N k ≤ n k radial exceedances among the n k data pairs, we calculate an empirical spectral measure from the observed angles W k,ℓ with possibly inhomogeneous weights ω k,ℓ .Inhomogeneous weights could be used to improve estimation efficiency and to reduce the computational burden of minimizing D(θ), for instance, by giving weight 0 to distant site pairs ("tapering", see Sang and Genton [39]).The exceedance probability p k,θ = r −1 0,k ρ k,θ (S) is associated to the threshold r 0,k .

Examples of composite distances
can be interpreted as the result of applying the Kullback-Leibler divergence for dist.When parameters to estimate are identifiable from pairwise information, pairwike likelihood estimators have asymptotic properties similar to those for the full likelihood under the usual regularity assumptions.An overview of asymptotic consistency and normality, tests and information criteria in the pairwise likelihood approach is given in [49] and the references therein.In Section 3.2.2, the expression of asymptotic variance and its estimation will be clarified.
The pairwise likelihood approach has been used for the special case of mean aggregation and the Brown-Resnick model by Engelke et al. [24].
In geostatistics of extremes, relatively simple yet robust estimation approaches based on an L 2 -distance between a theoretical and an empirical version of summary statistics like the extremogram (or equivalently, the extremal coefficient function) have been applied and can be interpreted as a composite distance in our sense.Since such summary estimators of dependence are statistics that are not sufficient, these approaches use only partial information of the data.Instead, we here propose an L 2 -distance between the cumulative measures ρ k,n k and ρ k,θ using the full information from observed angles.
Example 3.4 (Pairwise L 2 -distance).Assume that ρ k,θ (S) = 1 is invariant with respect to the dependence structure and that for ρ k,n k (•).We apply the statistic Whereas the likelihood-based distance (3.9) is related to the density representation of the spectrogram, the focus of the L 2 -distance (3.10) is on the cumulative spectrogram.

Asymptotic properties of composite distance estimators
For the asymptotic variance and consistency in the general context of so-called extremum estimators, here obtained by minimizing the composite distance, we refer to Chapter 4 in Amemiya [1].The following proposition states sufficient conditions for weak consistency of the minimum composite distance estimator.We denote n the sample size, for instance given as the number of observed temporal replications on a fixed configuration of spatial sites, and D n (θ) the composite distance for a sample of size n.Proposition 3.2 (Weak consistency).Let Θ be the compact parameter space, and assume that the asymptotic dependence structure of the data distribution is characterized by the true parameter vector θ 0 ∈ Θ. Assume that there exists a sequence d(n) such that n −1 d(n)D n (θ) is continuous in θ ∈ Θ and converges in probability to a non-stochastic limit D ∞ (θ), uniformly for θ ∈ Θ.If D ∞ (θ 0 ) < D ∞ (θ) for all θ different from the actual value θ 0 , then any estimator θn satisfying D n ( θn ) = min θ∈Θ D n (θ) is weakly consistent, i.e. converges to θ 0 in probability.
Proof.The assumptions made in Proposition 3.2 allow us to directly apply Theorem 4.1.1 of Amemiya [1].
Example 3.5 (Weak consistency of the L 2 estimator).Assume that we have n independent and identically distributed temporal replications of spatial observation vectors x i , i = 1, . . ., n, for a fixed spatial design with m sites s 1 , . . ., s m .We can fix a radial aggregation and use a standard empirical estimator of the pairwise spectral measures, indexed by k = 1, . . ., m(m − 1)/2 and defined as in (3.3).The pairwise estimators (3.3) are weakly consistent if the domain of attraction condition (2.5) holds and If ρ k,θ (S) = 0 and ρ k,θ (w) = ρ k,θ0 (w) for at least one k and values w in a neighborhood (w 0 − ε, w 0 + ε) of w 0 ∈ (0, 1) with ε > 0, then Proposition 3.2 establishes the weak consistency of the estimator θn .
Moreover, under regularity conditions necessary for asymptotic normality (which are not made explicit here due to their quite technical nature) the asymptotic variance is given by the information sandwich The information matrix I θ0 can be estimated by the numerically observed Hessian.The matrix J θ0 = EU(θ 0 )U(θ 0 ) T captures the (co)variance structure of the score vector U = (∂D(θ)/∂θ j ) j .Since U( θ) = 0 for the estimation θ of the true parameter θ 0 , estimating J θ0 is more intricate.Based on the assumption of temporal mixing, we can compute an estimate of the variance J θ0 or directly of M θ0 with subsampling techniques like the jackknife or the bootstrap (see Shao and Tu [43], Davison and Hinkley [16], Carlstein et al. [9]).

Simulation results for parametric estimation
We illustrate numerically the estimation efficiency of the parametric estimation approaches from Section 3.2 with respect to the number of exceedances when the excursion stability (3.1) is satisfied above the fixed radial threshold r 0 .The radial mean aggregation rad(x 1 , x 2 ) = 0.5(x 1 + x 2 ) is applied.We focus on the Gaussian extreme value process as an easily interpretable benchmark process, for which exact simulation of an excursion-stable version is possible (cf.Ferreira and de Haan [26]): for the sample region K = [−3.5, a + 3.5] 2 with a > 0, we simulate realizations from the field (a + 2 × 3.5) 2 Y f ((•) − S) with standard Pareto distributed Y , storm centers S ∼ Unif(K) and the isotropic identity covariance matrix Σ = diag(1, 1) of the Gaussian density f .Neglecting storm centers outside K leads to edge effects which we reduce by using sample points s j (j = 1, . . ., 25) from a uniform distribution on [0, a] 2 for estimation.For the sake of simpler exposition, we investigate only the case of simulating an isotropic model, and we study estimates σ11 of σ 11 = 1 and σ12 of σ 12 = 0. Results for σ22 are not reported since they are similar to those for σ11 .Unreported results show that estimator performances in the case of geometric anisotropy allow us to draw the same conclusions on comparative estimator performance.We choose a ∈ {1, 3, 5} corresponding to strong, middle and weak spatial dependence and nr −1 0 ∈ {10, 20, 40, 200} expected radial exceedances for bivariate spectral measures.We use the estimator ρ(w) = N −1 N j=1 χ [0,w] (W j ) with ρ(S) = 1 for estimating the empirical spectral measures and then apply the composite log-likelihood distance (CL) from (3.9) and the composite L 2distance (L2) from (3.10).We further study the behavior of jackknife estimates of the standard errors based on 20 blocks each consisting of n/20 realizations of the process, i.e. s.e.jackknife = 19/20 × 20 i=1 (σ −i − σ) 2 with σ = σ 11 or σ = σ 12 and σ−i the estimator obtained from removing block i.
For comparative purposes, we also report estimates from a standard estimation procedure consisting of the (composite) bivariate partially censored loglikelihood (CPCL): for a standard-scale threshold u and bivariate standard-scale data (x 1 , x 2 ), we use the bivariate likelihood contribution calculated from the see, for instance, [30] and the references therein and [15].The latter approach focuses on the behavior of exceedances with respect to a constant thresholding field u and uses only partial information for simple exceedances outside the joint tail.We set u = r 0 .The results in Table 1 for σ 11 and in Table 2 for σ 12 show a slight tendency of CL to overestimate and of L2 to underestimate spatial dependence.The estimators appear robust over the range of chosen parameters.Bias and variance tend to 0 when the number of exceedances increases (nr −1 0 : 10 → 200) or when spatial mixing improves (a : 1 → 5).The jackknife estimates of the standard error behave similarly for both of the approaches CL and L2.In general, they are close to the actually observed estimation error and appear reliable.Throughout, estimation via CL or L2 tends to be (slightly) more efficient than via CPCL, particularly for moderate spatial dependence with a = 3 and a = 5, where fewer bivariate observations fall into the joint tail region.

Application to precipitation in the Cévennes region
We analyze spatial dependence among extremes in daily precipitation data for 21 gauge stations in the French Cévennes region with observations from 1959  to 2003 (data are available from Météo France).The elevation map on the left in Fig. 2 shows the spatial setup of stations and the surrounding region which is close to the Mediterranean sea.The Cévennes region is known for violent thunderstorms during autumn which often lead to inundations.We limit modeling to the spatial extremal dependence structure between the intra-day cumulated precipitation amounts at each site for the period from September 1 to January 31, leaving marginal behavior and temporal clustering out of consideration and assuming stationary seasonal behavior.Removing days without precipitation at any of the stations and days with missing data records, we keep n = 4193 days.On average, 45.3% of the sites are affected by precipi- tation.In order to take into account the presence or absence of precipitation, we modify the standard marginal empirical probability transform to the unit Fréchet distribution, leading to atomic mass at 0. Denote by f0 the observed proportion of 0-values at site s j and write n =0 = (1 − f0 )n for the number of nonzero observations.We transform a nonzero observation X i (s j ) (i = 1, . . ., n and j = 1, . . ., 21) with rank ran i among all n =0 nonzero observations at s j to The factor (1 − f0 ) −1 stretches the transformed nonzero data distribution such that x pr(X * (s j ) > x) → 1 as x tends to infinity.In X * i (s j ), we keep the value 0 for the absence of precipitation.On the right in Fig. 2, the extremogram values with respect to temporal lags for the time series of each of the stations indicate little temporal extremal dependence.Estimation was achieved by the transformation estimator ρmin with rad ′ = min and the mean aggregation rad(x 1 , x 2 ) = 0.5(x 1 + x 2 ), as proposed in Example 3.2.
We apply a check for the homogeneity property (2.11), based on the tail behavior of the maximum component of a multivariate random vector, which has been proposed by [25] and [2].If the domain of attraction condition (2.1) is satisfied, we observe x pr(max j=1,...,21 X * (s j ) > x) ≈ V s1,...,s21 (1) for large enough x, where V s1,...,s21 (1) is the extremal coefficient for all 21 sites.We therefore have to check if x pr(max j=1,...,21 (X * (s j ) > x) is approximately constant for large x.We find that the homogeneity assumption is realistic for our data only at very extreme levels.For illustration, the display on the left of Fig. 3 shows estimates n ) with respect to the expected number of marginal exceedances nr −1 0 .As r −1 0 tends to 0 in Fig. 3, the estimates seem to stabilize only very close to 0. The p-values reported on the middle display in Fig. 3 for the test procedure in [2] confirm this finding.
We now focus on the pairwise mean aggregation with rad(x 1 , x 2 ) = 0.5(x 1 + x 2 ).Based on the preceding findings, we fix the radial threshold r 0 corresponding to nr −1 0 = 20 expected radial exceedances.The right-hand display in Fig.
3 shows the empirical extremogram, again estimated by the transformation estimator from Example 3.2.The display further presents the orientation of the spatial lag for each of the station pairs, suggesting an anisotropic dependence structure with vertically oriented lags taking higher values.Therefore we replace distances (∆s) T (∆s) by (∆s) T M(∆s), where M is a symmetric matrix with diagonal elements 1, m 22 and non-diagonal element m 12 .We apply the minimum composite distance estimators to fit three parametric dependence structures, notably the Brown-Resnick model with power semi-variogram ("BR"; for the composite likelihood distance (3.9) and the L 2 -distance(3.10))and the extremal-t model with correlation of the stable or the Cauchy type ("evt"; for the composite likelihood (3.9) and the L 2 -distance); see Appendix A for the description of correlation models.Since 0-observations arise with probability 0 in the Brown-Resnick model, we transform them to [log(n+ 1)] −1 > 0 for the composite likelihood estimation of this model via (3.9); i.e., they are assigned rank 1.
As in the simulation study, we use the estimator ρ(w) = N −1 N j=1 1(W j ≤ w) with ρ(S) = 1 for estimating empirical spectral measures.Estimates of the parameter vector θ (see Appendix A for details on the parametrization) are summarized in Table 3.The reported jackknife estimates of the standard errors are based on a reorganisation of the data into 41 blocks that we consider independent.Finally, we assess the goodness-of-fit by measuring the mean error between the empirical extremogram ρmin (S; ∆s) obtained from the mean aggregation empirical spectrogram and the fitted extremogram ρ min, θ(S; ∆s).Therefore, define scribed in Section 3.2.They quantify the mean absolute error (i = 1) or the mean squared error (i = 2) and further allow for a cross-validation procedure to check the predictive quality of a model: for each site, we remove the corresponding observations from the data set to create the training set consisting of 20 sites, we fit the model to the training set and we use the fitted model to predict the sum terms in (5.1) associated to distances ∆s between the removed site and the 20 other sites.Finally, we sum up all these terms to obtain the corresponding cross-validation error errcv i .The resulting error statistics reported in Table 3 are unanimously in favor of the extremal-t model with the stable correlation, estimated by minimizing the L 2 -distance; see Figure 4 for a visual presentation of the fitted model.Fitted discrete masses ρ mean ({0}; ∆s) = ρ mean ({1}; ∆s) are small, which adequately reflects the situation for the data where less than 0.7% of angles take one of the values 0 or 1 corresponding to extreme rainfall in one site and no rainfall in the other site.

Conclusion and perspectives
Representing multivariate limit distributions for threshold exceedances with the spectral measure opens a versatile toolbox for statistical analysis with both formal and visual tools.Further investigation could be dedicated to how we best include temporal dependence, which plays a more prominent role when switching from block maxima to exceedances in the original process.Some max-stable space-time process models are introduced in [13].Moreover, current research efforts are directed towards estimation methods capable to exploit more than only bivariate information.

Fig 2 .
Fig 2. The geographic setup of Cévennes gauge stations and temporal extreme value dependence in the Cévennes precipitaton data.Left: Elevation map with Cévennes stations; right: boxplots of extremogram values for the time series of the 21 stations with respect to temporal lag and nr −1 0 = 100 expected marginal exceedances.

Fig 3 .
Fig 3. Exploratory plots for modeling spatial extremes in the Cévennes precipitation data.Left: estimated extremal coefficient for the 21 stations with respect to number of expected marginal exceedances; middle: p-values of a homogeneity test with respect to number of expected marginal exceedances; right: the extremogram, transformed from the mean aggregation spectrogram with nr −1 0 = 20, with pins indicating the orientation of the space lag.

Table 1
Mean and standard errors of composite distance estimators (CL -likelihood distance (3.9), L2 -L 2 -distance (3.10), CPCL -composite partially censored likelihood (4.1)) for σ 11 = 1 in the Gaussian extreme value model, based on 500 simulations; for CL and L 2 with mean and standard deviation of the jackknife estimate of the standard estimation errors calculated with 20 blocks

Table 2
Mean and standard error of composite distance estimators (CL -likelihood distance (3.9), L2 -L 2 -distance (3.10), CPCL -composite partially censored likelihood (4.1)) for σ 12 = 0 in the Gaussian extreme value model, based on 500 simulations; for CL and L2 with mean and standard deviation of the jackknife estimate of standard estimation errors calculated with 20 blocks

Table 3
Comparison of estimated extremal dependence models for the Cévennes precipitation data.BR -Brown-Resnick model with power variogram; evt -extremal-t model with either the stable or the Cauchy correlation model; CL -likelihood distance (3.9); L2 -L 2 -distance (3.10); standard error in parentheses (estimated by the jackknife technique); within-sample and cross-validation mean absolute errors (err 1 and errcv 1 ) and mean squared errors (err 2 and errcv 2 ) for the extremogram