Order-sensitivity and equivariance of scoring functions

Abstract

The relative performance of competing point forecasts is usually measured in terms of loss or scoring functions. It is widely accepted that these scoring functions should be strictly consistent in the sense that the expected score is minimized by the correctly specified forecast for a certain statistical functional such as the mean, median, or a certain risk measure. Thus, strict consistency opens the way to meaningful forecast comparison, but it is also important in regression and M-estimation. Usually, strictly consistent scoring functions for an elicitable functional are not unique. To give guidance on the choice of a scoring function, this paper introduces two additional quality criteria. Order-sensitivity opens the possibility to compare two deliberately misspecified forecasts, given that the forecasts are ordered in a certain sense. On the other hand, equivariant scoring functions obey similar equivariance properties as the functional at hand, such as translation invariance or positive homogeneity. In our study, we consider scoring functions for popular functionals, putting special emphasis on vector-valued functionals, e.g. the pair (mean, variance) or (Value at Risk, Expected Shortfall).


Introduction
From the cradle to the grave, human life is full of decisions. Due to the inherent nature of time, decisions have to be made today, but at the same time, they are supposed to account for unknown and uncertain future events. However, since these future events cannot be known today, the best thing to do is to base the decisions on predictions for these unknown and uncertain events. The call for and the usage of predictions for future events is literally ubiquitous and even dates back to ancient times. In those days, dreams, divination, and revelation were considered as respected sources for forecasts, with the most prominent example being the Delphic Oracle which was not only consulted for decisions of private life, but also for strategic political decisions concerning peace and war. With the development of natural sciences, mathematics, and in particular statistics and probability theory, the ancient metaphysical art of making qualitative forecasts turned into a sophisticated discipline of science adopting a quantitative perspective. Subfields such as meteorology, mathematical finance, or even futurology evolved.
Acknowledging that forecasts are inherently uncertain, two main questions arise: (i) How good is a forecast in absolute terms? (ii) How good is a forecast in relative terms?
While question (i) deals with forecast validation, this paper focuses on some aspects of question (ii) which is concerned with forecast selection, forecast comparison, or forecast ranking. Specifically, we present results on order-sensitivity and equivariance of consistent scoring functions for elicitable functionals. These results may provide guidance for choosing a specific scoring function for forecast comparison within the large class of all consistent scoring functions for an elicitable functional of interest.
We adopt the general decision-theoretic framework following Gneiting (2011); cf. Savage (1971); Osband (1985); Lambert, Pennock and Shoham (2008). For some number n ≥ 1, one observes outcomes y_1, . . . , y_n taking values in an observation domain O, along with competing forecasts x_1, . . . , x_n taking values in an action domain A. Forecasts are assessed by means of a scoring function S : A × O → R, which is assumed to be negatively oriented, that is, if a forecaster reports the quantity x ∈ A and y ∈ O materializes, she is assigned the penalty S(x, y) ∈ R.
The observations y_t can be real-valued (GDP growth for one year, maximal temperature of one day), vector-valued (wind speed, weight and height of persons), functional-valued (path of the Euro–Swiss franc exchange rate over one day), or also set-valued (area of rain on a given day, area affected by a flood).
In this article, we focus on point forecasts that may be vector-valued, which is why we assume A ⊆ R^k for some k ≥ 1, and we equip the Borel set A with the Borel σ-algebra. One is typically interested in a certain statistical property of the underlying (conditional) distribution F_t of Y_t. We assume that this property can be expressed in terms of a functional T : F → A such as the mean, a certain quantile, or a risk measure. Examples of vector-valued functionals are the covariance matrix of a multivariate observation or a vector of quantiles at different levels. Common examples of scoring functions are the absolute loss S(x, y) = |x − y|, the squared loss S(x, y) = (x − y)^2 (for A = O = R), or the absolute percentage loss S(x, y) = |(x − y)/y| (for A = O = (0, ∞)). Forecast comparison is done in terms of realized scores

S̄_n = (1/n) Σ_{t=1}^n S(x_t, y_t). (1.1)

That is, a forecaster is deemed the better the lower her realized score is. However, there is the following caveat: the forecast ranking in terms of realized scores not only depends on the forecasts and the realizations (as it definitely should), but also on the choice of the scoring function. In order to avoid impure possibilities of manipulating the forecast ranking ex post with the data at hand, it is necessary to specify a certain scoring function before the inspection of the data. A fortiori, for the sake of transparency and in order to encourage truthful forecasts, one ought to disclose the choice of the scoring function to the competing forecasters ex ante. But still, the optimal choice of the scoring function remains an open problem. One can think of two situations: (i) A decision-maker might be aware of the actual economic costs of utilizing misspecified forecasts. In this case, the scoring function should reflect these economic costs. (ii) The actual economic costs might be unclear, and the scoring function might be just a tool for forecast ranking.
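As a small numerical illustration of forecast comparison via realized scores as in (1.1), the following Python sketch ranks two hypothetical forecasters under the squared loss; the forecast and observation values are made-up illustration data.

```python
import numpy as np

# Sketch of forecast comparison via realized scores, Eq. (1.1):
# the mean score (1/n) * sum_t S(x_t, y_t) under a fixed scoring function.

def squared_loss(x, y):
    """S(x, y) = (x - y)^2, a negatively oriented scoring function."""
    return (x - y) ** 2

def realized_score(S, forecasts, observations):
    """Average realized score (1/n) * sum_t S(x_t, y_t)."""
    return float(np.mean([S(x, y) for x, y in zip(forecasts, observations)]))

y = np.array([1.0, 2.0, 0.5, 1.5])    # observations (illustration data)
x_a = np.array([1.1, 1.9, 0.6, 1.4])  # forecaster A
x_b = np.array([0.0, 3.0, 2.0, 0.0])  # forecaster B

score_a = realized_score(squared_loss, x_a, y)
score_b = realized_score(squared_loss, x_b, y)
# a lower mean score means a better forecaster under this scoring function
assert score_a < score_b
```

Note that repeating the comparison with a different scoring function may, in general, change the ranking, which is exactly the caveat discussed above.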
However, the directive is given in terms of the functional T : F → A one is interested in.
For situation (i) described above, one should use the readily economically interpretable cost or scoring function. Therefore, the only concern is situation (ii). In this paper, we consider predictions in a one-period setting, thus dropping the index t. This is justified by our objective to understand the properties of scoring functions S which do not change over time, and is common in the literature (Murphy and Daan, 1985; Diebold and Mariano, 1995; Lambert, Pennock and Shoham, 2008; Gneiting, 2011).
Assuming the forecasters are homines oeconomici and adopting the rationale of expected utility maximization, given a concrete scoring function S, the most sensible action consists in minimizing the expected score E_F[S(x, Y)] with respect to the forecast x, where Y follows the distribution F, thus issuing the Bayes act arg min_{x∈A} E_F[S(x, Y)]. Hence, a scoring function should be incentive compatible in that it encourages truthful and honest forecasts. In line with Murphy and Daan (1985) and Gneiting (2011), we make the following definition: a scoring function S is F-consistent for a functional T : F → A if E_F[S(T(F), Y)] ≤ E_F[S(x, Y)] for all F ∈ F and all x ∈ A; it is strictly F-consistent if it is F-consistent and equality implies x = T(F). A functional is called elicitable if it possesses a strictly F-consistent scoring function. Clearly, elicitability and consistent scoring functions are naturally linked also to estimation problems, in particular, M-estimation (Huber, 1964; Huber and Ronchetti, 2009) and regression, with prominent examples being ordinary least squares, quantile, or expectile regression (Koenker, 2005; Newey and Powell, 1987).
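The Bayes act can be illustrated numerically: under the squared loss, minimizing the (empirical) expected score over a grid of actions recovers the mean, in line with the strict consistency of the squared loss for the mean. A minimal sketch, with an arbitrary choice of F:

```python
import numpy as np

# Sketch: under squared loss, the Bayes act argmin_x E_F[S(x, Y)] is the
# mean of F. The exponential distribution below is an arbitrary illustration.

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=100_000)  # Y ~ F, E[Y] = 2

grid = np.linspace(0.0, 5.0, 501)                  # candidate actions x
expected_scores = [np.mean((x - sample) ** 2) for x in grid]
bayes_act = grid[int(np.argmin(expected_scores))]

# the empirical Bayes act is close to the (sample) mean
assert abs(bayes_act - sample.mean()) < 0.02
```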
The necessity of utilizing strictly consistent scoring functions for meaningful forecast comparison is impressively demonstrated in terms of a simulation study in Gneiting (2011). However, for a given functional T : F → A, there is typically a whole class of strictly consistent scoring functions for it, such as all Bregman functions in case of the mean (Savage, 1971); further examples are given below. Patton (2017) shows that the forecast ranking based on (1.1) may depend on the choice of the strictly consistent scoring function for T in finite samples, and even at the population level if we compare two imperfect forecasts with each other.
Therefore, we naturally have a threefold elicitation problem: (i) Is T elicitable? (ii) What is the class of strictly F-consistent scoring functions for T ? (iii) What are distinguished strictly F-consistent scoring functions for T ?
Even though the denomination and the synopsis of the described problems under the term 'elicitation problem' are novel, there is a rich strand of literature in mathematical statistics and economics concerned with the threefold elicitation problem. Foremost, one should mention the pioneering work of Osband (1985), establishing a necessary condition for elicitability in terms of convex level sets of the functional, and a necessary representation of strictly consistent scoring functions, known as Osband's principle (Gneiting, 2011). Whereas the necessity of convex level sets holds in broad generality, Lambert (2013) specified sufficient conditions for elicitability for functionals taking values in a finite set, and Steinwart et al. (2014) showed sufficiency of convex level sets for real-valued functionals satisfying certain regularity conditions. Moments, ratios of moments, quantiles, and expectiles are in general elicitable, whereas other important functionals such as variance, Expected Shortfall or the mode functional are not (Savage, 1971; Osband, 1985; Weber, 2006; Gneiting, 2011; Heinrich, 2014). Concerning subproblem (ii) of the elicitation problem, Savage (1971), Reichelstein and Osband (1984), Saerens (2000), and Banerjee, Guo and Wang (2005) gave characterizations for strictly consistent scoring functions for the mean functional of a one-dimensional random variable in terms of Bregman functions. Strictly consistent scoring functions for quantiles have been characterized by Thomson (1979) and Saerens (2000). Gneiting (2011) provides a characterization of the class of strictly consistent scoring functions for expectiles. The case of vector-valued functionals apart from means of random vectors has been treated substantially less than the one-dimensional case (Osband, 1985; Banerjee, Guo and Wang, 2005; Lambert, Pennock and Shoham, 2008; Frongillo and Kash, 2015a,b; Fissler and Ziegel, 2016).
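For the mean, the Bregman characterization can be sketched concretely: for any strictly convex φ, the score S_φ(x, y) = φ(y) − φ(x) − φ′(x)(y − x) is strictly consistent for the mean. A minimal numerical check with two members of the family (φ(t) = t² recovers the squared loss up to equivalence; the sample is arbitrary illustration data):

```python
import numpy as np

# Sketch of the Bregman family: S_phi(x, y) = phi(y) - phi(x) - phi'(x)(y - x)
# is strictly consistent for the mean whenever phi is strictly convex.

def bregman_score(phi, dphi, x, y):
    return phi(y) - phi(x) - dphi(x) * (y - x)

rng = np.random.default_rng(1)
sample = rng.uniform(-1.0, 1.0, size=50_000)
mu = sample.mean()

for phi, dphi in [(lambda t: t ** 2, lambda t: 2 * t),   # squared loss
                  (np.exp, np.exp)]:                      # exponential Bregman
    score_at_mean = np.mean(bregman_score(phi, dphi, mu, sample))
    # any other report incurs a strictly larger expected score
    for x in (-0.5, 0.3, 0.9):
        assert score_at_mean < np.mean(bregman_score(phi, dphi, x, sample))
```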
The strict consistency of S only justifies a comparison of two competing forecasts if one of them reports the true functional value. If both of them are misspecified, it is per se not possible to draw a conclusion which forecast is 'closer' to the true functional value by comparing the realized scores. To this end, some notions of order-sensitivity are desirable. Following Lambert (2013), we say that a scoring function S is F-order-sensitive for a one-dimensional functional T : F → A ⊆ R if for any F ∈ F and any x, z ∈ A such that either T(F) ≤ x ≤ z or z ≤ x ≤ T(F), it holds that S̄(x, F) ≤ S̄(z, F). This means that if a forecast lies between the true functional value and some other forecast, then issuing the forecast in between should yield a smaller expected score than issuing the forecast further away. In particular, order-sensitivity implies consistency. Vice versa, under weak regularity conditions on the functional, strict consistency also implies order-sensitivity if the functional is real-valued; see Nau (1985, Proposition 3), Lambert (2013, Proposition 2), Bellini and Bignozzi (2015, Proposition 3.4).
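A numerical sketch of one-dimensional order-sensitivity: under the squared loss, the expected score increases as the forecast moves away from the true mean on one side.

```python
import numpy as np

# Sketch: order-sensitivity of the squared loss for the mean. If the true
# value t = E[Y] satisfies t <= x <= z, then the expected score at x is
# smaller than at z: misforecasts further out score worse.

rng = np.random.default_rng(2)
sample = rng.normal(loc=1.0, scale=1.0, size=100_000)
t = sample.mean()

def expected_score(x):
    return np.mean((x - sample) ** 2)

# forecasts moving away from t on one side have increasing expected scores
xs = [t + d for d in (0.0, 0.5, 1.0, 2.0)]
scores = [expected_score(x) for x in xs]
assert all(s1 < s2 for s1, s2 in zip(scores, scores[1:]))
```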
This article is dedicated to a thorough investigation of order-sensitive scoring functions for vector-valued functionals, thus contributing to a discussion of subproblem (iii) of the elicitation problem. Furthermore, we investigate to which extent invariance or equivariance properties of elicitable functionals are reflected in their respective consistent scoring functions. Lambert, Pennock and Shoham (2008) introduced a notion of componentwise order-sensitivity for the case of A ⊆ R^k. Friedman (1983) and Nau (1985) considered similar questions in the setting of probabilistic forecasts, coining the term of effectiveness of scoring rules, which can be described as order-sensitivity in terms of a metric. In Section 3, we consider three notions of order-sensitivity in the higher-dimensional setting: metrical order-sensitivity, componentwise order-sensitivity, and order-sensitivity on line segments. We discuss their connections (Lemma 3.5) and give conditions when such scoring functions exist (Lemma B.2, Propositions 3.7, 3.8, Corollary 3.16) and of what form they are for the most relevant functionals, such as vectors of quantiles (Propositions 3.11, 3.12, Example 3.14), expectiles (Proposition 3.15), ratios of expectations (Propositions 3.6, 3.9, 3.10, 3.17), the pair of mean and variance (Proposition 3.18, Example 3.19), and the pair consisting of Value at Risk and Expected Shortfall (Proposition 3.20, Example 3.21), two important risk measures in banking and insurance.
Complementing our results on order-sensitivity, in Section 2, we consider the analytic properties of the expected score x ↦ S̄(x, F), x ∈ A ⊆ R^k, for some scoring function S and some distribution F ∈ F. The (strict) consistency of S for some functional T is equivalent to the expected score having a (unique) global minimum at x = T(F). Order-sensitivity ensures monotonicity properties of the expected score. As a technical result, we show that under weak regularity assumptions on T, the expected score of a strictly consistent scoring function has a unique local minimum, which, of course, coincides with the global minimum at x = T(F) (Proposition 2.6). Together with a result on self-calibration (Proposition 2.8), a continuity property of the inverse of the expected score which ensures that the minimum of the expected score is well-separated in the sense of van der Vaart (1998), these two findings may be of interest in their own right in the context of M-estimation (Theorem 2.9).
In Section 4, we consider functionals having an invariance or equivariance property such as translation invariance or homogeneity. It is a natural question whether a functional T that is, for example, translation equivariant has a consistent scoring function that respects this property in the sense that if we evaluate forecast performance of translated predictions and observations, the ranking of predictive performance remains the same as that of the original data. In parametric estimation problems, such a scoring function may allow one to translate the data without affecting the estimated parameter values. For one-dimensional functionals, invariance of the scoring function often determines it uniquely up to equivalence, while this is not necessarily the case for higher-dimensional functionals (Proposition 4.7 and Corollary 4.12).
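As a toy example of a scoring function respecting an equivariance property of the functional: the squared loss satisfies S(x + c, y + c) = S(x, y), so shifting predictions and observations by the same constant leaves scores, and hence forecast rankings, unchanged, matching the translation equivariance of the mean.

```python
# Sketch: the squared loss is translation invariant, S(x + c, y + c) = S(x, y),
# so the forecast ranking is unchanged if predictions and observations are
# shifted by the same constant. Values below are arbitrary illustration data.

def squared_loss(x, y):
    return (x - y) ** 2

y, x1, x2, c = 3.0, 2.5, 5.0, 17.0
assert squared_loss(x1 + c, y + c) == squared_loss(x1, y)

# the ranking of two forecasts is preserved under the shift
before = squared_loss(x1, y) < squared_loss(x2, y)
after = squared_loss(x1 + c, y + c) < squared_loss(x2 + c, y + c)
assert before == after
```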
In Appendix A, we gather a list of common assumptions, which were originally introduced in Fissler and Ziegel (2016). Appendix B consists of technical results, while all proofs of the results in the main part of this paper are deferred to Appendix C.

Monotonicity
A functional T : F → A is called mixture-continuous if, for all F, G ∈ F, the map [0, 1] ∋ λ ↦ T(λF + (1 − λ)G) is continuous. It is appealing that one does not have to specify a topology on F to define mixture-continuity because it suffices to work with the induced Euclidean topology on [0, 1] and on A ⊆ R^k.
It turns out that mixture-continuity of a functional is strong enough to imply order-sensitivity in the case of one-dimensional functionals (see Nau (1985, Proposition 3), Lambert (2013, Proposition 2), Bellini and Bignozzi (2015, Proposition 3.4)), and desirable monotonicity properties of the expected scores also in higher dimensions (Propositions 2.4 and 2.6). At the same time, numerous functionals of applied relevance are mixture-continuous, and we start by giving examples and a sufficient condition (Proposition 2.2).
It is straightforward to see that the ratio of expectations is mixture-continuous. Moreover, by the implicit function theorem, one can verify the mixture-continuity of quantiles and expectiles directly under appropriate regularity conditions (e.g., in the case of quantiles, all distributions in F should be C^1 with non-vanishing derivatives). Generalizing Bellini and Bignozzi (2015, Proposition 3.4c), we give a sufficient criterion for mixture-continuity in the next proposition. Our version is not restricted to distributions with compact support (however, the image of the functional must be bounded), and we formulate the result for k-dimensional functionals.
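Mixture-continuity of the mean can be seen in a two-line computation: for discrete F and G, the mean of λF + (1 − λ)G is linear in λ. A sketch with arbitrary illustration weights:

```python
import numpy as np

# Sketch: mixture-continuity of the mean. For distributions F and G, the path
# lambda -> mean(lambda * F + (1 - lambda) * G) is linear, hence continuous.
# Two discrete illustration distributions on a common support:

support = np.array([0.0, 1.0, 2.0])
F = np.array([0.2, 0.5, 0.3])   # probability weights of F
G = np.array([0.6, 0.1, 0.3])   # probability weights of G

def mixture_mean(lam):
    weights = lam * F + (1 - lam) * G
    return float(weights @ support)

mF, mG = mixture_mean(1.0), mixture_mean(0.0)
for lam in np.linspace(0.0, 1.0, 11):
    # linear interpolation between the two means
    assert abs(mixture_mean(lam) - (lam * mF + (1 - lam) * mG)) < 1e-12
```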
Similarly to the original proof of Bellini and Bignozzi (2015), a sufficient criterion for the continuity of S̄(·, F) for any F ∈ F is that, for all y ∈ O, the score S(x, y) is quasi-convex and continuous in x. Recall that, under appropriate regularity conditions on F, the asymmetric piecewise linear loss S_α(x, y) = (1{y ≤ x} − α)(x − y) and the asymmetric piecewise quadratic loss S_τ(x, y) = |1{y ≤ x} − τ|(x − y)^2 are strictly consistent scoring functions for the α-quantile and the τ-expectile, respectively, and both S_α and S_τ are continuous in their first argument and convex. Hence, Proposition 2.2 yields that both quantiles and expectiles are mixture-continuous.

Steinwart et al. (2014) used Osband's principle (Osband, 1985) and the assumption of continuity of T with respect to the total variation distance to show order-sensitivity. Bellini and Bignozzi (2015) showed that the weak continuity of a functional T implies its mixture-continuity. Consequently, one can also derive the order-sensitivity in the framework of Steinwart et al. (2014) directly, using only mixture-continuity. Lambert (2013) showed that it is a harder requirement to have order-sensitivity if T(F) is discrete. Then both approaches, invoking Osband's principle or using mixture-continuity, do not work because the interior of the image of T is empty. Moreover, mixture-continuity then implies that the functional is constant (such that only trivial cases can be considered). Furthermore, it is proven in Lambert (2013) that for a functional T with a discrete image, all strictly consistent scoring functions are order-sensitive if and only if there is one order-sensitive scoring function for T. In particular, there are functionals admitting strictly consistent scoring functions that are not order-sensitive, one such example being the mode functional. Let us now turn our attention to vector-valued functionals.
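The role of S_α can be checked numerically: minimizing the empirical expected pinball loss over a grid recovers the α-quantile. A sketch with a standard normal sample (for which the 0.9-quantile is approximately 1.2816):

```python
import numpy as np

# Sketch: the asymmetric piecewise linear (pinball) loss
#   S_alpha(x, y) = (1{y <= x} - alpha) * (x - y)
# has its expected score minimized at the alpha-quantile of F.

def pinball(x, y, alpha):
    return ((y <= x).astype(float) - alpha) * (x - y)

rng = np.random.default_rng(3)
sample = rng.normal(size=100_000)
alpha = 0.9

grid = np.linspace(-3.0, 3.0, 601)
expected = [np.mean(pinball(x, sample, alpha)) for x in grid]
minimizer = grid[int(np.argmin(expected))]

# compare with the empirical 0.9-quantile
assert abs(minimizer - np.quantile(sample, alpha)) < 0.02
```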
To understand the monotonicity properties of the expected score of a mixture-continuous elicitable functional T : F → A ⊆ R^k, it is useful to consider paths γ : [0, 1] → A ⊆ R^k, γ(λ) = T(λF + (1 − λ)G) for F, G ∈ F. If T is elicitable, a classical result asserts that T necessarily has convex level sets (Gneiting, 2011, Theorem 6). This implies that the level sets of γ can only be closed intervals, including the case of singletons and the empty set. This rules out loops and some other possible pathologies of γ. Furthermore, under the assumption that T is identifiable as defined below, one can even show that the path γ is either injective or constant; see Lemma B.1.

Definition 2.3 (Identifiability). A map V : A × O → R^k is an F-identification function for a functional T : F → A ⊆ R^k if V̄(x, F) := ∫ V(x, y) dF(y) exists for all x ∈ A and F ∈ F, and if V̄(T(F), F) = 0 for all F ∈ F. It is a strict F-identification function for T if, additionally, V̄(x, F) = 0 implies x = T(F). The functional T is called identifiable if there is a strict F-identification function for it.
In line with Gneiting (2011, Section 2.4), one can often obtain an identification function as the gradient of a sufficiently smooth scoring function. However, the converse intuition is not so clear, at least in the higher-dimensional setting k > 1: not all strict identification functions can be integrated to a strictly consistent scoring function. They have to satisfy the usual integrability conditions (Königsberger, 2004, p. 185); see also Fissler and Ziegel (2016, Corollary 3.3) and the discussion thereafter.
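For instance, differentiating the squared loss in its first argument gives 2(x − y), i.e. twice the canonical identification function V(x, y) = x − y for the mean. A numerical sketch of the zero and the orientation of V̄(·, F):

```python
import numpy as np

# Sketch: for the squared loss S(x, y) = (x - y)^2, the derivative in x,
# d/dx S(x, y) = 2(x - y), is twice the canonical identification function
# V(x, y) = x - y for the mean: E_F[V(x, Y)] = 0 exactly when x = E_F[Y].

rng = np.random.default_rng(4)
sample = rng.normal(loc=2.0, size=100_000)

def V_bar(x):
    return np.mean(x - sample)

mu = sample.mean()
assert abs(V_bar(mu)) < 1e-8        # zero at the (sample) mean
assert V_bar(mu + 1.0) > 0          # oriented: positive above ...
assert V_bar(mu - 1.0) < 0          # ... negative below
```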
Proposition 2.4. Let F be convex and T : F → A ⊆ R^k be mixture-continuous and surjective. Let S : A × O → R be a strictly F-consistent scoring function for T. Then for all F, G ∈ F, the function [0, 1] ∋ λ ↦ S̄(γ(λ), F) with γ(λ) = T(λF + (1 − λ)G) is decreasing, and S̄(γ(0), F) > S̄(γ(1), F) = S̄(T(F), F) whenever γ(0) ≠ T(F).

Remark 2.5. (i) Proposition 2.4 remains valid if S is only F-consistent.
Then, we merely have that the function [0, 1] ∋ λ ↦ S̄(γ(λ), F) is decreasing, so the last inequality in Proposition 2.4 is not necessarily strict.
(ii) If one assumes in Proposition 2.4 that T is also identifiable, one can use the injectivity of γ implied by Lemma B.1 to see that the function [0, 1] ∋ λ ↦ S̄(γ(λ), F) is strictly decreasing.
Under certain (weak) regularity conditions, the expected score of a strictly consistent scoring function has no local minimum other than the global one at x = T(F).
Proposition 2.6. Let F be convex and T : F → A ⊆ R^k be mixture-continuous and surjective. If S : A × O → R is strictly F-consistent for T, then for all F ∈ F the expected score S̄(·, F) : A → R has only one local minimum, which is at x = T(F).

Self-calibration
With Proposition 2.4 it is possible to prove that, under mild regularity conditions, strictly consistent scoring functions are self-calibrated, which turns out to be useful in the context of M-estimation.

Definition 2.7 (Self-calibration). A scoring function S : A × O → R is called F-self-calibrated for a functional T : F → A ⊆ R^k if for all ε > 0 and for all F ∈ F there is a δ > 0 such that for all x ∈ A, S̄(x, F) ≤ S̄(T(F), F) + δ implies ‖x − T(F)‖ ≤ ε.
The notion of self-calibration was introduced by Steinwart (2007) in the context of machine learning. In a preprint version of Steinwart et al. (2014), the authors translate this concept to the setting of scoring functions as follows (using our notation): "For self-calibrated S, every δ-approximate minimizer of S̄(·, F) approximates the desired property T(F) with precision not worse than ε. [. . . ] In some sense order sensitivity is a global and qualitative notion while self-calibration is a local and quantitative notion." In line with this quotation, self-calibration can be considered as the continuity of the inverse of the expected score S̄(·, F) at the global minimum x = T(F), and as such, it is a local property of the inverse. This property ensures that convergence of the expected score to its global minimum implies convergence of the forecast to the true functional value. On the other hand, self-calibration of a scoring function S is equivalent to the fact that the argmin T(F) of the expected score S̄(·, F) is a well-separated point of minimum in the sense of van der Vaart (1998, p. 45), as such being a global property of the expected score itself. That means that for any ε > 0,

inf_{x ∈ A : ‖x − T(F)‖ ≥ ε} S̄(x, F) > S̄(T(F), F).

It is relatively straightforward to see that self-calibration implies strict consistency. In the preprint version of Steinwart et al. (2014) it is shown for k = 1 that order-sensitivity implies self-calibration. The next proposition shows that the kind of order-sensitivity given by Proposition 2.4 also implies self-calibration for k ≥ 1.
Proposition 2.8. Let F be convex, A ⊆ R^k be closed, and T : F → A be a surjective and mixture-continuous functional. If S : A × O → R is a strictly F-consistent scoring function for T, then S is F-self-calibrated for T.
We end this subsection about self-calibration by demonstrating its applicability in the context of M-estimation.

Theorem 2.9. Let S : A × O → R be an F-self-calibrated scoring function for a functional T : F → A ⊆ R^k. Then, the following assertion holds for all F ∈ F: if Y_1, Y_2, . . . are identically distributed with distribution F, and (x̂_n)_{n∈N} is a sequence of estimators such that (1/n) Σ_{i=1}^n S(x̂_n, Y_i) ≤ inf_{x∈A} (1/n) Σ_{i=1}^n S(x, Y_i) + o_P(1) and sup_{x∈A} |(1/n) Σ_{i=1}^n S(x, Y_i) − S̄(x, F)| → 0 in probability, then x̂_n → T(F) in probability.
The proof of Theorem 2.9 is a direct consequence of van der Vaart (1998, Theorem 5.7). Recall that under some additional regularity conditions, it is also possible to derive a central limit theorem associated with the consistency result established in Theorem 2.9. The rate is driven by the dependence structure of the observations Y_1, Y_2, . . .; if they are independent, the rate is typically n^{−1/2}. The form of the scoring function only enters via the asymptotic covariance. For details, we refer the reader to Section 5.3 in van der Vaart (1998). A detailed discussion of the asymptotic covariance and related efficiency considerations of the estimator is beyond the scope of this paper.
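A minimal M-estimation sketch in the spirit of Theorem 2.9: minimizing the empirical score of the (strictly consistent and self-calibrated) squared loss over a grid of actions yields a consistent estimator of the mean; the Gamma distribution below is an arbitrary illustration choice.

```python
import numpy as np

# Sketch: minimizing the empirical score (1/n) sum_i S(x, Y_i) of a strictly
# consistent, self-calibrated scoring function yields a consistent
# M-estimator of T(F). Here S is the squared loss and T the mean.

rng = np.random.default_rng(5)
true_mean = 6.0  # Gamma(shape=3, scale=2) has mean 3 * 2 = 6

def m_estimate(n):
    y = rng.gamma(shape=3.0, scale=2.0, size=n)
    grid = np.linspace(0.0, 12.0, 601)            # candidate actions x
    emp_scores = [np.mean((x - y) ** 2) for x in grid]
    return grid[int(np.argmin(emp_scores))]

est = m_estimate(50_000)
# for large n, the M-estimate is close to T(F) = 6
assert abs(est - true_mean) < 0.1
```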

Different notions of order-sensitivity
The idea of order-sensitivity is that a forecast lying between the true functional value and some other forecast is also assigned an expected score lying between the two other expected scores. If the action domain is one-dimensional, there are only two cases to consider: both forecasts are on the left-hand side of the functional value, or both are on the right-hand side. However, if A ⊆ R^k for k ≥ 2, the notion of 'lying between' is ambiguous. Two obvious interpretations for the multidimensional case are the componentwise interpretation and the interpretation that one forecast is a convex combination of the true functional value and the other forecast.
Definition 3.1 (Componentwise order-sensitivity). A scoring function S : A × O → R is called componentwise F-order-sensitive for a functional T : F → A ⊆ R^k if for all F ∈ F, t = T(F), and for all x, z ∈ A we have that

t_m ≤ x_m ≤ z_m or z_m ≤ x_m ≤ t_m for all m ∈ {1, . . . , k} implies S̄(x, F) ≤ S̄(z, F). (3.1)

Moreover, S is called strictly componentwise F-order-sensitive for T if S is componentwise F-order-sensitive and if x ≠ z in (3.1) implies that S̄(x, F) < S̄(z, F).
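A numerical sketch of Definition 3.1: the sum of componentwise squared losses is componentwise order-sensitive for the mean vector, so a forecast lying componentwise between the true value and another forecast attains a smaller expected score.

```python
import numpy as np

# Sketch: S(x, y) = (x1 - y1)^2 + (x2 - y2)^2 is componentwise
# order-sensitive for the mean vector t = (E[Y1], E[Y2]): moving each
# component of the forecast toward t lowers the expected score.

rng = np.random.default_rng(6)
Y = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(100_000, 2))
t = Y.mean(axis=0)

def expected_score(x):
    return float(np.mean(np.sum((x - Y) ** 2, axis=1)))

z = t + np.array([2.0, -3.0])   # forecast far from t in both components
x = t + np.array([1.0, -1.0])   # componentwise between t and z
assert expected_score(x) < expected_score(z)
```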

Remark 3.2.
In economic terms, a strictly componentwise order-sensitive scoring function rewards Pareto improvements in the sense that improving the prediction performance in one component without deteriorating the prediction ability in the other components results in a lower expected score. (The definition of the Pareto principle according to Scott and Marshall (2009): "A principle of welfare economics derived from the writings of Vilfredo Pareto, which states that a legitimate welfare improvement occurs when a particular change makes at least one person better off, without making any other person worse off. A market exchange which affects nobody adversely is considered to be a 'Pareto-improvement' since it leaves one or more persons better off. 'Pareto optimality' is said to exist when the distribution of economic welfare cannot be improved for one individual without reducing that of another.")

Definition 3.3 (Order-sensitivity on line segments). Let ‖·‖ be the Euclidean norm on R^k. A scoring function S : A × O → R is called F-order-sensitive on line segments for T : F → A ⊆ R^k if for all F ∈ F and for all x ∈ A, the map ψ : [0, 1] → R, ψ(λ) = S̄(T(F) + λ(x − T(F)), F), is increasing. If the map ψ is strictly increasing, we call S strictly F-order-sensitive on line segments for T.
These two notions of order-sensitivity do not allow for a comparison of any two misspecified forecasts, no matter where they are relative to the true functional value. An intuitive requirement could be 'the closer to the true functional value, the smaller the expected score', thus calling for the notion of a metric. Since, for a fixed functional T and some fixed distribution F, we always have a fixed reference point T(F), and we have the induced vector-space structure of A ⊆ R^k, we can measure the distance between a forecast and T(F) by an ℓ^p-norm ‖·‖_p with p ∈ [1, ∞]. If the assertion does not depend on the choice of p, we shall usually omit the p in the notation. For other choices of A, it would also be interesting to replace the norm by a metric in the following definition.

Definition 3.4 (Metrical order-sensitivity). Let p ∈ [1, ∞]. A scoring function S : A × O → R is called metrically F-order-sensitive for T : F → A ⊆ R^k relative to ‖·‖_p if for all F ∈ F and for all x, z ∈ A,

‖x − T(F)‖_p ≤ ‖z − T(F)‖_p implies S̄(x, F) ≤ S̄(z, F). (3.2)

If additionally the inequalities in (3.2) are strict, we say that S is strictly metrically F-order-sensitive for T relative to ‖·‖_p.
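A numerical sketch of Definition 3.4: for the mean vector, the summed squared loss is metrically order-sensitive relative to the Euclidean norm, since its expected score equals the squared distance to the true value plus a constant.

```python
import numpy as np

# Sketch: for the mean vector t, the summed squared loss satisfies
#   E[ ||x - Y||_2^2 ] = ||x - t||_2^2 + trace Cov(Y),
# a function of the Euclidean distance to t only, hence
# ||x - t||_2 <= ||z - t||_2 implies a no-larger expected score at x.

rng = np.random.default_rng(7)
Y = rng.normal(loc=[0.5, 1.5], scale=1.0, size=(100_000, 2))
t = Y.mean(axis=0)

def expected_score(x):
    return float(np.mean(np.sum((np.asarray(x) - Y) ** 2, axis=1)))

# two forecasts in different directions from t, ordered by distance
x = t + np.array([0.6, 0.8])    # distance 1.0
z = t + np.array([-2.0, 0.0])   # distance 2.0
assert expected_score(x) < expected_score(z)

# forecasts at equal distance obtain (nearly) equal expected scores
w = t + np.array([0.0, 1.0])    # distance 1.0 as well
assert abs(expected_score(x) - expected_score(w)) < 1e-6
```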
Similarly to (strict) consistency, all three notions of (strict) order-sensitivity are preserved when considering two scoring functions that are equivalent. The notion of componentwise order-sensitivity corresponds almost literally to the notion of accuracy-rewarding scoring functions introduced by Lambert, Pennock and Shoham (2008). Metrically order-sensitive scoring functions have their counterparts in the field of probabilistic forecasting in effective scoring rules, introduced by Friedman (1983) and further investigated by Nau (1985). Actually, the latter paper has also given the inspiration for the notion of order-sensitivity on line segments. It is obvious that any of the three notions of (strict) order-sensitivity implies (strict) consistency. The next lemma formally states this result and gives some logical implications concerning the different notions of order-sensitivity. The proof is standard and therefore omitted.

Componentwise order-sensitivity
Under restrictive regularity assumptions, Lambert, Pennock and Shoham (2008, Theorem 5) claim that whenever a functional has a componentwise order-sensitive scoring function, the components of the functional must be elicitable. Moreover, assuming that the measures in F have finite support, they assert that any componentwise order-sensitive scoring function is the sum of strictly consistent scoring functions for the components. Lemma B.2 shows the first claim under less restrictive smoothness assumptions on the scoring function. For many common examples of functionals, the second claim can be shown relaxing the restrictive condition on F.
If the components T_m : F → A_m ⊆ R, m ∈ {1, . . . , k}, are mixture-continuous and elicitable with strictly F-consistent scoring functions S_m : A_m × O → R, then they are order-sensitive according to Lambert (2013, Proposition 2) and Bellini and Bignozzi (2015, Proposition 3.4). Therefore, the sum Σ_{m=1}^k S_m(x_m, y) is strictly componentwise F-order-sensitive for (T_1, . . . , T_k). More interestingly, one can establish the reverse of the last assertion: any strictly componentwise order-sensitive scoring function must necessarily be additively separable.

In Fissler and Ziegel (2016, Section 4), we established a dichotomy for functionals with elicitable components: in most relevant cases, the functional (the corresponding strict identification function, respectively) satisfies Assumption (V4) therein (e.g., when the functional is a vector of different quantiles and/or different expectiles, with the exception of the 1/2-expectile), or it is a vector of ratios of expectations with the same denominator, or it is a combination of both situations. Under some regularity conditions, Fissler and Ziegel (2016, Propositions 4.2 and 4.4) characterize the form of strictly consistent scoring functions for the first two situations, whereas Fissler and Ziegel (2016, Remark 4.5) is concerned with the third situation. For this latter situation, any strictly consistent scoring function must necessarily be additive for the respective blocks of the functional. And for the first situation, Fissler and Ziegel (2016, Proposition 4.2) yields the additive form of S automatically. It remains to consider the case of Fissler and Ziegel (2016, Proposition 4.4), that is, a vector of ratios of expectations with the same denominator.
The notion of componentwise order-sensitivity has an appealing interpretation in the sense that it rewards Pareto improvements of the predictions; see Remark 3.2. The results of Lemma B.2 and Proposition 3.6 give a clear understanding of the concept, including its limitation to functionals only consisting of elicitable components. Ehm et al. (2016) introduced Murphy diagrams for forecast comparison of quantiles and expectiles. Murphy diagrams have the advantage that forecasts are compared simultaneously with respect to all consistent scoring functions for the respective functional. For many multivariate functionals such as ratios of expectations, the methodology cannot be readily extended because there are no mixture representations available for the class of all consistent scoring functions. Proposition 3.6 shows that when considering only componentwise order-sensitive consistent scoring functions, the situation is different and mixture representations (and hence Murphy diagrams) are readily available for forecast comparison.
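For quantiles, the mixture representation behind Murphy diagrams can be sketched numerically. Following Ehm et al. (2016), the elementary scores for the α-quantile take the form S_θ(x, y) = (1{y < x} − α)(1{θ < x} − 1{θ < y}), and plotting θ ↦ mean elementary score compares two forecasts simultaneously with respect to all consistent scoring functions; the forecasts below are illustrative.

```python
import numpy as np

# Sketch of a Murphy diagram for quantile forecasts, based on the mixture
# representation of Ehm et al. (2016): every consistent scoring function for
# the alpha-quantile is a mixture of the elementary scores
#   S_theta(x, y) = (1{y < x} - alpha) * (1{theta < x} - 1{theta < y}).

def elementary_score(theta, x, y, alpha):
    return (((y < x).astype(float) - alpha)
            * ((theta < x).astype(float) - (theta < y).astype(float)))

rng = np.random.default_rng(8)
y = rng.normal(size=50_000)
alpha = 0.5
x_good = np.zeros_like(y)       # the true median forecast
x_bad = np.full_like(y, 1.0)    # a biased forecast

thetas = np.linspace(-2.0, 2.0, 81)
curve_good = [np.mean(elementary_score(th, x_good, y, alpha)) for th in thetas]
curve_bad = [np.mean(elementary_score(th, x_bad, y, alpha)) for th in thetas]

# the well-specified forecast dominates at every theta (up to MC noise)
assert all(g <= b + 0.01 for g, b in zip(curve_good, curve_bad))
assert sum(curve_good) < sum(curve_bad)
```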

Metrical order-sensitivity
For a real-valued functional T, there can be at most one strictly metrically order-sensitive scoring function, up to equivalence. To show this, we use Osband's principle and impose the corresponding regularity conditions.

Proposition 3.7. Let T : F → A ⊆ R be a surjective, elicitable and identifiable functional with an oriented strict F-identification function V : A × O → R. If S and S* are strictly metrically F-order-sensitive scoring functions for T such that Assumptions (S1) and (VS1) (with respect to both scoring functions) hold, then S and S* are equivalent almost everywhere.
For the higher-dimensional setting, we can show a slightly more limited version of Proposition 3.7: two scoring functions that are additively separable as in (3.3) and that are strictly metrically order-sensitive for the same functional must necessarily be equivalent. For most practically relevant cases, namely when we consider an ℓ^p-norm with p ∈ [1, ∞) and when the functional possesses an identification function satisfying Assumption (V4) or is a vector of ratios of expectations with the same denominator, Lemma 3.5, Proposition 3.6 and Fissler and Ziegel (2016, Proposition 4.2) yield that any metrically order-sensitive scoring function, presuming there is one, is additively separable. Hence, for these situations, metrically order-sensitive scoring functions are unique, up to equivalence.
Proposition 3.8. Let S(x, y) = Σ_{m=1}^k S_m(x_m, y) be a strictly metrically F-order-sensitive scoring function for T : F → A ⊆ R^k, and let S*(x, y) = Σ_{m=1}^k λ_m S_m(x_m, y) with λ_1, . . . , λ_k > 0. Then S* is strictly metrically F-order-sensitive (with respect to the same ℓ^p-norm as S) if and only if λ_1 = · · · = λ_k.
Next, we use the derived theoretical results to examine when some popular functionals admit strictly metrically order-sensitive scoring functions, and if so, of what form they are.

Ratios of expectations with the same denominator
We start with the one-dimensional characterization.
Proposition 3.9. Let F be convex, let p and q be as above, and assume that T is surjective and that int(A) ≠ ∅ is convex. Then the following two assertions are true: (i) Any scoring function which is equivalent to S defined at (3.4) is strictly metrically F-order-sensitive for T. (ii) Under the corresponding regularity conditions, any scoring function S* : A × O → R which is strictly metrically F-order-sensitive and satisfies Assumptions (S1) and (VS1) is equivalent to S defined at (3.4) almost everywhere.
Now, we turn to the multivariate characterization.
Proposition 3.10. Let F be convex, let p and q be as above, and assume that T is surjective and int(A) ≠ ∅. Then, the following assertions are true: (i) Any scoring function which is equivalent to S defined at (3.5) is strictly metrically F-order-sensitive for T with respect to the ℓ2-norm.
(ii) Under the corresponding regularity conditions, any scoring function S* : A × O → R which is strictly metrically F-order-sensitive with respect to the ℓ2-norm and satisfies Assumptions (S1) and (VS1) is equivalent to S defined at (3.5) almost everywhere.
(iii) Under the corresponding regularity conditions, there is no scoring function S* : A × O → R which satisfies Assumptions (S1) and (VS1) and which is strictly metrically F-order-sensitive with respect to an ℓp-norm with p ∈ [1, ∞) \ {2}.
Savage (1971, Section 5) already showed that in the case of the mean, the squared loss is essentially the only symmetric loss, in the sense that it is the only metrically order-sensitive loss for the mean. See also Patton (2017, Section 2.1) for a discussion of the fact that symmetry, or metrical order-sensitivity, is not necessary for strict consistency of scoring functions with respect to the mean.
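This phenomenon is easy to see numerically. The following minimal sketch (not part of the paper; the normal sample and the exponential Bregman function are arbitrary illustrative choices) checks that the expected excess score of the squared loss is an even function of the forecast error, whereas the excess of another strictly consistent Bregman score for the mean is not:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=200_000)  # sample standing in for F
mu = y.mean()                                     # the functional value T(F)

def expected_score(score, x):
    return score(x, y).mean()

# Squared loss: strictly consistent for the mean and metrically order-sensitive.
sq = lambda x, y: (x - y) ** 2
# Bregman score with phi(t) = exp(t): also strictly consistent for the mean,
# but its expected excess score is not symmetric around mu.
br = lambda x, y: np.exp(y) - np.exp(x) - np.exp(x) * (y - x)

for d in [0.5, 1.0]:
    s_plus = expected_score(sq, mu + d) - expected_score(sq, mu)
    s_minus = expected_score(sq, mu - d) - expected_score(sq, mu)
    b_plus = expected_score(br, mu + d) - expected_score(br, mu)
    b_minus = expected_score(br, mu - d) - expected_score(br, mu)
    # squared loss: s_plus == s_minus == d**2; Bregman: b_plus != b_minus
```

Both scores rank the correctly specified forecast best, but only the squared loss treats over- and under-prediction of the same magnitude symmetrically.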

Quantiles
Since we treat only point-valued functionals in this article, we shall assume that the α-quantile of F is a singleton and identify the set with its unique element (henceforth, we shall refer to this assumption as F having a unique α-quantile).9 Furthermore, note that assuming the identifiability of the α-quantile with the canonical identification function Vα(x, y) = 1{y ≤ x} − α on a class F amounts to assuming that F(qα(F)) = α for all F ∈ F.10

Proposition 3.11. Let α ∈ (0, 1) and let F be a suitable family of distribution functions. Then there is no strictly metrically F-order-sensitive scoring function for Tα satisfying Assumption (S1).
The reasons for the non-existence of a strictly metrically order-sensitive scoring function for the α-quantile are of a different nature in the two cases α ≠ 1/2 and α = 1/2 in the proof of Proposition 3.11. In both cases, we used Osband's principle to derive a representation of the derivative of the expected score. Assuming that the derivative has the form stated in Osband's principle, one can directly derive a contradiction for α ≠ 1/2. For α = 1/2, however, this form merely implies that the distributions in F must be symmetric around their medians. This is not contradictory to the form of the gradient derived via Osband's principle, but only to the assumption that F is convex. Dropping this assumption, we can derive the following result; the proof is straightforward from Lemma B.3.

Proposition 3.12. Let F be a family of distribution functions on R with unique medians T1/2 : F → R and finite first moments. If all distributions in F are symmetric around their medians in the sense of (3.6) for all F ∈ F, x ∈ R, then any scoring function that is equivalent to the absolute loss S : R × R → R, S(x, y) = |x − y|, is strictly metrically F-order-sensitive for the median.

As mentioned above, under the conditions of Proposition 3.12, the necessary characterization of strictly consistent scoring functions via Osband's principle is not available. In particular, this means that we cannot use Proposition 3.7. Indeed, if the distributions in F are symmetric around their medians in the sense of (3.6) and under the integrability condition that all elements of F have a finite first moment, the median and the mean coincide.

9 Recall that the α-quantile of a distribution F consists of all points …

10 Actually, assuming F is convex and rich enough, this holds for any identification function for the α-quantile. Indeed, consider some distribution function F0 ∈ F and some level α ∈ (0, 1). Fix some …
Hence, any convex combination of a strictly consistent scoring function for the mean and one for the median provides a strictly consistent scoring function. A fortiori, any scoring function which is equivalent to S(x, y) = (1 − λ)|x − y| + λ|x − y|², λ ∈ [0, 1], is strictly metrically F-order-sensitive. However, the class of strictly metrically F-order-sensitive scoring functions is even bigger: Lehmann and Casella (1998, Corollary 7.19, p. 50) show that (subject to integrability conditions) for an even and strictly convex function Φ : R → R, the score S(x, y) = Φ(x − y) is strictly metrically F-order-sensitive for the median. Note that if the distributions in F are symmetric, their center of symmetry, which is the functional solving (3.6), is unique (Fissler, 2017, Lemma 4.1.34), even if the median is not unique. The result of Lehmann and Casella (1998, Corollary 7.19, p. 50) holds for this center of symmetry. Acknowledging that some popular choices for Φ are not strictly convex (see Example 3.14), the following proposition gives a refinement of their result.

Proposition 3.13. Let Φ : R → R be a convex and even function, and let S(x, y) = Φ(x − y). For x ∈ R define Ψx(y) = (Φ(x + y) + Φ(x − y))/2, and for x, z ∈ R the set Mx,z = {y ∈ R : Ψx(y) − Ψz(y) > 0}. If for all F ∈ F and for all x, z ∈ R with |x| > |z| one has that P(Y − C(F) ∈ Mx,z) > 0, Y ∼ F, then S is strictly metrically F-order-sensitive for C. In particular, if Φ is strictly convex, then Mx,z = R for all |x| > |z|.
Example 3.14. Let F be a class of symmetric distributions and S(x, y) = Φ(x − y).
(i) If Φ(t) = t², the squared loss arises. Since Φ is strictly convex, the squared loss is strictly metrically F-order-sensitive. (ii) For Φ(t) = |t|, S takes the form of the absolute loss. Then S is strictly metrically F-order-sensitive (and strictly F-consistent) if and only if C(F) ∈ supp(F) for all F ∈ F.11 (iii) Another prominent example of a metrically order-sensitive scoring function for the center of a symmetric distribution besides the absolute or the squared loss is the so-called Huber loss, presented in Huber (1964), which arises upon taking S(x, y) = Φ(x − y) with Φ(t) = t²/2 for |t| ≤ k and Φ(t) = k|t| − k²/2 for |t| > k, for some k > 0. We emphasize that metrically order-sensitive scoring functions are not the only strictly consistent scoring functions for the center of symmetric distributions. One can also use asymmetric scoring functions, for example those for the median or the mean, to elicit the center of symmetry.
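The behaviour described in Example 3.14(iii) can be checked numerically. The following sketch (not from the paper; the t-distribution, the center c = 3 and the Huber parameter k = 1.5 are arbitrary illustrative choices) shows that for a symmetric distribution the expected Huber score increases with the distance of the forecast from the center and is, up to sampling error, an even function of that distance:

```python
import numpy as np

rng = np.random.default_rng(1)
c = 3.0                                       # center of symmetry
y = c + rng.standard_t(df=5, size=200_000)    # symmetric sample around c

def huber(t, k=1.5):
    # Huber function: quadratic near zero, linear in the tails
    a = np.abs(t)
    return np.where(a <= k, 0.5 * t**2, k * a - 0.5 * k**2)

def expected_score(x):
    return huber(x - y).mean()

offsets = [0.0, 0.5, 1.0, 2.0]
scores = [expected_score(c + d) for d in offsets]      # forecasts above c
scores_neg = [expected_score(c - d) for d in offsets]  # forecasts below c
# expected score grows with |x - c| and is nearly even in x - c
```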
Due to the negative result of Proposition 3.11 we dispense with an investigation of scoring functions that are metrically order-sensitive for vectors of different quantiles.

Expectiles
The special situation of the 1/2-expectile, which coincides with the mean functional, was already considered in Subsection 3.3.1, so let τ ≠ 1/2. It is obvious that the canonical scoring function for the τ-expectile, that is, the asymmetric squared loss Sτ(x, y) = |1{y ≤ x} − τ|(x − y)², is not metrically order-sensitive since x ↦ Sτ(x + y, y) is not an even function. A fortiori, it turns out that (under some assumptions) there is no strictly metrically F-order-sensitive scoring function for the τ-expectile for τ ≠ 1/2.
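The asymmetry is immediate to verify; a minimal sketch (not from the paper; τ = 0.8 and the evaluation points are arbitrary choices):

```python
tau = 0.8
# canonical scoring function for the tau-expectile (asymmetric squared loss)
def S_tau(x, y):
    return abs(float(y <= x) - tau) * (x - y) ** 2

y, d = 0.0, 1.0
under = S_tau(y - d, y)  # forecast below y gets weight tau
over = S_tau(y + d, y)   # forecast above y gets weight 1 - tau
# under = tau * d**2 = 0.8, over = (1 - tau) * d**2 = 0.2
```

For τ ≠ 1/2 the two values differ, so x ↦ S_tau(x + y, y) cannot be even.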
Proposition 3.15. Let τ ∈ (0, 1), τ ≠ 1/2, and let Tτ = μτ : F → A ⊆ R, with int(A) ≠ ∅ convex, be the τ-expectile. Assume that Tτ is surjective and that Assumption (V1) holds with respect to the strict F-identification function Vτ(x, y) = 2|1{y ≤ x} − τ|(x − y). Suppose that V̄(·, F) is twice differentiable for all F ∈ F and that there is a strictly F-consistent scoring function S satisfying the corresponding smoothness conditions. Then S cannot be metrically F-order-sensitive.

Interestingly, the arguments provided in the proof of Proposition 3.15 lead to an alternative proof that the squared loss is the only strictly metrically order-sensitive scoring function for the mean, up to equivalence; see Remark C.1 for details.

Order-sensitivity on line segments
Recalling Lemma 3.5, every componentwise order-sensitive scoring function is also order-sensitive on line segments. However, for the particular class of linear functionals, the following corollary shows that any strictly consistent scoring function is already strictly order-sensitive on line segments.12

Corollary 3.16. If F is convex and T : F → A ⊆ R^k is linear and surjective, then any strictly F-consistent scoring function for T is strictly F-order-sensitive on line segments.
Corollary 3.16 immediately leads to the result that the class of strictly order-sensitive scoring functions on line segments is strictly bigger than the class of strictly componentwise order-sensitive scoring functions (for some functionals with dimension k ≥ 2). For example, consider a vector of expectations satisfying the conditions of Proposition 3.6, which are the same as the ones in Fissler and Ziegel (2016, Proposition 4.4). Due to the latter result, there are strictly consistent scoring functions, hence, by Corollary 3.16, scoring functions that are strictly order-sensitive on line segments, which are not additively separable. By Proposition 3.6 they cannot be strictly componentwise order-sensitive.
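For intuition, the following numerical sketch (not from the paper; the bivariate distribution, the positive definite matrix H and the direction v are arbitrary choices) checks order-sensitivity on line segments for a non-separable strictly consistent score for the 2-dimensional mean:

```python
import numpy as np

rng = np.random.default_rng(2)
# bivariate sample; T(F) is the 2-dimensional mean
y = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=100_000)
t = y.mean(axis=0)

# Non-separable strictly consistent Bregman score for the mean:
# S(x, y) = (x - y)' H (x - y) with positive definite H (off-diagonal term!)
H = np.array([[2.0, 0.7], [0.7, 1.0]])

def expected_score(x):
    d = x - y  # broadcasts over the sample
    return np.einsum('ni,ij,nj->n', d, H, d).mean()

# move along a line segment t + s*v away from the true functional value
v = np.array([0.6, 0.8])
scores = [expected_score(t + s * v) for s in [0.0, 0.5, 1.0, 2.0]]
# scores is increasing in s: order-sensitivity along this line segment
```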
We can extend the result of Corollary 3.16 to the case of ratios of expectations with the same denominator.
Proposition 3.17. Under the corresponding conditions, the scoring function at (3.7) is strictly F-order-sensitive on line segments, where φ is a strictly convex differentiable function on A.

Fissler and Ziegel (2016, Proposition 4.4) shows that essentially all strictly consistent scoring functions for T in the above Proposition 3.17 are of the form at (3.7); see also Frongillo and Kash (2015a, Theorem 13).
Order-sensitivity on line segments is stable under applying an isomorphism via the revelation principle (Gneiting, 2011, Theorem 4). However, dropping the linearity assumption on the bijection in the revelation principle, order-sensitivity on line segments is generally not preserved; see Subsection 3.4.1.

The pair (mean, variance)
The pair (mean, variance) is of importance not only from an applied point of view, but it is also an interesting example in the theory of elicitability. Due to the lack of convex level sets, the variance is not elicitable (Gneiting, 2011, Theorem 6). However, the pair (mean, variance) is a bijective transformation of the (elicitable) pair (mean, second moment), and, invoking the revelation principle (Gneiting, 2011, Theorem 4), the variance is jointly elicitable with the mean. The revelation principle provides an explicit link between the class of strictly consistent scoring functions for the first two moments, which are of Bregman type (Fissler and Ziegel, 2016, Proposition 4.4), and the respective class for mean and variance.
As the pair (mean, variance) has a non-elicitable component, it fails to be componentwise order-sensitive (Lemma B.2) and therefore, it is also not metrically order-sensitive. A priori, order-sensitivity on line segments is not ruled out. Corollary 3.16 implies that any strictly consistent scoring function for the pair of the first and second moment is order-sensitive on line segments. Even though the bijection connecting (mean, variance) with the pair of the first two moments is not linear, the following proposition gives necessary and sufficient conditions for scoring functions to be order-sensitive on line segments for (mean, variance). Example 3.19 shows the existence of scoring functions that are order-sensitive on line segments for (mean, variance).
Proposition 3.18. Let F be a class of distributions on R with finite second moments such that the functional T = (mean, variance) : F → A ⊆ R² is surjective. Let S : A × R → R be a scoring function that is (jointly) continuous and such that for any y ∈ R, the function A ∋ x ↦ S(x, y) is twice continuously differentiable. Then S is F-order-sensitive on line segments for T if and only if S is of the form (3.8), where φ is a convex, three times continuously differentiable function whose second-order partial derivatives satisfy (3.9) and (3.10).

Example 3.19. An example of a strictly convex C³-function φ : A → R satisfying (3.9) and (3.10) with equality is φ(x1, x2) = (x2 − x1²)⁻¹ for x2 > x1². For the case b1 = b2 = b3 = 0, the resulting scoring function of the form at (3.8) is given at (3.11). Interestingly, this results not only in a scoring function that is order-sensitive on line segments for the pair (mean, variance), but also in a mixed positively homogeneous scoring function of degree −2; see Section 4.2.

The pair (Value at Risk, Expected Shortfall)
Value at Risk (VaR) and Expected Shortfall (ES) are popular risk measures in banking and insurance. For a financial position Y with distribution F and a level α ∈ (0, 1), they are defined as VaRα(F) = qα(F) and ESα(F) = (1/α) ∫₀^α VaRu(F) du; see (2014), where numerous further references are given. VaRα, as a quantile, is elicitable under mild regularity conditions, whereas ESα fails to be elicitable (Gneiting, 2011). However, recently it was shown in Fissler and Ziegel (2016, Theorem 5.2 and Corollary 5.5) that the pair (VaRα, ESα) is elicitable, and the class of strictly consistent scoring functions was characterized to be of the form (3.12) (under the conditions of Osband's principle, Fissler and Ziegel (2016, Theorem 3.2, Corollary 3.3)). Note that the proof of Fissler and Ziegel (2016, Theorem 5.2(ii) and Corollary 5.5) is imprecise for the case that a distribution F ∈ F is not continuous at its α-quantile. Moreover, one needs to impose additional assumptions on the action domain A which are satisfied, for example, if A coincides with the maximal action domain {(x1, x2) ∈ R² : x1 ≥ x2}; see Fissler and Ziegel (2019) for details.

Proposition 3.20. Let α ∈ (0, 1), let F be a class of continuously differentiable distribution functions on R with finite first moments and unique α-quantiles, and let A ⊆ {(x1, x2) ∈ R² : x1 ≥ x2} be convex. Define A2 as the projection of A onto the second coordinate axis and let S : A × R → R be a scoring function of the form (3.12) with g : R → R differentiable and increasing and φ : A2 → R twice differentiable with φ′ > 0 and φ″ > 0. If condition (3.13) holds, then S is strictly F-order-sensitive on line segments for (VaRα, ESα).
One might wonder if Proposition 3.20 establishes an alternative set of conditions for strict consistency of scoring functions for (VaRα, ESα), different from the ones introduced in Fissler and Ziegel (2019, Proposition 2). Indeed, this is the case, since strict order-sensitivity on line segments implies strict consistency. However, it is not the condition at (3.13) which is essential for the strict consistency, but rather the condition that g be increasing and that φ′ > 0 and φ″ > 0.

Example 3.21. Consider the action domain
for all F ∈ F, x ∈ A (Lambert, Pennock and Shoham, 2008; Steinwart et al., 2014). One possible generalization of orientation to higher-dimensional functionals is the following. Let T : F → A ⊆ R^k be a functional with an identification function V; we call V oriented if … for all v ∈ S^{k−1} := {x ∈ R^k : ‖x‖ = 1}, for all F ∈ F and for all s ∈ R such that T(F) + sv ∈ A. Our notion of orientation differs from the one proposed by Frongillo and Kash (2015a). In contrast to their definition, our definition is per se independent of a (possibly non-existing) strictly consistent scoring function for T. Moreover, whereas their definition has connections to the convexity of the expected score, our definition shows strong ties to order-sensitivity on line segments.
If the gradient of an expected score induces an oriented identification function, then the scoring function is strictly order-sensitive on line segments, and vice versa. However, the existence of an oriented identification function is not sufficient for the existence of a strictly order-sensitive scoring function on line segments. The reason is that -due to integrability conditions -the identification function is not necessarily the gradient of some (scoring) function.

Equivariant functionals and order-preserving scoring functions
Many statistical functionals have an invariance or equivariance property. For example, the mean is a linear functional, and hence, it is equivariant under linear transformations: E[ϕ(X)] = ϕ(E[X]) for any random variable X and any linear map ϕ : R → R (of course, the same is true in the higher-dimensional setting). On the other hand, the variance is invariant under translations, that is, Var(X − c) = Var(X) for any c ∈ R, but it scales quadratically, Var(λX) = λ²Var(X) for any λ ∈ R. The next definition strives to formalize such notions.
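These equivariance properties are elementary to verify numerically; a minimal sketch (the exponential sample and the constants a, b, c, λ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=100_000)

a, b = 1.7, -0.4   # linear (affine) map phi(t) = a*t + b
c, lam = 5.0, 3.0  # translation and scaling constants

# the mean is equivariant under linear maps: E[phi(X)] = phi(E[X])
assert np.isclose((a * x + b).mean(), a * x.mean() + b)
# the variance is translation invariant ...
assert np.isclose((x - c).var(), x.var())
# ... and scales quadratically
assert np.isclose((lam * x).var(), lam**2 * x.var())
```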
If a functional T is elicitable, π-equivariance can also be expressed in terms of strictly consistent scoring functions; see also Gneiting (2011, p. 750). The proof of Lemma 4.3 is direct; it implies that the scoring function is also strictly F-consistent for T. Similarly to the motivation of order-sensitivity of scoring functions, for fixed π : Φ → Φ*, it is a natural requirement on a scoring function S that for all ϕ ∈ Φ the ranking of any two forecasts is the same in terms of S and in terms of S_{π,ϕ}.
Definition 4.4 (π-order-preserving). Let π : Φ → Φ*. A scoring function S : A × O → R is π-order-preserving with respect to F if for all ϕ ∈ Φ, for all F ∈ F and for all x, x′ ∈ A the ranking of x and x′ under S̄(·, F) coincides with the ranking under S̄_{π,ϕ}(·, F), where S_{π,ϕ} is defined at (4.1). S is linearly π-order-preserving if for all ϕ ∈ Φ and for all x, x′ ∈ A there is a λ > 0 such that S_{π,ϕ}(x, y) − S_{π,ϕ}(x′, y) = λ(S(x, y) − S(x′, y)) for all y ∈ O. If S is linearly π-order-preserving with a λ > 0 independent of x, x′ ∈ A, then we call S uniformly linearly π-order-preserving.
The following lemma is immediate.
Lemma 4.5. Let π : Φ → Φ * . If a scoring function S : A × O → R is linearly π-order-preserving, it is π-order-preserving with respect to any class F of probability distributions on O.
The two practically most relevant examples of uniform linear π-order-preservingness are translation invariance and positive homogeneity of scoring functions, or, to be more precise, of score differences. They are described in the two subsequent subsections.

Translation invariance
Consider a translation equivariant functional such as the mean treated in Example 4.2(ii). Then, a scoring function S : R^k × R^k → R is said to have translation invariant score differences if it is uniformly linearly π-order-preserving with λ = 1 for all ϕ ∈ Φ. In formulae, we require S to satisfy S(x + z, y + z) − S(x′ + z, y + z) = S(x, y) − S(x′, y) for all x, x′, y, z ∈ R^k. More generally, we say that a functional T is … , and, adopting this notion, we say that a scoring function S has linearly (M_O, M_A)-invariant score differences if … . Then, the following assertions hold.
for all x ∈ R k and for all F ∈ F.
Using Fissler and Ziegel (2016, Proposition 4.4) one can establish the converse of Proposition 4.7: If V is a linearly (id R k , id R k )-invariant strict F-identification function, then (4.4) implies that S has linearly (id R k , id R k )-invariant score differences. The following lemma shows how to normalize scores with translation invariant score differences to obtain a translation invariant score.
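As a quick numerical sanity check of the defining identity for translation invariant score differences (a sketch, assuming for illustration that S is the squared loss on R³; the random test points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# squared loss on R^3; its score differences are translation invariant:
# S(x + z, y + z) - S(x' + z, y + z) = S(x, y) - S(x', y)
S = lambda x, y: np.sum((x - y) ** 2)

for _ in range(5):
    x, xp, y, z = (rng.normal(size=3) for _ in range(4))
    lhs = S(x + z, y + z) - S(xp + z, y + z)
    rhs = S(x, y) - S(xp, y)
    assert np.isclose(lhs, rhs)
```

Here the identity even holds score-wise, not only for score differences, since the squared loss is of prediction error form.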
In the case of the mean functional on R, Proposition 4.7 was already established by Savage (1971), who showed that the squared loss is the only strictly consistent scoring function for the mean that is of prediction error form, up to equivalence.13 Furthermore, it implies that general τ-expectiles and α-quantiles have essentially only one linearly (id_R, id_R)-invariant strictly consistent scoring function, namely the canonical choices. The uniqueness, up to equivalence, disappears for k > 1. For example, for the 2-dimensional mean functional, the previous results yield that any scoring function S(x, y) = (x − y)ᵀH(x − y) with h11 > 0 and h11h22 − h12² > 0 is strictly consistent for the 2-dimensional mean functional and linearly (id_{R²}, id_{R²})-invariant. Due to the additive separability of strictly consistent scoring functions for vectors consisting of different quantiles and expectiles (Fissler and Ziegel, 2016, Proposition 4.2), strictly consistent scoring functions that are linearly (id_R, id_{R^k})-invariant for these vectors are not unique. However, the only flexibility in that class consists in choosing different weights for the respective summands of the scores.
The pair (mean, variance)

Let S : A × R → R be a strictly F-consistent scoring function for T that is (jointly) continuous, and for any y ∈ R, let the function A ∋ x ↦ S(x, y) be twice continuously differentiable. If S has linearly (M_O, M_A)-invariant score differences, then there is a λ ≥ 0 and an F-integrable function a : R → R such that S(x1, x2, y) = λ(x1 − y)² + a(y). In particular, S cannot be strictly F-consistent for T.

13 That means that the scoring function is a function of x − y only.
Then, the following assertions hold: (i) The scoring function S_c defined at (4.7) is strictly F-consistent for T and has linearly (M_O, M_A)-invariant score differences with M_O = id_R, M_A = (1, 1)ᵀ. (ii) Under the conditions of Fissler and Ziegel (2016, Theorem 5.2(iii)), there are strictly F-consistent scoring functions for T with linearly (M_O, M_A)-invariant score differences if and only if there is some c > 0 such that (4.6) holds. In that case, any such scoring function is necessarily equivalent to S_d defined at (4.7) almost everywhere, with d ≥ c.
The scoring function S_c has a close relationship to the class of scoring functions S_W proposed in Acerbi and Szekely (2014); see Fissler and Ziegel (2016, Equation (5.6)). Indeed, S_c(x1, x2, y) = c(1{y ≤ x1} − α)(x1 − y) + S_W(x1, x2, y) with W = 1. That means it is the sum of the standard α-pinball loss for VaRα, which is translation invariant, and S_1. In a similar vein, the condition at (4.6) is similar to the one at Fissler and Ziegel (2016, Equation (5.7)). Since ESα ≤ VaRα, the maximal action domain where S_c is strictly consistent is the stripe A_c = {(x1, x2) ∈ R² : x2 ≤ x1 < x2 + c}. Of course, by letting c → ∞, one obtains the maximal sensible action domain {(x1, x2) ∈ R² : x1 ≥ x2} for the pair (VaRα, ESα). However, the properly normalized version S_c/c converges, as c → ∞, to a strictly consistent scoring function for VaRα which is independent of the forecast for ESα. Hence, there is a caveat concerning the trade-off between the size of the action domain and the sensitivity in the ES-forecast. This might cast doubt on the usage of scoring functions with translation invariant score differences for (VaRα, ESα) in general.
Interestingly, the scoring function S_c at (4.7) has positively homogeneous score differences if and only if c = 0. However, A_0 = ∅, which means that the requirements of translation invariance and positive homogeneity of score differences are mutually exclusive in the case of strictly consistent scoring functions for (VaRα, ESα).

Homogeneity
If one is interested in a positively homogeneous functional of degree one such as the mean, expectiles, quantiles, or ES, a scoring function S : R × R → R is said to have positively homogeneous score differences of degree b ∈ R for this functional if the scoring function is uniformly linearly π-order-preserving with Φ = {R ∋ x ↦ cx ∈ R, c > 0} the multiplicative group, π the identity on Φ, and λ = c^b in (4.2). This means that S needs to satisfy S(cx, cy) − S(cz, cy) = c^b (S(x, y) − S(z, y)) for all x, z, y ∈ R and c > 0. Since positive homogeneity of score differences is equivalent to invariance of forecast rankings under a change of unit, it has been argued that it is important in financial applications (Acerbi and Szekely, 2014). Nolde and Ziegel (2017) give a characterization of scoring functions with positively homogeneous score differences for many risk measures of applied interest, such as VaR / quantiles, expectiles, and the pair (VaR, ES); cf. Patton (2011) for results concerning the mean functional.
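A small numerical sketch (not from the paper; α = 0.9 and the random test points are arbitrary choices) verifies the homogeneity identity for two standard examples, the squared loss (degree b = 2) and the pinball loss (degree b = 1):

```python
import numpy as np

rng = np.random.default_rng(5)

sq = lambda x, y: (x - y) ** 2                        # squared loss, degree b = 2
alpha = 0.9
pin = lambda x, y: (float(y <= x) - alpha) * (x - y)  # pinball loss, degree b = 1

for _ in range(5):
    x, z, y = rng.normal(size=3)
    c = rng.uniform(0.1, 10.0)
    # S(c*x, c*y) - S(c*z, c*y) = c**b * (S(x, y) - S(z, y))
    assert np.isclose(sq(c*x, c*y) - sq(c*z, c*y), c**2 * (sq(x, y) - sq(z, y)))
    assert np.isclose(pin(c*x, c*y) - pin(c*z, c*y), c * (pin(x, y) - pin(z, y)))
```

For the pinball loss the indicator 1{y ≤ x} is unaffected by scaling with c > 0, which is exactly the invariance of forecast rankings under a change of unit.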
If the functional T is vector-valued, the degree of homogeneity can be different in the respective components, e.g. in the case of the pair (mean, variance) or the vector consisting of the first k moments; cf. Example 4.2(vi). One can denote this property by mixed positive homogeneity, which means in the case of the vector of the first k moments that T(L(cY)) = Λ(c)T(L(Y)) (4.9) for all c > 0, where Λ(c) is the k × k diagonal matrix with diagonal elements c, c², . . . , c^k.14 In this situation, an interesting instance of uniformly linearly π-order-preserving scoring functions S : A × R → R are those with mixed positively homogeneous score differences of degree b ∈ R. That is, S(Λ(c)x, cy) − S(Λ(c)z, cy) = c^b (S(x, y) − S(z, y)) for all x, z ∈ A, y ∈ R, and for all c > 0. With k = 2, corresponding assertions hold for the pair (mean, variance) and the respective scoring functions.
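To illustrate mixed positive homogeneity, the following sketch (not from the paper; the evaluation points are arbitrary, and a(y) is taken to be zero) uses the Bregman-type score for the pair (first moment, second moment) built from the function φ(x1, x2) = (x2 − x1²)⁻¹ mentioned in Example 3.19. With Λ(c) = diag(c, c²), its score differences scale with c⁻², i.e. degree b = −2:

```python
import numpy as np

# Bregman-type score for the pair (first moment, second moment) built from
# phi(x1, x2) = (x2 - x1**2)**(-1), defined for x2 > x1**2 (cf. Example 3.19)
def phi(x1, x2):
    return 1.0 / (x2 - x1**2)

def grad_phi(x1, x2):
    v = x2 - x1**2
    return np.array([2.0 * x1 / v**2, -1.0 / v**2])

def S(x1, x2, y):
    # S(x, y) = -phi(x) + grad_phi(x) . (x - (y, y**2)), taking a(y) = 0
    d = np.array([x1 - y, x2 - y**2])
    return -phi(x1, x2) + grad_phi(x1, x2) @ d

rng = np.random.default_rng(6)
for _ in range(5):
    x1, z1, y = rng.normal(size=3)
    x2 = x1**2 + rng.uniform(0.5, 2.0)  # ensure x2 > x1**2
    z2 = z1**2 + rng.uniform(0.5, 2.0)
    c = rng.uniform(0.5, 4.0)
    # S(Lambda(c) x, c y) - S(Lambda(c) z, c y) = c**(-2) * (S(x, y) - S(z, y))
    lhs = S(c * x1, c**2 * x2, c * y) - S(c * z1, c**2 * z2, c * y)
    rhs = c**(-2) * (S(x1, x2, y) - S(z1, z2, y))
    assert np.isclose(lhs, rhs)
```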

Proposition 4.11. Let S : A × R → R be a consistent scoring function for the vector of the first k moments of the form S(x, y) = −φ(x) + ∇φ(x)(x − (y, y², . . . , y^k)ᵀ) + a(y), (4.11) where φ : A → R is convex and differentiable with gradient ∇φ (considered as a row vector). Then S has mixed positively homogeneous score differences of degree b ∈ R if and only if for all c > 0 the map at (4.12) is constant.
Recall that the scoring functions of the form at (4.11) are essentially all consistent scoring functions for the vector of different moments (Fissler and Ziegel, 2016, Proposition 4.4). Using Proposition 4.11 it is straightforward to derive consistent scoring functions for (mean, variance) with mixed positively homogeneous score differences.

Let Assumptions (F1) and (V1) be satisfied with the strict F-identification function V(x1, x2, y) = (x1 − y, x2 + x1² − y²)ᵀ, and let S : A × R → R be a strictly F-consistent scoring function for T that is (jointly) continuous such that for any y ∈ R, the function A ∋ x ↦ S(x, y) is twice continuously differentiable. Then S has mixed positively homogeneous score differences of degree b ∈ R if and only if S(x1, x2, y) = −φ(x1, x2 + x1²) + ∇φ(x1, x2 + x1²)(x1 − y, x2 + x1² − y²)ᵀ + a(y), (4.13) where φ : A → R is strictly convex, twice continuously differentiable, and moreover for all c > 0 the corresponding map is constant.
It appears that the class of (strictly) convex functions φ satisfying (4.12) is rather flexible. One subclass is the class of additively separable functions φ, that is, functions of the form (4.15), where each φm needs to be convex. Reviewing Nolde and Ziegel (2017, Theorem 5) and restricting attention to the case A ⊆ (0, ∞)^k, φm can be an element of the class Ψb/m, where Ψb consists of functions ψb : (0, ∞) → R of the form given there. On the other hand, there are choices of φ not satisfying an additive decomposition as in (4.15). One such example can be found in Example 3.19 for b = −2, and is of the form φ(x1, x2) = (x2 − x1²)⁻¹ for x2 > x1².

Appendix A: Assumptions
We present a list of assumptions used in this paper. For more details about their interpretations and implications, please see Fissler and Ziegel (2016), where they were originally introduced.
Assumption (V1). Let F be a convex class of distribution functions on R and assume that for every … . Note that if V : A × R → R^k is a strict F-identification function for T : F → A which satisfies Assumption (V1), then for each x ∈ int(A) there is an F ∈ F such that T(F) = x.

Assumption (V2). For every F ∈ F, the function V̄(·, F) is continuous.

Assumption (F1).
For every y ∈ R there exists a sequence (F n ) n∈N of distributions F n ∈ F that converges weakly to the Dirac-measure δ y such that the support of F n is contained in a compact set K for all n.
Assumption (VS1). Suppose that the complement of the set {(x, y) ∈ A × O : V(x, ·) and S(x, ·) are continuous at the point y} has (k + d)-dimensional Lebesgue measure zero.
Assumption (S2). For every F ∈ F, the function S̄(·, F) is continuously differentiable and the gradient is locally Lipschitz continuous. Furthermore, S̄(·, F) is twice continuously differentiable at t = T(F) ∈ int(A).

Lemma B.2. Let
Proof. Let S be metrically F-order-sensitive for T relative to d.
Proof of Proposition 2.6. Let F ∈ F with t = T(F). Due to the strict F-consistency of S, the expected score S̄(·, F) has a local minimum at t. Assume there is another local minimum at some x ≠ t. Then there is a distribution G ∈ F with x = T(G). Consider the path γ : [0, 1] → A, λ ↦ T(λF + (1 − λ)G). Due to Proposition 2.4 the function λ ↦ S̄(γ(λ), F) is decreasing, and strictly decreasing when we move on the image of the path from x to t. Hence S̄(·, F) cannot have a local minimum at x = γ(0).
Proof of Proposition 2.8. Let F ∈ F, t = T(F) and ε > 0. Define … . Due to the continuity of S̄(·, F), the minimum is well-defined and, as a consequence of the strict F-consistency of S for T, δ is positive. Let x ∈ A. If …

C.2. Proofs for Section 3
Proof of Proposition 3.6. Due to the fact that for fixed y ∈ O, V(x, y) is a polynomial in x, Assumption (V3) is automatically satisfied. Let h : int(A) → R^{k×k} be the matrix-valued function given in Osband's principle; see Fissler and Ziegel (2016, Theorem 3.2). By Fissler and Ziegel (2016, Proposition 4.4(i)) we have that … for all r, l, m ∈ {1, . . . , k}, l ≠ r, where the first identity holds for almost all x ∈ int(A) and the second identity for all x ∈ int(A). Moreover, the matrix (h_rl(x))_{l,r=1,...,k} is positive definite for all x ∈ int(A). If we can show that h_lr = 0 for l ≠ r, we can use the first part of (C.1) and deduce that for all m ∈ {1, . . . , k} there are positive functions g_m such that … for all (x1, . . . , xk) ∈ int(A). Then, we can conclude as in the proof of Fissler and Ziegel (2016, Proposition 4.2(ii)).15 Fix l, r ∈ {1, . . . , k} with l ≠ r and F ∈ F such that T(F) ∈ int(A). Due to the strict F-consistency of S_{l,z} defined at (B.1) we have that … , and by assumption q̄(F) > 0. Using the surjectivity of T we obtain that h_lr(t) = 0 for all t ∈ int(A), which ends the proof.
Proof of Proposition 3.7. We apply Osband's principle, that is, Fissler and Ziegel (2016, Theorem 3.2): there is a function h such that … for all F ∈ F, x ∈ int(A). Due to the strict F-consistency of S and the orientation of V, it holds that h ≥ 0. We show that actually h > 0. Applying Lemma B.3, one has that S̄ … . Hence, also the derivative with respect to x of the left-hand side of (C.3) must coincide with the derivative of the right-hand side. This yields, using (C.2), that … is an oriented strict F-identification function for T. Applying Osband's principle to S*, one obtains a function h* : … . Due to the analogue of (C.3) for S* and (C.4), one obtains … . By a similar reasoning as above, one can deduce that h* must be constant and positive. Now, the claim follows by Fissler and Ziegel (2016, Proposition 3.4); see Fissler and Ziegel (2019) for a correction.
Again with Lemma B.3 one obtains the assertion. (ii) The only interesting direction is to assume that S* is strictly metrically F-order-sensitive (with respect to the same ℓp-norm as S). We will show that … . Setting ε := S̄1(x1, F) − S̄1(z1, F) > 0, one obtains the claim with the same calculation.

Proof of Proposition 3.9. (i) We can apply Lemma B.3. Let F ∈ F. Then x ↦ S̄(T(F) + x, F) is an even function in x. Moreover, equivalence of scoring functions preserves (strict) metrical order-sensitivity. (ii) The convexity of A is implied by the mixture-continuity of T and the convexity of F. Then, the claim follows with Proposition 3.7.
We prove (ii) and (iii) together. Assume there is a scoring function S* satisfying the conditions above, so in particular, it is strictly metrically F-order-sensitive with respect to the ℓp-norm for some p ∈ [1, ∞). Invoking Lemma 3.5(i), S* is strictly componentwise F-order-sensitive for T. Thanks to Proposition 3.6, S* is additively separable. By Proposition 3.9(i), it is of the form … + a_m(y).
If p = 2, part (i) and Proposition 3.8(ii) yield that λ1 = · · · = λk, and hence, S and S* are equivalent. For p ≠ 2, we obtain S̄(T(F) + x, …

Proof of Proposition 3.11. Assume that there exists a strictly metrically F-order-sensitive scoring function Sα : R × R → R satisfying Assumption (S1). Due to Lemma B.3, for any F ∈ F and any x ∈ R … . Using Osband's principle (Fissler and Ziegel, 2016, Theorem 3.2) and taking the derivative with respect to x on both sides, this yields … for some positive function h : R → R (the fact that h ≥ 0 follows from the strict consistency of Sα and the surjectivity of Tα, and h > 0 follows as in the proof of Proposition 3.7). Assume that Tα(F0) = 0. For λ ∈ R, we have Tα(F0(· − λ)) = λ. Therefore, (C.5) implies … . Setting λ = ±x, one can see that h(±∞) := lim_{x→±∞} h(x) exists and that … . On the other hand, for fixed λ ∈ R, we obtain … . As a consequence, the only remaining possibility is α = 1/2. For fixed x ∈ R, we have … , implying that h must be constant using (C.5), and that F0 must be symmetric around its median, i.e. F0(x) = 1 − F0(−x) for all x ∈ R.16 Moreover, since h is constant, (C.5) implies that also any other distribution F ∈ F must be symmetric around its median, i.e. F(T1/2(F) + x) = 1 − F(T1/2(F) − x) for all x ∈ R. However, if F0 is symmetric around its median, then any translation Fλ of F0 is symmetric around its median. But then, there is a convex combination of F0 and Fλ with mixture parameter β ∈ (0, 1), β ≠ 1/2, such that βF0 + (1 − β)Fλ is not symmetric around its median if λ ≠ 0. Consequently, the conditions of the proposition are violated, such that a strictly metrically F-order-sensitive scoring function for the median does not exist in this setting.
Proof of Proposition 3.13. Let |x| > |z|. Note that due to the convexity of Φ, it holds that Ψ_x ≥ Ψ_z. Let F ∈ F with center of symmetry c = C(F) and let Y ∼ F. Then, using the fact that Φ is even together with this inequality, we obtain S̄(c + z, F) < S̄(c + x, F). This shows the strict metrical F-order-sensitivity. The strict F-consistency follows upon taking z = 0.
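The monotonicity established in Proposition 3.13 can be checked numerically; the following is a hedged sketch with assumed concrete choices (the quartic loss Φ(t) = t⁴ and a symmetric five-point distribution are my own, not from the paper): for an even, strictly convex Φ and F symmetric around its center c, the expected score S̄(c + x, F) is even in x and strictly increasing in |x|.

```python
# Hedged sketch of Proposition 3.13 (assumed choices: Phi(t) = t**4, a
# symmetric discrete distribution): the expected score of the forecast c + x
# depends on x only through |x| and grows strictly with |x|.

def phi(t):
    return t ** 4                          # even and strictly convex

support = [-2.0, -1.0, 0.0, 1.0, 2.0]      # Z symmetric around 0
probs   = [0.1, 0.2, 0.4, 0.2, 0.1]
c = 3.0                                    # center of symmetry of Y = c + Z

def expected_score(x):
    # S-bar(c + x, F) = E[Phi((c + x) - Y)] for Y = c + Z
    return sum(p * phi((c + x) - (c + z)) for p, z in zip(probs, support))

assert abs(expected_score(1.0) - expected_score(-1.0)) < 1e-12  # even in x
assert expected_score(0.5) < expected_score(1.5)                # increasing in |x|
assert expected_score(0.0) < expected_score(0.5)                # consistency at c
```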
Proof of Proposition 3.15. Under the assumptions, Osband's principle yields the existence of a function h : int(A) → R, h > 0 (by an argument like in the proof of Proposition 3.7), such that for all x ∈ int(A) and F ∈ F the first-order identity of Osband's principle holds with factor h. Using the same argument as in the proof of Osband's principle (Fissler and Ziegel, 2016, Theorem 3.2), h is twice differentiable. Assume that S is metrically F-order-sensitive. Then, due to Lemma B.3, for any F ∈ F the function g_F : A → R, x ↦ g_F(x) = S̄(T_τ(F) + x, F), is an even function. Hence, invoking the smoothness assumptions, the third derivative of g_F must be odd, so necessarily g_F′′′(0) = 0. Denoting t_F = T_τ(F), some tedious calculations lead to (C.7), an explicit expression for g_F′′′(0) in terms of h and t_F. Recalling that h > 0, τ ≠ 1/2 implies g_{F_1}′′′(0) ≠ g_{F_2}′′′(0) for suitable F_1, F_2 ∈ F, although both must vanish. So S cannot be metrically F-order-sensitive.
Remark C.1. Inspecting the proof of Proposition 3.15, equation (C.7) yields h′(t_F) = 0 for τ = 1/2 and any F ∈ F, where t_F = T_τ(F). With the surjectivity of T_τ, this proves that h′ = 0, so that h is necessarily constant. Hence, we get an alternative proof that the squared loss is the only strictly metrically order-sensitive scoring function for the mean, up to equivalence.
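The special role of the squared loss in Remark C.1 rests on an elementary decomposition that is easy to verify numerically. The following hedged sketch (the four-point distribution is my own choice) checks that S̄(x, F) = (x − μ)² + Var(Y), so the expected score depends on the forecast x only through the distance |x − μ|, which is exactly metrical order-sensitivity for the mean.

```python
# Hedged numeric check of the bias-variance decomposition behind Remark C.1:
# for the squared loss S(x, y) = (x - y)^2,
#     S-bar(x, F) = (x - mu)^2 + Var(Y),
# a function of |x - mu| alone.

support = [-1.0, 0.0, 2.0, 5.0]
probs   = [0.2, 0.3, 0.4, 0.1]
mu = sum(p * y for p, y in zip(probs, support))

def sbar(x):
    return sum(p * (x - y) ** 2 for p, y in zip(probs, support))

var = sbar(mu)                             # minimal expected score = Var(Y)
for x in [-3.0, -0.5, 1.0, 4.0]:
    assert abs(sbar(x) - ((x - mu) ** 2 + var)) < 1e-9
assert sbar(mu + 1.3) < sbar(mu - 2.0)     # smaller |x - mu|, smaller score
```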
Proof of Corollary 3.16. The linearity of T implies that T is mixture-continuous. The assertion then follows directly from Proposition 2.4 and the special form of the image of the path γ in the proof thereof, which is a line segment.
Proof of Proposition 3.17. Let F ∈ F, t = T(F), v ∈ S^{k−1} and 0 ≤ s < s′ such that t + sv, t + s′v ∈ A. Then S̄(t + sv, F) = q(F)(−φ(t + sv) + s∇φ(t + sv)v). The subgradient inequality yields S̄(t + sv, F) ≤ S̄(t + s′v, F), which proves the claim.

Proof of Proposition 3.18. Let S be F-order-sensitive on line segments. This implies that S is F-consistent. Using the revelation principle, S′ : A′ × R → R is an F-consistent scoring function for T′ = (T_1, T_2 + T_1²) : F → A′, the pair of the first and second moments. Moreover, S′ fulfils the same regularity conditions as S. Fissler and Ziegel (2016, Proposition 4.4) holds mutatis mutandis also for consistent scoring functions with φ convex. It is straightforward to check that the conditions of Fissler and Ziegel (2016, Proposition 4.4) are fulfilled for S′ and T′ with the canonical identification function V′. This yields (C.8), where a : R → R is some F-integrable function and φ : A′ → R is a convex C³-function with gradient ∇φ (considered as a row vector) and Hessian ∇²φ = (φ_ij)_{i,j=1,2}. In summary, (C.8) yields the form at (3.8). Now, we verify conditions (3.9) and (3.10). Let F ∈ F with (t_1, t_2) = T′(F).
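The line-segment order-sensitivity in Proposition 3.17 can be illustrated with a hedged numeric sketch (the convex function φ, the point t, and the direction v below are my own assumed choices): along any ray t + sv, the expected Bregman-type score, up to the F-dependent additive term that cancels in score differences, is nondecreasing in s ≥ 0.

```python
# Hedged sketch of Proposition 3.17 (assumed choices): for a convex, smooth
# phi, the excess expected Bregman score
#     S-bar(t + s v, F) - S-bar(t, F)
#       = phi(t) - phi(x) - grad_phi(x) . (t - x),   x = t + s v,
# is the Bregman divergence of t from x and is nondecreasing in s >= 0.

def phi(x):
    return x[0] ** 4 + x[1] ** 2           # convex on R^2

def grad_phi(x):
    return (4 * x[0] ** 3, 2 * x[1])

t = (0.5, -1.0)                            # t = T(F), e.g. the mean vector
v = (0.6, 0.8)                             # unit direction

def excess_score(s):
    x = (t[0] + s * v[0], t[1] + s * v[1])
    g = grad_phi(x)
    return phi(t) - phi(x) - (g[0] * (t[0] - x[0]) + g[1] * (t[1] - x[1]))

scores = [excess_score(s) for s in [0.0, 0.5, 1.0, 2.0, 3.0]]
assert scores[0] == 0.0                    # minimal at the true functional
assert scores == sorted(scores)            # monotone along the line segment
```

The monotonicity is the subgradient inequality in disguise: the derivative of the excess score in s equals s · vᵀ∇²φ(t + sv)v ≥ 0.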

C.3. Proofs for Section 4
Proof of Proposition 4.7. If a random variable Y has distribution F with F ∈ F, we write F − z for the distribution of Y − z, where z ∈ R^k. To show the first part, consider any F ∈ F and z ∈ R^k. Then the linear (id_{R^k}, id_{R^k})-invariance of V implies V̄(x − z, F − z) = V̄(x, F) for all x ∈ R^k. Since V is a strict F-identification function for T, T(F − z) = T(F) − z.
For the second part, Fissler and Ziegel (2016, Theorem 3.2) implies that there exists a matrix-valued function h : R^k → R^{k×k} such that ∇S̄(x, F) = h(x)V̄(x, F) for all x ∈ R^k and for all F ∈ F. We will show that h is constant. Since S̄(x, F) − S̄(x′, F) = S̄(x − z, F − z) − S̄(x′ − z, F − z) for all x, x′, z ∈ R^k and F ∈ F, we obtain by taking the gradient with respect to x

(C.15)   h(x)V̄(x, F) = h(x − z)V̄(x − z, F − z) = h(x − z)V̄(x, F),

where the second identity is due to the linear (id_{R^k}, id_{R^k})-invariance of V. Now, one can use Assumption (V1) and Fissler and Ziegel (2016, Remark 3.1), which implies that h(x) = h(x − z). Since x, z ∈ R^k were arbitrary, the function h is constant.
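The first part of Proposition 4.7 admits a simple numeric sanity check. The following hedged sketch (scalar case k = 1; the exponential sample and the shift are my own choices) uses the strict identification function V(x, y) = x − y for the mean and verifies the translation equivariance T(F − z) = T(F) − z on an empirical distribution.

```python
# Hedged sketch of translation equivariance (Proposition 4.7, first part,
# k = 1): with V(x, y) = x - y, the mean is the unique zero of
# x -> E[V(x, Y)] = x - E[Y], and shifting F by z shifts T(F) by z.

import random

random.seed(1)
sample = [random.expovariate(0.5) for _ in range(100_000)]
z = 1.7

def T(ys):
    # empirical mean = zero of the empirical identification equation
    return sum(ys) / len(ys)

shifted = [y - z for y in sample]          # the empirical analogue of F - z
assert abs(T(shifted) - (T(sample) - z)) < 1e-8
```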
Proof of Lemma 4.8. If S has linearly (id_{R^k}, id_{R^k})-invariant score differences, S satisfies (4.3) for all x, x′, y, z ∈ R^k. Due to Lemma 4.3, T must be π_{id_{R^k}, id_{R^k}}-equivariant; hence, T(δ_y) − z = T(δ_{y−z}). This yields that S_0 defined at (4.5) is linearly (id_{R^k}, id_{R^k})-invariant. Since S and S_0 are of equivalent form, S_0 is also strictly F-consistent for T. The non-negativity follows directly from the fact that F contains all point measures and from the strict consistency.
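The normalization used in Lemma 4.8 can be made concrete with a hedged sketch (the squared loss and the mean stand in for S and T here; these are my own assumed choices, not the general objects of the lemma): S_0(x, y) = S(x, y) − S(T(δ_y), y) subtracts the score of the ideal forecast for the point measure δ_y, which makes S_0 non-negative under strict consistency.

```python
# Hedged sketch of the normalization in Lemma 4.8 (assumed choices: squared
# loss S, functional T = mean, so T(delta_y) = y): the normalized score
# S0(x, y) = S(x, y) - S(T(delta_y), y) is non-negative, with equality iff
# the forecast hits the functional of the point measure.

def S(x, y):
    return (x - y) ** 2

def T_delta(y):
    return y                               # mean of the point measure delta_y

def S0(x, y):
    return S(x, y) - S(T_delta(y), y)

assert S0(3.0, 1.0) == 4.0
assert S0(1.0, 1.0) == 0.0
assert all(S0(x, 0.5) >= 0 for x in [-2.0, -0.5, 0.0, 0.5, 2.0])
```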
Proof of Proposition 4.10. The scoring function S_c is of equivalent form as given at (3.12), with g(x_1) = −x_1²/2 + cx_1 and φ(x) = (α/2)x_2². This means that φ is strictly convex, and the function x_1 ↦ x_1 φ′(x_2)/α + g(x_1) is strictly increasing in x_1 if and only if x_2 + c > x_1, that is, if and only if (x_1, x_2) ∈ A_c. Moreover, one can verify that the action domain A_c satisfies the conditions introduced earlier.

Proof of Proposition 4.11. Suppose φ satisfies (4.12). This implies that for any c > 0 the map z ↦ φ(Λ(c)z) − c^b φ(z) is an affine function. Moreover, a Taylor expansion yields the corresponding identity for all arguments, and a direct calculation then yields the result. Now, suppose (4.10) is satisfied. Expanding its left-hand side and its right-hand side, both are polynomials in y of degree k, and comparing coefficients leads to the claimed identity.
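The homogeneity condition used in the proof of Proposition 4.11 can be illustrated with a hedged one-dimensional sketch (the choices Λ(c) = c and φ(z) = |z|^b are my own assumptions, picked so that the affineness in (4.12) is easy to verify): for a positively homogeneous φ of degree b, the map z ↦ φ(Λ(c)z) − c^b φ(z) vanishes identically and is in particular affine.

```python
# Hedged illustration of the affineness condition behind Proposition 4.11
# (assumed one-dimensional choices: Lambda(c) = c, phi(z) = |z|**b): for a
# degree-b positively homogeneous phi, phi(c * z) - c**b * phi(z) == 0 for
# all c > 0 and z, hence the map is (trivially) affine in z.

b = 3.0

def phi(z):
    return abs(z) ** b

for c in [0.5, 2.0, 7.3]:
    for z in [-1.2, 0.4, 5.0]:
        assert abs(phi(c * z) - c ** b * phi(z)) < 1e-7
```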
Proof of Corollary 4.12. The form at (4.13) follows as in the proof of Proposition 3.18. The rest follows by Proposition 4.11.