Axiomatic arguments for decomposing goodness of fit according to Shapley and Owen values

We advocate the decomposition of goodness of fit into contributions of (groups of) regressor variables according to the Shapley value or—if regressors are exogenously grouped—the Owen value because of the attractive axioms associated with these values. A wage regression model with German data illustrates the method. AMS 2000 subject classifications: 62J05, 62P20, 91A12.


Introduction
One of the unwritten conventions in applied econometrics is that authors provide their readers with some goodness-of-fit (GOF) measure at the end of each regression table. Very rarely, however, is the GOF allocated to individual regressor variables, even though (or because) the literature provides numerous different approaches to do so [3,9]. Rather, the discussion of the 'relevance' of regressor variables is often confined to the sign and p-value of their corresponding coefficients [12]. Due to space constraints, many coefficients are not even reported, leaving readers bewildered as to how important (with respect to GOF) such 'omitted' variables were compared to the variables of primary interest.
In the present paper, we advocate the method that employs the Shapley value [17], and its generalizations, to distribute the GOF of the model among the regressor variables, henceforth Shapley value decomposition [20]. This method takes account of the interplay of regressor variables in sub-models and is calculated on the basis of information on the same type of GOF in these sub-models. Its attractiveness also stems from the fact that it emerges as the unique solution to the decomposition problem under a sound set of assumptions.
A generalization of the Shapley value, the Owen value [14], allows for decomposition in the context of exogenously grouped regressors, as suggested by Shorrocks [18]. Such groups may arise, e.g., if the model includes polynomial terms of a variable, dummy variables that recode a categorical variable, or variables that are conceptually related for other reasons. Under such circumstances, it is necessary to adjust the processing of the information about the GOF such that both the resulting values of the variables and the values of the groups (defined as the sum of the values of the variables in the respective group) can be interpreted. In contrast to the model without exogenous groupings, this requires, for instance, that equally performing groups receive the same group values.
Apart from the characterizing properties, the methods advocated satisfy other desirable properties. For instance, if the GOF is insensitive to a transformation of variables, this insensitivity is passed on to the valuation of variables. Also, a variable that contributes nothing to GOF in all sub-models receives the value zero. Moreover, the Owen value satisfies the following 'consistency' property: the sum of the values attributed to the variables of an exogenously given group equals the amount given to the group if the GOF were assigned to the groups directly (not to the variables) using the Shapley value decomposition. Thus, the Owen value provides the theoretical underpinning for allocating GOF among the groups by means of the Shapley value decomposition.
The paper presents both concepts applied to regression analysis. We then provide an illustrative example with the decomposition of $R^2$ of a wage regression with data from Germany. Our conclusion covers some possible extensions of the approach.

Method
Consider the OLS regression model and let $K = \{x_1, \dots, x_j, \dots, x_k\}$ denote the set of regressor variables. This model, which we refer to as the 'full model', produces a particular worth for some GOF measure, such as $R^2$. We seek to distribute this worth among the regressor variables. For this purpose, we will consider additional regression models for every combination of variables $T \subseteq K$. Each of these sub-models is associated with a worth of the respective GOF, e.g., $R^2(T)$. These worths can be collected in a function that maps from $K$'s power set $2^K$ to the reals, assigning to every combination of variables its GOF:

$$f : 2^K \to \mathbb{R}, \quad T \mapsto f(T),$$

where, e.g., $f(K)$ denotes the GOF of the full model. In the following we assume that $f$ is zero-normalized such that the empty model exhibits a GOF of zero, i.e., $f(\emptyset) = 0$.

As a generalization, consider the case where the regressor variables are grouped (e.g., for reasons mentioned in the introduction) such that $K$ is partitioned into $\mathcal{G} = \{G_1, \dots, G_\ell, \dots, G_\gamma\}$. Estimation of the full OLS model again gives the GOF of the full model $f(K)$ that is to be distributed.

The decomposition problem now boils down to the following question: Given the function $f$, how should $f(K)$ be distributed among the variables $x_1, \dots, x_k$? Our answer makes use of results from cooperative game theory.
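To make the worth function $f$ concrete, the following is a minimal sketch, assuming that the GOF is $R^2$, that every sub-model contains an intercept, and that the covariance structure of the data is already available. It relies on the standard identity $R^2(T) = s_{Ty}' S_{TT}^{-1} s_{Ty} / s_{yy}$. The helper names (`solve`, `r_squared`, `worth_function`) and all numbers are hypothetical illustrations, not taken from the paper.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(cov_xx, cov_xy, var_y, subset):
    """R^2 of the sub-model that uses the regressors whose indices are in `subset`."""
    idx = sorted(subset)
    if not idx:
        return 0.0  # zero-normalization: f(empty set) = 0
    a = [[cov_xx[i][j] for j in idx] for i in idx]
    b = [cov_xy[i] for i in idx]
    beta = solve(a, b)
    return sum(bi * ci for bi, ci in zip(beta, b)) / var_y

def worth_function(cov_xx, cov_xy, var_y):
    """Tabulate f(T) = R^2(T) for every subset T of the regressor indices."""
    k = len(cov_xy)
    return {frozenset(t): r_squared(cov_xx, cov_xy, var_y, t)
            for size in range(k + 1) for t in combinations(range(k), size)}

# Hypothetical covariance structure for three regressors (illustrative numbers only).
cov_xx = [[1.0, 0.5, 0.0],
          [0.5, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
cov_xy = [0.6, 0.5, 0.3]   # covariances of each regressor with the dependent variable
var_y = 1.0
f = worth_function(cov_xx, cov_xy, var_y)
```

The resulting dictionary has one entry per sub-model, so its size doubles with every additional regressor; this is the table of worths on which any decomposition rule operates.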
Cooperative game theory provides insights into rules for distributing $f(K)$ systematically among players or, in the present case, the regressor variables. These rules exhibit certain properties, although not all rules satisfy all desirable properties. Instead of judging the attractiveness of (ad hoc) formulae to decompose $f(K)$, one should judge the attractiveness of a decomposition rule on the basis of its characteristic properties.
Before we turn to a discussion of sound conditions for such a purpose, we describe a way to calculate the Shapley value (ungrouped case) and the Owen value (for grouped regressor variables).

Calculating the Shapley value and the Owen value
Starting with the full model, assume we successively remove regressor variables, one by one and according to a particular ordering of the variables. The difference in GOF associated with the elimination of a variable can be regarded as the variable's marginal contribution in this particular ordering of the regressors. Treating all orderings as equally probable, the Shapley value of a variable equals the variable's average marginal contribution over all possible orderings.
More formally, let $\theta$ be a permutation of the variables with the interpretation that variable $x_j$ has the position $\theta(j)$ in $\theta$. The set of variables that appear before $x_j$ in $\theta$ is denoted by $P(\theta, x_j) := \{x_p \in K \mid \theta(p) < \theta(j)\}$. Thus, in the permutation $\theta$, variable $x_j$ changes the GOF by

$$MC(x_j, \theta) := f(P(\theta, x_j) \cup \{x_j\}) - f(P(\theta, x_j)),$$

which we call variable $x_j$'s marginal contribution in $\theta$.
Denoting by $\Theta(K)$ the set of all $|K|!$ permutations on $K$, we may now calculate the Shapley value of variable $x_j$ as

$$\mathrm{Sh}_{x_j}(f) := \frac{1}{|K|!} \sum_{\theta \in \Theta(K)} MC(x_j, \theta)$$

(see footnote 6).

Now we turn to the case where explanatory variables are organized in groups whose composition is known a priori to the analyst. Then the Owen value, a generalization of the Shapley value, takes the implied restrictions on the set of possible sub-models into account, as follows. To outsiders, i.e., variables belonging to other groups, the members of a particular group can only appear jointly and will therefore 'negotiate' a value for their group as a whole. Therefore, a group can only be subdivided when its members negotiate the distribution of the group's payoff among themselves. In this situation, the other groups are either completely present or completely absent. In comparison to the previous paragraph on the Shapley value, this implies that sub-models in which two or more groups are represented by some, but not all, of their constituent variables are no longer considered. The set of rank orders $\Theta(K, \mathcal{G})$ that respect the partitioning scheme $\mathcal{G}$ is smaller now (as long as not all groups are singleton groups, $\gamma = |K|$, or all variables belong to one group, $\gamma = 1$):

$$\Theta(K, \mathcal{G}) := \{\theta \in \Theta(K) \mid \text{for all } x_i, x_j \in G_\ell \text{ and } x_p \in K:\ \theta(i) < \theta(p) < \theta(j) \Rightarrow x_p \in G_\ell\}.$$

Given this limited set of admissible rank orders, the Owen value can then be calculated along the lines of the Shapley value:

$$\mathrm{Ow}_{x_j}(f, \mathcal{G}) := \frac{1}{|\Theta(K, \mathcal{G})|} \sum_{\theta \in \Theta(K, \mathcal{G})} MC(x_j, \theta).$$

Of course, computing these values per se is expensive. Moreover, the cost of calculating the GOF for all subsets grows substantially with the number of regressor variables. For $R^2$ as GOF, this burden can be alleviated to some extent if the calculation is based on the covariance structure of the variables rather than the individual observations [6].
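The permutation-based calculation of both values can be sketched as follows. This is a brute-force illustration under stated assumptions, not the authors' implementation: the worth table `f`, the variable names `a`, `b`, `c`, the grouping, and all function names are hypothetical, and enumerating all orderings is only feasible for a handful of regressors.

```python
from itertools import permutations

def marginal_contribution(f, order, x):
    """MC(x, theta): change in GOF when x joins the variables preceding it in `order`."""
    before = frozenset(order[:order.index(x)])
    return f[before | {x}] - f[before]

def respects_groups(order, groups):
    """True if the members of every group occupy consecutive positions in `order`."""
    for g in groups:
        positions = sorted(order.index(x) for x in g)
        if positions[-1] - positions[0] != len(g) - 1:
            return False
    return True

def admissible_orders(variables, groups):
    """All rank orders that respect the partition into groups."""
    return [o for o in permutations(variables) if respects_groups(o, groups)]

def average_mc(f, variables, orders):
    """Average marginal contribution of each variable over the given orders."""
    return {x: sum(marginal_contribution(f, o, x) for o in orders) / len(orders)
            for x in variables}

def shapley(f, variables):
    return average_mc(f, variables, list(permutations(variables)))

def owen(f, variables, groups):
    return average_mc(f, variables, admissible_orders(variables, groups))

# Hypothetical GOF worths for three regressors (illustrative numbers only).
f = {
    frozenset(): 0.00,
    frozenset("a"): 0.30, frozenset("b"): 0.20, frozenset("c"): 0.10,
    frozenset("ab"): 0.40, frozenset("ac"): 0.35, frozenset("bc"): 0.30,
    frozenset("abc"): 0.50,
}
variables = ["a", "b", "c"]
groups = [{"a", "b"}, {"c"}]          # assumed a priori grouping

sh = shapley(f, variables)
ow = owen(f, variables, groups)
```

With this grouping only four of the six orderings are admissible, which is why the Owen values of `a` and `b` differ slightly from their Shapley values while both decompositions still sum to `f` of the full model.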

Why Shapley value decomposition should be used
In the following we motivate the conditions under which the Shapley value remains as the only candidate for decomposing $f(K)$, given the information in $f : T \mapsto f(T)$ for $T \subseteq K$. Let $\varphi$ be a decomposition rule. Formally, this is a function that assigns to every $f$ the outcomes of the variables, i.e., $\varphi_{x_j}(f)$ is the value we attribute to variable $x_j$ if the combinations of the variables are associated with GOF according to $f$. The first condition of interest merely states what is to be distributed among the variables:

Efficiency: The GOF of the full model is decomposed among the regressor variables, i.e., $\sum_{x_j \in K} \varphi_{x_j}(f) = f(K)$.

Footnote 6 (to the permutation formula for the Shapley value in the preceding section): For every $T \subseteq K \setminus \{x_j\}$, there are $|T|! \cdot (|K| - |T| - 1)!$ permutations $\theta$ such that $T = P(\theta, x_j)$. Thus, an alternative and computationally less expensive formula for the Shapley value is:

$$\mathrm{Sh}_{x_j}(f) = \sum_{T \subseteq K \setminus \{x_j\}} \frac{|T|! \, (|K| - |T| - 1)!}{|K|!} \left[ f(T \cup \{x_j\}) - f(T) \right].$$
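The subset-based formula for the Shapley value and the Efficiency condition can be checked numerically. The following is a minimal sketch, assuming a worth table stored as a dictionary keyed by `frozenset`; the function name `shapley_subsets` and the toy numbers are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_subsets(f, variables):
    """Shapley values via the weighted sum over subsets T of K without x_j."""
    k = len(variables)
    values = {}
    for x in variables:
        others = [v for v in variables if v != x]
        total = 0.0
        for size in range(k):
            # Share of orderings in which exactly `size` given variables precede x.
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for t in combinations(others, size):
                t = frozenset(t)
                total += weight * (f[t | {x}] - f[t])
        values[x] = total
    return values

# Hypothetical GOF worths for three regressors (illustrative numbers only).
f = {
    frozenset(): 0.00,
    frozenset("a"): 0.30, frozenset("b"): 0.20, frozenset("c"): 0.10,
    frozenset("ab"): 0.40, frozenset("ac"): 0.35, frozenset("bc"): 0.30,
    frozenset("abc"): 0.50,
}
values = shapley_subsets(f, ["a", "b", "c"])
efficiency_gap = sum(values.values()) - f[frozenset("abc")]
```

The subset form visits each of the $2^{|K|-1}$ sub-models without $x_j$ once, instead of each of the $|K|!$ orderings, and `efficiency_gap` should be zero up to floating-point error.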
Next, we identify the criterion on which the judgment about the explanatory performance of a variable should be based. Virtually all approaches in the literature refer to the marginal contributions of a variable, which is compatible with the following condition.
Monotonicity: A change in the GOF worths from $f^A$ to $f^B$ such that variable $x_j$ exhibits higher marginal contributions in $f^B$ must not decrease the explanatory value attributed to variable $x_j$, i.e., if $f^A(T \cup \{x_j\}) - f^A(T) \le f^B(T \cup \{x_j\}) - f^B(T)$ for all $T \subseteq K \setminus \{x_j\}$, then $\varphi_{x_j}(f^A) \le \varphi_{x_j}(f^B)$.

The Monotonicity condition might be less reasonable if Efficiency were not imposed. To see this, assume we had imposed $\sum_{x_j \in K} \varphi_{x_j}(f) = 1$ instead of Efficiency, and assume there are two samples with the same explanatory variables. Sample A yields the GOF worths $f^A(T)$ for $T \subseteq K$, and sample B yields $f^B(T)$ for $T \subseteq K$. If some variable performs better in sample B than in sample A, it is supposed to be 'rewarded'. However, it would not be clear that $\varphi_{x_j}(f^B) \ge \varphi_{x_j}(f^A)$, as is required by Monotonicity, should hold, because other variables could exhibit even higher increases in performance, and the restriction to distribute 1 could imply a decrease in $\varphi_{x_j}$. Given that we distribute $f(K)$, however, higher explanatory performance due to the other variables should also increase $f(K)$, and therefore an increase in the values of all variables is possible.
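The interplay of Monotonicity and Efficiency discussed above can be illustrated with a small numerical sketch. It assumes two hypothetical worth tables, `f_a` and `f_b`, where `f_b` raises the worth of every sub-model containing variable `a` by 0.05, so that only the marginal contributions of `a` increase. All names and numbers are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def shapley(f, variables):
    """Shapley values via the weighted sum over subsets."""
    k = len(variables)
    out = {}
    for x in variables:
        others = [v for v in variables if v != x]
        out[x] = sum(
            factorial(s) * factorial(k - s - 1) / factorial(k)
            * (f[frozenset(t) | {x}] - f[frozenset(t)])
            for s in range(k) for t in combinations(others, s)
        )
    return out

variables = ["a", "b", "c"]
full = frozenset(variables)

# Sample A: hypothetical GOF worths (illustrative numbers only).
f_a = {
    frozenset(): 0.00,
    frozenset("a"): 0.30, frozenset("b"): 0.20, frozenset("c"): 0.10,
    frozenset("ab"): 0.40, frozenset("ac"): 0.35, frozenset("bc"): 0.30,
    frozenset("abc"): 0.50,
}
# Sample B: every sub-model containing "a" fits better by 0.05.
f_b = {t: w + (0.05 if "a" in t else 0.0) for t, w in f_a.items()}

sh_a = shapley(f_a, variables)
sh_b = shapley(f_b, variables)

# Shares that sum to 1 instead of f(K), as in the counterfactual normalization.
share_a = {x: v / f_a[full] for x, v in sh_a.items()}
share_b = {x: v / f_b[full] for x, v in sh_b.items()}
```

Under Efficiency the value of `a` rises by 0.05 and the values of `b` and `c` are unchanged, whereas the shares normalized to 1 fall for `b` and `c` although their marginal contributions did not change.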
Finally, it should be the case that variables that perform equally with respect to GOF receive the same outcome. The only difficulty is to identify equally performing variables. To this end, we say that two variables $x_i$ and $x_j$ are substitutes according to $f$ if it does not matter whether $x_i$ or $x_j$ is taken into a model, i.e., if $f(T \cup \{x_i\}) = f(T \cup \{x_j\})$ for all $T \subseteq K \setminus \{x_i, x_j\}$.

Equal treatment property: If the variables $x_i$ and $x_j$ are substitutes according to $f$, then $\varphi_{x_i}(f) = \varphi_{x_j}(f)$.
To our mind, these three conditions are plausible and not too restrictive. What is also appealing about them is that they leave no room for ambiguity as to which decomposition method should be used.
Theorem 1 (Young [24]) The Shapley value is the only rule that satisfies Efficiency, Monotonicity, and the Equal treatment property.

In other words, other decomposition rules violate at least one of the three conditions. The Shapley value brings about several other desirable properties. For example, a variable that never contributes anything to the model's GOF receives an outcome of zero. In the case of correlated regressors, a variable may receive a non-zero outcome if it contributes to GOF in sub-models, even though its coefficient in the full model is zero. Grömping [7] discusses this point and suggests that this property may be reasonable in many practical settings where causal relationships are not obvious. To be sure, the Shapley value does not identify causal mechanisms in the presence of multicollinearity, in the sense that it assumes that all sub-models provide useful information.
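As a small worked case, added here for illustration and not part of the original argument, consider a model with only two regressors, $K = \{x_1, x_2\}$, and $f(\emptyset) = 0$. There are two orderings, so each Shapley value is the average of two marginal contributions:

```latex
\begin{align*}
\mathrm{Sh}_{x_1}(f) &= \tfrac{1}{2}\, f(\{x_1\}) + \tfrac{1}{2}\,\bigl[f(K) - f(\{x_2\})\bigr], \\
\mathrm{Sh}_{x_2}(f) &= \tfrac{1}{2}\, f(\{x_2\}) + \tfrac{1}{2}\,\bigl[f(K) - f(\{x_1\})\bigr], \\
\mathrm{Sh}_{x_1}(f) + \mathrm{Sh}_{x_2}(f) &= f(K).
\end{align*}
```

The last line is Efficiency. If $f(\{x_1\}) = f(\{x_2\})$, the two variables are substitutes and the two values coincide, as the Equal treatment property requires. Each value is non-decreasing in the variable's own marginal contributions, $f(\{x_1\})$ and $f(K) - f(\{x_2\})$ in the case of $x_1$, in line with Monotonicity.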

Why Owen value decomposition should be used
In the case with a priori grouped regressor variables, a decomposition rule $\phi$ prescribes the outcome of the variables for any given pair $(f, \mathcal{G})$. Note that a rule $\phi$ does not explicitly attribute a value to a group, so that the outcome of a group $G_\ell$ is defined as the sum of the values of all its constituent variables, $\phi_{G_\ell}(f, \mathcal{G}) := \sum_{x_j \in G_\ell} \phi_{x_j}(f, \mathcal{G})$. The Efficiency and Monotonicity conditions can both be adapted accordingly, by adding some fixed a priori partition.
Efficiency*: The GOF of the full model is decomposed among the variables such that $\sum_{x_j \in K} \phi_{x_j}(f, \mathcal{G}) = f(K)$.
Monotonicity*: Leaving $\mathcal{G}$ fixed, a change in the GOF worths from $f^A$ to $f^B$, such that variable $x_j$ exhibits higher marginal contributions in $f^B$, must not decrease the explanatory value attributed to variable $x_j$.
The handling of substitutes requires attention, as variables in different groups cannot be substitutes anymore. Therefore, we say $x_i$ and $x_j$ are substitutes according to $f$ and $\mathcal{G}$ if $x_i$ and $x_j$ belong to the same group and $f(T \cup \{x_i\}) = f(T \cup \{x_j\})$ for all $T \subseteq K \setminus \{x_i, x_j\}$.
Equal treatment of players property: If the variables $x_i$ and $x_j$ are substitutes according to $f$ and $\mathcal{G}$, then $\phi_{x_i}(f, \mathcal{G}) = \phi_{x_j}(f, \mathcal{G})$.

One may identify interchangeable groups as well. We say $G_\ell$ and $G_m$ are substitutes according to $f$ if it does not matter whether $G_\ell$ or $G_m$ is taken into a model, given that the other groups are not split, i.e., if $f\bigl(G_\ell \cup \bigcup_{H \in \mathcal{S}} H\bigr) = f\bigl(G_m \cup \bigcup_{H \in \mathcal{S}} H\bigr)$ for all $\mathcal{S} \subseteq \mathcal{G} \setminus \{G_\ell, G_m\}$.

Equal treatment of groups property: If the groups $G_\ell$ and $G_m$ are substitutes according to $f$, then $\phi_{G_\ell}(f, \mathcal{G}) = \phi_{G_m}(f, \mathcal{G})$.

The conditions Equal treatment of players and Equal treatment of groups touch upon the interpretation of the a priori groups. If the decomposition rule did not satisfy these conditions, in particular the latter one, then it would not be possible to identify equally performing groups (substitutes) that we want to receive the same share of $f(K)$; a fortiori, a comparison of group values would have little meaning.
A recent result from cooperative game theory suggests a unique solution to the decomposition problem when there are a priori groupings.
Theorem 2 (Khmelnitskaya and Yanovskaya [10]) The Owen value is the only value that satisfies Efficiency*, Monotonicity*, the Equal treatment of players property, and the Equal treatment of groups property.
The Owen value has other desirable properties not mentioned so far. For instance, a variable that never contributes anything to the GOF of the model receives the outcome zero. Further, the following consistency properties hold. If there are only trivial groups, i.e., if all variables belong to one group or if all variables form groups of their own, Owen and Shapley value decomposition coincide.
Now suppose an a priori group were replaced by one variable equipped with the same contribution to the GOF of the model as all the variables of the original group. Then this new variable obtains the same outcome as the replaced group would have obtained. Consequently, the model's GOF is distributed among the groups in the same fashion as it would be distributed among the variables if there were no groups, namely according to the Shapley value decomposition. Hence, arguing for the Owen value decomposition also supports the approach of merely distributing the GOF of the full model $f(K)$ among the groups according to the Shapley value decomposition if a further decomposition within the groups is not of interest. This can be attractive in the case of a large group of 'control variables' (e.g., dummy variables for regions), if a detailed decomposition among the group's member variables is computationally very costly.
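The consistency between Owen group sums and a Shapley decomposition among groups can be verified on a toy example. The sketch below assumes a hypothetical worth table and grouping; the function names (`shapley`, `owen`, `quotient_game`) are illustrative, and the 'quotient game' simply treats each group as a single player whose sub-models contain whole groups only.

```python
from itertools import combinations, permutations, product

def shapley(f, players):
    """Shapley values as average marginal contributions over all orderings."""
    orders = list(permutations(players))
    values = dict.fromkeys(players, 0.0)
    for order in orders:
        seen = frozenset()
        for p in order:
            values[p] += (f[seen | {p}] - f[seen]) / len(orders)
            seen = seen | {p}
    return values

def owen(f, groups):
    """Owen values: permute the groups, then the members within each group."""
    values = {x: 0.0 for g in groups for x in g}
    count = 0
    for group_order in permutations(groups):
        for inner in product(*(permutations(g) for g in group_order)):
            seen = frozenset()
            for x in (x for block in inner for x in block):
                values[x] += f[seen | {x}] - f[seen]
                seen = seen | {x}
            count += 1
    return {x: v / count for x, v in values.items()}

def quotient_game(f, groups):
    """Worths of unions of whole groups, indexed by sets of group indices."""
    idx = range(len(groups))
    return {frozenset(s): f[frozenset(x for i in s for x in groups[i])]
            for size in range(len(groups) + 1) for s in combinations(idx, size)}

# Hypothetical GOF worths and grouping (illustrative numbers only).
f = {
    frozenset(): 0.00,
    frozenset("a"): 0.30, frozenset("b"): 0.20, frozenset("c"): 0.10,
    frozenset("ab"): 0.40, frozenset("ac"): 0.35, frozenset("bc"): 0.30,
    frozenset("abc"): 0.50,
}
groups = [("a", "b"), ("c",)]

ow = owen(f, groups)
group_sums = [sum(ow[x] for x in g) for g in groups]
sh_groups = shapley(quotient_game(f, groups), range(len(groups)))
```

Here `group_sums[i]` and `sh_groups[i]` agree for every group, which is the property that justifies decomposing among groups only when the within-group split is not of interest; the group-level calculation needs far fewer sub-models.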

Application to German wage data
As an illustrative application of the method we estimate an augmented Mincer regression model for male German workers. We focus on the relative importance of 'human capital' for earnings. Our data originate from the German Socio-Economic Panel wave of 2006 [22].
This particular wave features a short test on cognitive ability, the symbol-digit correspondence test (SCT), for the group of participants who took the CAPI interview [11]. To simplify the interpretation, we rescale SCT such that it varies between 0 (lowest score) and 1 (highest score). Formal education is accounted for in the form of the years of schooling (EDUC). In addition, we consider the interaction term of ability and formal education. These three variables form the first group.
The second and third groups of regressors consist of a polynomial in years of labor market experience (EXPER) and a polynomial in years of job tenure (TENURE), respectively. Taken together, these first three groups reflect 'human capital'. The model also includes four groups of control variables: marital status (MARRIED), firm size (3 dummy variables), industry classification (6 dummy variables), and region (14 dummy variables).
The dependent variable is the natural logarithm of hourly pre-tax earnings. We restrict the sample to male German citizens, aged 20-64 years, who worked for at least 10 hours per week and who were neither self-employed nor disabled. This leaves us with 850 valid observations.
Table 3 presents Owen values and their group sums as percentages of the overall $R^2$ of the model, which turned out to be 0.501. According to these values, one third of the explained variance can be attributed to the group of formal education and ability variables. While the entire group is statistically significant at the 1% level, both the main effect of SCT and the interaction effect are only significant at the 10% level. While the GOF decomposition does not have standard errors, bootstrapping may help to attach greater reliability to comparisons of importance. Figure 1 shows the 90% bootstrap confidence intervals for the absolute (i.e., not standardized by $R^2$) group values.
This reinforces the notion that the first group is the 'most important' one, as its confidence interval, which reaches from 14% to 20% of the variance in log wages, does not overlap with any of the others. Within this first group, the main effect of EDUC is clearly the most important one. Remarkably, the interaction term plays a more important role (8% of $R^2$) than the main effect of SCT (3% of $R^2$), again with confidence intervals not overlapping (Figure 2). Looking at the coefficients, the model implies that up to 16 years of education, more cognitive ability is associated with higher earnings. The polynomial terms of labor market experience and job tenure suggest positive effects on earnings in the first years, with turning points after about 30 years in both cases. Interestingly, our procedure assigns greater importance in terms of GOF to the tenure polynomial, although the coefficients suggest that the experience profile is the steeper one. However, both confidence intervals include the value of the respective other group, i.e., generalizations on the difference in importance should not be drawn on the basis of our data (Figure 1).
In terms of 'group importance', firm size categories and the regional composition reach a similar order of magnitude as the tenure polynomial. While our focus is not on these dummy variables, such information may nevertheless be of interest to the reader, e.g., against the backdrop of the long economic convergence process in East Germany after the fall of the Berlin Wall. Group values may thus provide the reader with a space-conserving impression of the importance of control variables that are usually omitted from regression tables.

Concluding remarks
Decomposition of GOF provides an attractive diagnostic tool for identifying important (groups of) explanatory variables in a given regression model. We have argued, on the grounds of its attractive properties, that the Shapley value should be used for this purpose. The Shapley value and its axiomatic foundations can be generalized. The Owen value constitutes such a generalization where an a priori grouping of the regressor variables is taken into account, which accommodates many empirical analyses in practice. A further generalization could allow for additional levels of aggregation [23]. In our wage regression example, such a level structure design could be implemented to assign the first three groups to a 'human capital' cluster.
One can also imagine situations in which certain variables must always be included in all sub-models, e.g., time fixed effects in a panel data analysis, or situations in which external knowledge on causal relationships can be exploited. In such cases, restricting the set of potential models, such that some variables must always be present or can only appear in combination, seems appropriate. An implementation could follow along the lines of the Shapley value for Games with Restricted Coalitions [4], which has an axiomatic foundation in the same spirit as Young's axiomatization presented in Section 2.2.

Table 1
OLS regression results with decomposition of $R^2$ (in %)