On a New Class of Multivariate Prior Distributions: Theory and Application in Reliability

Abstract. In the context of robust Bayesian analysis for multiparameter distributions, we introduce a new class of priors based on stochastic orders, multivariate total positivity of order 2 (MTP2) and weighted distributions. We provide the new definition, its interpretation and the main properties, and we also study the relationship with other classical classes of prior beliefs. We also consider the Hellinger metric and the Kullback-Leibler divergence to measure the uncertainty induced by such a class, as well as its effect on the posterior distribution. Finally, we conclude the paper with a real example about train door reliability.


Introduction
It is well known that Bayesian statistics provides an excellent theoretical framework for analyzing experimental data. The key to its success lies in its ability to incorporate prior knowledge about the quantity of interest as a distribution function. This prior distribution, together with experimental data, leads in general to a better estimation of the quantity under study. A thorough review of the Bayesian approach can be found in Berger (1985) and Bernardo (2003). As nicely described in Basu (1994), any elicitation process leading to prior information is to some extent arbitrary. Therefore, the advantage given by the prior information becomes in some way a disadvantage, being in fact the critical and even polemical point of Bayesian methodology. This is exactly the problem addressed by robust Bayesian analysis, also called Bayesian sensitivity analysis, which quantifies and interprets the uncertainty induced by the partial knowledge of the prior information. Of course, this also holds for other classical elements in Bayesian analysis, such as the likelihood and the loss function. A thorough review of the robust Bayesian approach can be

found in Berger (1994) and Ríos Insua and Ruggeri (2000). With respect to uncertainty in the loss functions we refer the readers to Makov (1994), Dey et al. (1998), Martín et al. (1998) and, more recently, to Arias-Nicolás et al. (2006, 2009), among others. Focusing only on the prior uncertainty, it is a common practice in the literature to replace the specific prior distribution by a class of priors Γ. The use of this class somehow overcomes the classical criticism about the choice of a unique prior on the grounds of arbitrariness and bias. For such a reason, we find different valuable definitions in the literature depending on their use, such as parametric families, contamination classes, density bands, densities with a few specified percentiles, distribution bands, etc. Denoting in bold lowercase and bold uppercase vectors and random vectors, respectively, given a specific prior π and ε in (0, 1), the classical ε-contamination class is defined as

Γ = {π_ε : π_ε = (1 − ε)π + εQ, Q ∈ Q},   (1.1)

where π_ε is a mixture and Q is a family of priors called the class of contaminations, see e.g. Berger and Berliner (1986), Moreno and Cano (1991) and Betrò et al. (1994), among others. For a detailed illustration of classes of priors and topological neighborhoods we refer to Berger (1985, 1994), Fortini and Ruggeri (2000) and Moreno (2000).
Recently, Arias-Nicolás et al. (2016) defined a new class of priors, called the distorted band, based on stochastic orders and distortion functions. That class addresses the problem of uncertainty in models depending on just one parameter. To better understand the concept of that class, let us first recall the classical Bayesian inference framework. We denote by X the underlying observation with probability density function (PDF) p_θ(x), where θ represents the unknown parameter defined in the set of states Θ ⊆ R^n, n ∈ N, n ≥ 1. Then, a specific prior π over Θ with PDF π(θ) represents the state of knowledge before any data X is observed. After taking into account the observed data, x, and using Bayes' theorem, the posterior distribution is denoted by π_x and its density is given by

π_x(θ) = l(θ | x) π(θ) / m_π(x),   (1.2)

where l(θ | x) and m_π(x) denote the likelihood function and the marginal density, respectively.
At this point we also recall the definition of the likelihood ratio order for univariate random variables. Let π_1 and π_2 be two prior distributions over Θ ⊆ R. The former is said to be smaller than the latter in the likelihood ratio order if the ratio π_2(θ)/π_1(θ) increases over the union of the supports of the two PDFs. We denote this by π_1 ≤_lr π_2. Roughly speaking, this means that π_2 tends to take on larger values than π_1. For more details on the likelihood ratio order see Müller and Stoyan (2002) and Shaked and Shanthikumar (2007).
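Because the likelihood ratio order reduces to the monotonicity of a density ratio, it can be checked numerically on a grid. The following sketch is a hypothetical illustration (not from the paper): it compares an Exp(2) prior with a Gamma(2, 2) prior, whose ratio is proportional to θ and hence increasing.

```python
import numpy as np

def is_lr_ordered(pdf1, pdf2, grid, tol=1e-12):
    """Check pi1 <=_lr pi2 on a grid: the ratio pi2/pi1 must be
    non-decreasing wherever both densities are positive."""
    f1, f2 = pdf1(grid), pdf2(grid)
    mask = (f1 > 0) & (f2 > 0)
    ratio = f2[mask] / f1[mask]
    return bool(np.all(np.diff(ratio) >= -tol))

lam = 2.0
exp_pdf = lambda t: lam * np.exp(-lam * t)           # Exp(2) density
gamma_pdf = lambda t: lam**2 * t * np.exp(-lam * t)  # Gamma(2, 2) density

grid = np.linspace(0.01, 10.0, 2000)
print(is_lr_ordered(exp_pdf, gamma_pdf, grid))  # ratio proportional to theta: True
print(is_lr_ordered(gamma_pdf, exp_pdf, grid))  # reversed ratio decreases: False
```

A grid check of this kind is, of course, only a sanity test, not a proof of the ordering.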
We finally recall the concept of distortion functions. These are non-decreasing continuous functions h : [0, 1] → [0, 1] such that h(0) = 0 and h(1) = 1. If π is a univariate prior with cumulative distribution function (CDF) F_π, the distorted CDF is given by

F_{π_h}(θ) = h(F_π(θ)),   (1.3)

and represents the CDF of a particular random variable called the distorted random variable, with distribution π_h. If the distorted CDF is differentiable, then, taking the derivative in (1.3), we get the distorted density function given by

π_h(θ) = h′(F_π(θ)) π(θ),   (1.4)

see Section 2.6 in Denuit et al. (2005) for a review of the distortion concept. Although priors can be distorted according to different criteria, Arias-Nicolás et al. (2016) argue for the use of convex and concave distortion functions based on two arguments. First, they represent satisfactorily a change in the weighting of the underlying prior: a convex (concave) distortion function gives more weight to higher (smaller) values of the variable. Second, they have desirable properties when we compare the original prior with the distorted one. If h_1 is concave and h_2 is convex, it follows from Lemma 1 in Arias-Nicolás et al. (2016) that

π_{h_1} ≤_lr π ≤_lr π_{h_2}.   (1.5)

Inspired by property (1.5), the distortion band Γ_{h_1,h_2,π}, associated with a specific prior π and based on a concave distortion function h_1 and a convex distortion function h_2, is defined as

Γ_{h_1,h_2,π} = {π′ : π_{h_1} ≤_lr π′ ≤_lr π_{h_2}}.   (1.6)

As an immediate consequence, it is shown in Arias-Nicolás et al. (2016) that the posteriors corresponding to π_{h_1} and π_{h_2} are also lower and upper bounds for the class of all posteriors π_x in the likelihood ratio order sense, i.e., for all π′ ∈ Γ_{h_1,h_2,π},

π_{h_1,x} ≤_lr π′_x ≤_lr π_{h_2,x}.   (1.7)

Other interesting properties of the distorted band are studied in detail in Arias-Nicolás et al. (2016), where the authors show that Γ_{h_1,h_2,π} fulfills all the requirements that Berger (1994) discussed for the choice of a class of priors. First, its elicitation and interpretation are easy.
Second, the prior uncertainty can be reflected by using different metrics. Finally, the range of quantities of interest can be computed by just looking for the extremal distributions generating the class. In conclusion, the distortion band is regarded as a "neighborhood" of the prior π. For further information regarding recent applications of the distorted band see Joshi et al. (2018), Sánchez-Sánchez et al. (2019) and Barrera et al. (2019). Also see Huang and Mi (2018) for applications of the likelihood ratio order in the Bayesian framework.
Note that the distorted band model defined in (1.6) is valid for prior distributions defined over Θ ⊆ R. As a natural extension, we propose in this paper a model for multiparameter distributions, i.e., for distributions such that the parameter set is Θ ⊆ R^n, n > 1. Our first step is to extend the two key concepts used in the model (1.6), namely the likelihood ratio order and the distortion function, to the multivariate setting. Regarding the likelihood ratio order, this can be easily achieved by using the classical multivariate extension introduced in Karlin and Rinott (1980a) (see also Whitt (1980) and Shaked and Shanthikumar (2007)), which will be recalled later on. Regarding the extension of the distortion function, although the univariate concept is widely used in different fields (see, e.g., Quiggin (1982), Yaari (1987), Schmeidler (1989), Denneberg (1990), Wang (1995, 1996) and Goovaerts and Laeven (2008)), there is no unique way to extend it to the multivariate setting; see, e.g., Valdez and Xiao (2011), Di Bernardino and Rullière (2013) and Navarro et al. (2016) for different multivariate extensions. Instead of facing the dilemma of choosing among these equally plausible candidates, we propose a new approach in this paper, based on weighted distributions. Weighted distributions, first introduced by Fisher (1934) and explicitly defined in Rao (1963) (see also Rao (1985)), provide a method to adjust a probability distribution that is different from the one based on distortion functions and that can be unambiguously extended to the multivariate case (see, e.g., Jain and Nanda (1995) and Navarro et al. (2006, 2011)). Given a prior π with density function π(θ), θ ∈ Θ ⊆ R^n, let w be a weight function, i.e., a non-negative function w : R^n → R+ such that the expectation E_π[w(θ)] is strictly positive and finite.
A weighted random vector π_w is a random vector with density function given by

π_w(θ) = w(θ) π(θ) / E_π[w(θ)].   (1.8)

Note that for n = 1, in the case of an absolutely continuous prior π with CDF F_π, the distorted CDF associated with a differentiable distortion function h has a density function that can be written as a weighted density function, where the weight function w(θ) = h′(F_π(θ)) depends on F_π (this is noted, for example, in Furman and Zitikis (2008) and Blazej (2008)). In other words, when we consider absolutely continuous random variables, weighted distributions are more general objects than distorted distributions. Moreover, in that case, the distortion h is convex (resp. concave) if and only if the weight function w is increasing (resp. decreasing). In this new framework, we construct a weighted band (rather than a distorted band) by perturbing the prior π through two weight functions: w_1 (decreasing) and w_2 (increasing). Under these assumptions, it is well known (for n = 1) that

π_{w_1} ≤_lr π ≤_lr π_{w_2}.   (1.10)

Observe that although distorted distributions and weighted distributions are different in nature, they are used in a similar way to change the weighting of the underlying prior. Thus, an increasing (decreasing) weight function gives more weight to higher (smaller) values of the variable.
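As an illustration of (1.8) and (1.10), the following sketch builds weighted densities on a grid from an assumed Exp(1.5) prior, with a hypothetical decreasing weight e^{−θ} and the increasing size-biased weight w(θ) = θ, and checks the likelihood ratio ordering through the monotonicity of the density ratios.

```python
import numpy as np

def integrate(y, x):
    # simple trapezoidal rule, kept explicit for self-containment
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

grid = np.linspace(1e-3, 12.0, 4000)
lam = 1.5
prior = lam * np.exp(-lam * grid)            # assumed specific prior: Exp(1.5)

def weighted(prior_vals, w, grid):
    """Weighted density pi_w = w * pi / E_pi[w], normalized numerically as in (1.8)."""
    unnorm = w(grid) * prior_vals
    return unnorm / integrate(unnorm, grid)

pi_w1 = weighted(prior, lambda t: np.exp(-t), grid)  # decreasing weight
pi_w2 = weighted(prior, lambda t: t, grid)           # increasing (size-biased) weight

nondecreasing = lambda r: bool(np.all(np.diff(r) >= -1e-9))
print(nondecreasing(prior / pi_w1))   # pi_w1 <=_lr pi : True
print(nondecreasing(pi_w2 / prior))   # pi <=_lr pi_w2 : True
```

The specific weights here are arbitrary choices made for the illustration; any decreasing w_1 and increasing w_2 would give the same ordering.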
The multivariate weighted band is introduced in Section 2, along with its main properties. In particular, we show how this multivariate band inherits some of the good properties of the univariate distorted band, including the relative ease of specifying it and of interpreting the posterior distributions. This fact is very uncommon in the multivariate case. In Section 3 the Hellinger metric and the Kullback-Leibler divergence are introduced to incorporate the degree of uncertainty in the elicitation process. We present an application about failures of train doors in Section 4, discussing also how to obtain a sample from the class of posterior distributions. Concluding remarks and some ideas for future research are presented in Section 5.
In this paper, for any random variable Z and an event A, we denote by [Z|A] the random variable whose distribution is the conditional distribution of Z given A. Given a random variable X with distribution function F_X, we define the quantile function as F_X^{−1}(p) = inf{x : F_X(x) ≥ p}, for all p ∈ (0, 1). The symbol =_st means equality in law. A set U ⊆ R^n is called upper (lower) if y ∈ U whenever y ≥ (≤) x and x ∈ U. Given two real numbers x and y, we denote x ∨ y = max{x, y} and x ∧ y = min{x, y}. For two real vectors x and y, x ∨ y and x ∧ y denote the componentwise maximum and the componentwise minimum, respectively. Finally, we use the terms increasing and decreasing in a wide sense, that is, a function g : R^n → R is increasing (decreasing) if g(x) ≤ (≥) g(y) for all x ≤ y. Let X be an n-dimensional random vector and let I ⊂ {1, . . . , n}; then X_I denotes the random vector formed by the coordinates of X with indices in I.

A multivariate class of priors: the weighted band
In order to ease the exposition, we will only consider random vectors with absolutely continuous distributions having densities with respect to a product measure on R^n. However, it is possible to define all key concepts for discrete distributions, and the results are essentially the same. Let us start by recalling the definition of some multivariate stochastic orders that we will use in the paper.
Definition 2.1. Let X and Y be two n-dimensional random vectors with CDFs F and G, respectively. Then, it is said that X is smaller than Y in the usual multivariate stochastic order, denoted by X ≤ st Y, if P {X ∈ U} ≤ P {Y ∈ U}, for all upper sets U ⊆ R n .
Roughly speaking, X is less likely than Y to take on large values, where "large" means any value in an upper set U, for any upper set U. As an immediate consequence, just taking into account that the complement of any upper set is a lower set, and vice versa, we can also compare the CDFs, i.e.,

X ≤_st Y implies F(x) ≥ G(x), for all x ∈ R^n.   (2.1)

Note that condition (2.1) is also sufficient only in the univariate case. The multivariate stochastic order has a characterization in terms of expectations of increasing transformations, i.e., X ≤_st Y if, and only if,

E[φ(X)] ≤ E[φ(Y)]   (2.2)

for all increasing real functions φ on R^n, provided the expectations exist.
Definition 2.2. Let X and Y be two random vectors with PDFs f and g, respectively. Then, it is said that X is smaller than Y in the multivariate likelihood ratio order, denoted by X ≤_lr Y, if

f(x) g(y) ≤ f(x ∧ y) g(x ∨ y)   (2.3)

for every x and y in R^n. The following implication is also well known:

X ≤_lr Y implies X ≤_st Y.   (2.4)

The definition of multivariate total positivity of order 2 will be of key importance for our purpose.

Definition 2.3. A function l : R^n → R+ (n ∈ N, n ≥ 2) is said to be multivariate totally positive of order 2 (MTP2; TP2 when n = 2) if l satisfies

l(x ∨ y) l(x ∧ y) ≥ l(x) l(y) for all x, y ∈ R^n.   (2.5)

Additionally, an n-dimensional random vector X with PDF f is said to be MTP2 if its density f is MTP2 or, equivalently, if X ≤_lr X.
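The MTP2 condition (2.5) can be probed empirically on sampled pairs of points. The sketch below is an illustration under assumed Gaussian densities (not taken from the paper): it works on the log scale and uses the classical fact that a bivariate normal density is MTP2 exactly when the correlation is non-negative.

```python
import numpy as np

def is_mtp2_sampled(logpdf, points, tol=1e-9):
    """Empirically test (2.5): l(x v y) l(x ^ y) >= l(x) l(y) on pairs of
    points, on the log scale for numerical stability."""
    for x in points:
        for y in points:
            lhs = logpdf(np.maximum(x, y)) + logpdf(np.minimum(x, y))
            rhs = logpdf(x) + logpdf(y)
            if lhs < rhs - tol:
                return False
    return True

def gauss_logpdf(rho):
    # bivariate normal with unit variances and correlation rho
    # (the normalizing constant cancels in (2.5), so it is omitted)
    return lambda z: -(z[0]**2 - 2*rho*z[0]*z[1] + z[1]**2) / (2*(1 - rho**2))

rng = np.random.default_rng(0)
pts = rng.uniform(-3, 3, size=(40, 2))
print(is_mtp2_sampled(gauss_logpdf(0.5), pts))    # rho >= 0: MTP2 holds, True
print(is_mtp2_sampled(gauss_logpdf(-0.5), pts))   # rho < 0: fails, False
```

Passing such a sampled test does not prove MTP2, but a single violating pair disproves it, which makes the check useful as a quick diagnostic.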
We will mention briefly some interesting results concerning MT P 2 and ≤ lr concepts that we will use later on.
Lemma 2.4. The product of MTP2 functions is still MTP2. Additionally, a function l : R^n → R+ defined by l(x) = ∏_{i=1}^n g_i(x_i), where the g_i, i = 1, . . . , n, are univariate non-negative functions, is MTP2.
Lemma 2.5. Let X and Y be two random vectors with PDFs f and g, respectively. If either X or Y is MTP2, then X ≤_lr Y if and only if g(x)f(y) ≤ g(y)f(x) for all x ≤ y or, equivalently, if g(x)/f(x) is increasing on the union of their supports.

Lemma 2.6. Let X and Y be two random vectors such that X ≤_lr Y. Then, for any I ⊂ {1, . . . , n}, we have X_I ≤_lr Y_I.
It is worth mentioning that the relation X ≤ lr Y always implies that the ratio g(x)/f (x) is increasing in x. The converse is only true in the univariate case.
Both concepts, MTP2 and ≤_lr, are explicitly defined in Karlin and Rinott (1980a), but they are also implicit in many classical papers in the literature; see, for instance, Lorentz (1953), Karlin (1968), Sarkar (1969), Rinott (1973), Holley (1974), Preston (1974) and Kemperman (1977). Additionally, an excellent review of all the previous concepts and multivariate stochastic orders can be found in Müller and Stoyan (2002) and Shaked and Shanthikumar (2007). Also remarkable is the equivalence between log-supermodular and MTP2 functions, where a function f : R^n → R+ is log-supermodular if ln f(x ∨ y) + ln f(x ∧ y) ≥ ln f(x) + ln f(y) for all x, y ∈ R^n. This equivalence leads to an interesting analytical characterization of strictly positive MTP2 functions, see for example Topkis (1978, 1998).

Lemma 2.7. A strictly positive and twice-differentiable function f : R^n → R+ is MTP2 if and only if ∂² ln f(x)/∂x_i ∂x_j ≥ 0 for all x and all i ≠ j.
Although at first glance the MTP2 property seems to be a technical condition, we find in Karlin and Rinott (1980a) a large list of multivariate distributions that satisfy it. For example, as a direct consequence of Lemma 2.4, it is apparent that independent random variables have a joint MTP2 density. Also, the MTP2 concept is fundamental for many probability inequalities that have important applications to multivariate analysis, multiple hypothesis testing and approximating probabilities; see for example Karlin and Rinott (1980a,b, 1981), Perlman and Olkin (1980), Glaz and Johnson (1984), Sarkar and Chang (1997) and Sarkar (1998, 2008). To conclude, it is also well known that MTP2 is a positive dependence property that implies the classical positive association concept: if X is MTP2, then Cov(φ(X), ψ(X)) ≥ 0, provided φ and ψ are both monotone increasing or both monotone decreasing, see Esary et al. (1967), Sarkar (1969) and Karlin and Rinott (1980a).
As mentioned in the introduction, the theory of weighted distributions provides a useful tool to incorporate the uncertainty about the specification of a prior belief. Let π be a multivariate prior with density function π(θ), θ ∈ Θ ⊆ R^n, and let w : R^n → R+ be a weight function. The weighted prior π_w, with density function π_w(θ) defined in (1.8), represents a perturbation of the specific prior density that can be used to measure the uncertainty about it.
A weighted density can be defined according to different criteria, as we will show later on. We restrict our attention to increasing and decreasing weight functions for two reasons. First, as explained before, for n = 1 they represent a generalization of the convex and concave distortion functions used in the distorted band. And second, they enable an ordering when we compare the specific prior with the weighted one as we can see in the following result, which extends to n ≥ 2 the property (1.10).
Lemma 2.8. Let π be a specific MT P 2 prior belief. Let w be an increasing (decreasing) weight function. Then π ≤ lr (≥ lr )π w .
Proof. Let us consider an increasing weight function w and θ_1, θ_2 ∈ Θ such that θ_1 ≤ θ_2. The product π_w(θ_1)π(θ_2) satisfies

π_w(θ_1) π(θ_2) = w(θ_1) π(θ_1) π(θ_2) / E_π[w(θ)] ≤ w(θ_2) π(θ_1) π(θ_2) / E_π[w(θ)] = π(θ_1) π_w(θ_2),

where the equalities follow directly from expression (1.8) and the inequality comes from the fact that w is increasing. The proof concludes just using Lemma 2.5. For a decreasing weight function the proof holds using a similar argument.
It follows from (2.4) and Lemma 2.8 that, when the prior belief is MTP2 and w is increasing (decreasing), the weighted prior belief π_w is more (less) likely than the prior belief π to take on larger values. This result extends property (1.10) from the case n = 1 to n ≥ 2. Moreover, for n = 1 this result plays a role equivalent to Lemma 1 in Arias-Nicolás et al. (2016) and Theorem 1 of Blazej (2008). Now, let us suppose that the decision-maker is able to represent the changes in a prior belief π by a decreasing weight function w_1 and an increasing weight function w_2. This fact, jointly with Lemma 2.8, allows us to define two weighted priors, π_{w_1} and π_{w_2}, such that π_{w_1} ≤_lr π ≤_lr π_{w_2}. Inspired by these inequalities, we present the following neighborhood band for π.
Definition 2.9. Let π be a specific MTP2 prior belief. We define the weighted band Γ_{w_1,w_2,π} associated with π and based on a decreasing weight function w_1 and an increasing weight function w_2 (weighted band, for short) as

Γ_{w_1,w_2,π} = {π′ : π_{w_1} ≤_lr π′ ≤_lr π_{w_2}}.

As a consequence of Lemma 2.8, it is evident that π ∈ Γ_{w_1,w_2,π}. Therefore the weighted band can be considered as a particular "neighborhood" band of π, where the weighted priors are the lower and upper bound distributions of the class. From Definition 2.9, uncertainty can also be introduced only through an upper (lower) bound by taking w_1 (w_2) to be a constant weight function, so that the corresponding bound coincides with π itself.
In order to give a meaningful understanding of the weighted band, we provide the following interpretations. First of all, from (2.4) and (2.1) it is apparent that the weighted band is a subclass of the following multivariate class of priors:

{π′ : F_{π_{w_2}}(θ) ≤ F_{π′}(θ) ≤ F_{π_{w_1}}(θ), for all θ ∈ Θ}.   (2.6)

It is worth mentioning that the class given in (2.6) is a multivariate generalization of the classical band class defined in Basu and DasGupta (1995) for univariate priors. Additionally, the weighted band has also some interesting interpretations in terms of prior probability sets. For example, given a prior π′ ∈ Γ_{w_1,w_2,π}, we obtain that

P_{π_{w_1}}{θ ∈ U} ≤ P_{π′}{θ ∈ U} ≤ P_{π_{w_2}}{θ ∈ U}, for all upper sets U ⊆ R^n;

see Rinott and Scarsini (2006) and Shaked and Shanthikumar (2007) for further information in this sense.
It is worth mentioning that the likelihood ratio order does not apply, in general, when comparing two priors π_1 and π_2 in Γ_{w_1,w_2,π}: each of them is just ordered with respect to π_{w_1} and π_{w_2}. However, there are infinitely many priors in the weighted band, as shown in the following result, which also connects the weighted band with the classical contamination class.

Proposition 2.10. Let π_1, π_2 ∈ Γ_{w_1,w_2,π} and let ε ∈ (0, 1). Then the mixture π′ with density π′(θ) = (1 − ε)π_1(θ) + επ_2(θ) belongs to Γ_{w_1,w_2,π}.
Proof. In order to prove that π′ ∈ Γ_{w_1,w_2,π}, we will first prove that π_{w_1} ≤_lr π′. From Definition 2.2, a straightforward computation shows that, for all θ_1, θ_2 ∈ Θ,

π_{w_1}(θ_1) π′(θ_2) = (1 − ε) π_{w_1}(θ_1) π_1(θ_2) + ε π_{w_1}(θ_1) π_2(θ_2)
≤ (1 − ε) π_{w_1}(θ_1 ∧ θ_2) π_1(θ_1 ∨ θ_2) + ε π_{w_1}(θ_1 ∧ θ_2) π_2(θ_1 ∨ θ_2)
= π_{w_1}(θ_1 ∧ θ_2) π′(θ_1 ∨ θ_2),

where the equalities follow from the expression of the mixture density and the inequality from the fact that π_1 and π_2 belong to Γ_{w_1,w_2,π}. Similarly, it is possible to prove that π′ ≤_lr π_{w_2}.
Since Definition 2.9 is based on w_1 and w_2, there are many possible bands, obtained just by considering different increasing and decreasing weight functions. It is apparent that the choice of those weight functions cannot be arbitrary and should be tailored to the problem under study. Some useful weight functions can be found in Jain and Nanda (1995), Nanda and Jain (1997) and Navarro et al. (2006, 2011). In those papers, the authors study properties related to reliability measures, orderings, characterizations and dependence analysis. Of special interest is the multivariate size-biased distribution, derived from the weight function w(θ) = ∏_{i=1}^n θ_i, which represents sampling procedures where a vector θ = (θ_1, . . . , θ_n) has a sampling probability proportional to the product θ_1 ⋯ θ_n. Another classical case leads to the equilibrium distributions in reliability. Other interesting weight functions are of the product form w(θ) = ∏_{i=1}^n g_i(θ_i), where, if the g_i's are non-negative increasing (decreasing) functions, then w is also increasing (decreasing). For example, the weight functions w(θ) = ∏_{i=1}^n θ_i^{a_i−1} are increasing (decreasing) when a_i > 1 (a_i < 1), for i = 1, . . . , n.
The effect of the weight functions on the dependence structure is discussed in Navarro et al. (2006). For example, it is clear from Lemma 2.4 that if π, l(θ|x) and w are MTP2, then the weighted prior and the weighted posterior are also MTP2. Additionally, if π has independent components and w is of the product form w(θ) = ∏_{i=1}^n g_i(θ_i), then the weighted prior also has independent components. Finally, we present two bivariate weight functions, chosen for mathematical convenience, that we will use later on: they will allow us to compute the conditional distributions needed to estimate the characteristics of the weighted priors and posteriors using Markov chain Monte Carlo (MCMC) methods in the real example. It is apparent that, if a, b < 1 and c > 0, then w_1 is decreasing and, if a, b > 1 and c > 0, then w_2 is increasing.
In order to clarify the previous ideas we present the following example.
Example 2.11. Suppose that the specific prior belief π over Θ = R+ × R+ is a bivariate random vector having independent exponential marginal distributions Exp(λ_i), i = 1, 2, with joint PDF given by

π(θ) = λ_1 λ_2 e^{−λ_1 θ_1 − λ_2 θ_2}, θ = (θ_1, θ_2) ∈ Θ.

Let us consider the weight function w defined in the left-hand side of (2.7) as

w(θ) = θ_1^{a_1−1} θ_2^{a_2−1},

where a_i > 0, i = 1, 2. A straightforward computation shows that the weighted prior π_w is a bivariate random vector having independent gamma marginal distributions G(a_i, λ_i), i = 1, 2, with joint PDF given by

π_w(θ) = (λ_1^{a_1} λ_2^{a_2} / (Γ(a_1) Γ(a_2))) θ_1^{a_1−1} θ_2^{a_2−1} e^{−λ_1 θ_1 − λ_2 θ_2}, θ ∈ Θ.

Let us consider w_1 and w_2 defined as in (2.9), taking (a_1, a_2) = (1/4, 1/4) and (a_1, a_2) = (7/4, 7/4), respectively, and also consider the weighted band Γ_{w_1,w_2,π}. By taking the values of the hyperparameters λ_1 = 2 and λ_2 = 3, the weighted PDFs π_{w_1}(θ) (in blue) and π_{w_2}(θ) (in green) are represented in Figure 1, where we can see how they differ from the baseline prior PDF (a) and from the baseline prior CDF (b) (in yellow). From the inclusion given in (2.6), the CDF of the specific prior lies between the CDFs of the weighted versions.
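The computation in Example 2.11 can be verified numerically: weighting the independent exponential prior by w(θ) = θ_1^{a_1−1} θ_2^{a_2−1} and renormalizing as in (1.8) reproduces the product of gamma densities. A minimal sketch, where the grid truncation and the tolerance are our own assumptions:

```python
import numpy as np
from math import gamma as gammafn

lam1, lam2 = 2.0, 3.0     # hyperparameters of Example 2.11
a1, a2 = 7/4, 7/4         # parameters of the increasing weight w_2

t = np.linspace(1e-4, 8.0, 800)          # truncated grid over R+, assumed adequate
T1, T2 = np.meshgrid(t, t, indexing="ij")

def integrate2d(vals, x):
    # trapezoidal rule applied in both coordinates
    w = np.diff(x)
    inner = np.sum((vals[:, 1:] + vals[:, :-1]) * w / 2.0, axis=1)
    return float(np.sum((inner[1:] + inner[:-1]) * w / 2.0))

prior = lam1 * np.exp(-lam1 * T1) * lam2 * np.exp(-lam2 * T2)  # independent Exp
weight = T1 ** (a1 - 1) * T2 ** (a2 - 1)                       # weight of the example

weighted = weight * prior
weighted /= integrate2d(weighted, t)     # numerical normalization as in (1.8)

# closed form: product of Gamma(a_i, lam_i) densities
gam = (lam1**a1 * T1**(a1 - 1) * np.exp(-lam1 * T1) / gammafn(a1)) * \
      (lam2**a2 * T2**(a2 - 1) * np.exp(-lam2 * T2) / gammafn(a2))

print(np.max(np.abs(weighted - gam)) < 1e-2)  # True
```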
The next result shows that posterior distributions inherit the likelihood ratio order when the likelihood function is MTP2 in the parameters. This proposition will be the key to studying robustness in practice, as we will see later on.
Proposition 2.12. Let π be a specific MTP2 prior and let Γ_{w_1,w_2,π} be a weighted band associated with π based on w_1 and w_2. Given the observed data x, if l(θ | x) is MTP2 in θ, then for all π′ ∈ Γ_{w_1,w_2,π} we obtain that π_{w_1,x} ≤_lr π′_x ≤_lr π_{w_2,x}.
Proof. We will first prove that π_{w_1,x} ≤_lr π′_x. From Definition 2.2 and expression (1.2), a straightforward computation shows that, for all θ_1, θ_2 ∈ Θ,

π_{w_1,x}(θ_1) π′_x(θ_2) = l(θ_1 | x) π_{w_1}(θ_1) l(θ_2 | x) π′(θ_2) / (m_{π_{w_1}}(x) m_{π′}(x))
≤ l(θ_1 ∧ θ_2 | x) l(θ_1 ∨ θ_2 | x) π_{w_1}(θ_1 ∧ θ_2) π′(θ_1 ∨ θ_2) / (m_{π_{w_1}}(x) m_{π′}(x))
= π_{w_1,x}(θ_1 ∧ θ_2) π′_x(θ_1 ∨ θ_2),

where the inequality follows from the fact that l(θ | x) is MTP2 in θ and π_{w_1} ≤_lr π′. Using a similar argument it is shown that π′_x ≤_lr π_{w_2,x}.
Proposition 2.12 says that the posterior distributions of the lower and upper bounds of the weighted band are also lower and upper bounds, in the ≤_lr sense, for the family of all posterior distributions. It is apparent that the posteriors π_{w_1,x} and π_{w_2,x} can also be interpreted as two weighted distributions of the specific posterior π_x based on w_1 and w_2, respectively.

Remark 2.14. If we consider another model with an MTP2 likelihood function, denoted by l*(θ | x), such that the ratio l(θ | x)/l*(θ | x) is increasing in θ on the union of their supports, it is apparent, using Lemmas 2.4 and 2.5 jointly, that the posterior distribution associated with the specific prior π and the new model l*(θ | x), denoted by π*_x, satisfies π_{w_1,x} ≤_lr π*_x ≤_lr π_{w_2,x}. Therefore, in some way the weighted band also provides sensitivity to the model specification. This fact could be the starting point for a future research line.
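The interpretation above, namely that the posterior of a weighted prior coincides with the correspondingly weighted posterior, can be checked on a grid. The sketch below uses a hypothetical bivariate setting of our own (independent Exp(1) priors, an exponential sampling model, and illustrative weights), not the paper's example:

```python
import numpy as np

t = np.linspace(1e-3, 6.0, 500)
T1, T2 = np.meshgrid(t, t, indexing="ij")

def integrate2d(vals, x):
    # trapezoidal rule in both coordinates
    w = np.diff(x)
    inner = np.sum((vals[:, 1:] + vals[:, :-1]) * w / 2.0, axis=1)
    return float(np.sum((inner[1:] + inner[:-1]) * w / 2.0))

def normalize(d):
    return d / integrate2d(d, t)

prior = np.exp(-T1) * np.exp(-T2)   # independent Exp(1) prior, MTP2 by Lemma 2.4
w1 = np.exp(-(T1 + T2))             # hypothetical decreasing weight
w2 = T1 * T2                        # hypothetical increasing weight

# hypothetical data from an exponential sampling model with rates theta_1, theta_2;
# the likelihood factorizes, so it is MTP2 by Lemma 2.4
x1, x2 = 0.7, 1.4
lik = T1 * np.exp(-T1 * x1) * T2 * np.exp(-T2 * x2)

post = normalize(lik * prior)                        # pi_x
post_of_w1 = normalize(lik * normalize(w1 * prior))  # posterior of weighted prior
w1_of_post = normalize(w1 * post)                    # weighting of the posterior
print(np.max(np.abs(post_of_w1 - w1_of_post)) < 1e-8)  # same density: True

ratio = normalize(lik * normalize(w2 * prior)) / post  # proportional to w2
print(bool(np.all(np.diff(ratio, axis=0) >= -1e-9) and
           np.all(np.diff(ratio, axis=1) >= -1e-9)))   # increasing ratio: True
```

The increasing ratio in the last check is exactly the condition of Lemma 2.5, so it confirms π_x ≤_lr π_{w_2,x} on the grid.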
Remark 2.15. For the one-parameter model, i.e., when θ ∈ Θ ⊆ R, we only need a decreasing weight function w 1 and an increasing weight function w 2 to apply Proposition 2.12. As explained before, this procedure provides a band that is different in nature from the distorted band defined in Arias-Nicolás et al. (2016).
At this point one may wonder whether the assumptions in Proposition 2.12 are too strict. To answer this question, we first emphasize that priors with independent components are MTP2, and most of the applications in the literature are concerned with such priors. Second, as argued in Natvig (1995, 1997), there are situations where some particular MTP2 priors can be more realistic than priors assuming independent components. Finally, with respect to the likelihood function, several well-known models satisfy the MTP2 property, as we can see in the following two examples and in the final application.
Example 2.16. Let X_1, . . . , X_n be an i.i.d. random sample from the Pareto income distribution with PDF p_θ(x) = α x_0^α / x^{α+1}, x ≥ x_0, where θ = (x_0, α) represents the unknown parameter and x_0 and α are the mode and shape parameters, respectively. It is well known that the likelihood function is given by

l(θ | x) = α^n x_0^{nα} (∏_{i=1}^n x_i)^{−(α+1)} I_{[x_0,+∞)}(x_(1)),

where I_{[x_0,+∞)} and x_(1) are the indicator function of [x_0, +∞) and the sample minimum, respectively. From Lemma 2.4, in order to prove that l(θ | x) is TP2 in θ = (x_0, α), it is enough to show that g(α, x_0) = α^n x_0^{nα} is TP2, since the remaining factors depend on at most one of the two arguments. This last follows easily by computing ∂² ln(g(α, x_0))/∂α∂x_0 = n/x_0 > 0 and using Lemma 2.7.
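The TP2 property in Example 2.16 can be confirmed numerically by testing log-supermodularity of the Pareto log-likelihood on parameter rectangles. In this sketch the simulated data set and the parameter pairs are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
# classical Pareto sample with mode x_0 = 0.5 and shape alpha = 3:
# numpy's pareto draws Lomax variates, so shift and scale
sample = (rng.pareto(3.0, n) + 1.0) * 0.5
x_min = sample.min()

def loglik(x0, alpha):
    """Log-likelihood of the Pareto model; -inf outside the support x0 <= x_(1)."""
    if x0 > x_min:
        return -np.inf
    return n*np.log(alpha) + n*alpha*np.log(x0) - (alpha + 1)*np.log(sample).sum()

# TP2 via log-supermodularity on rectangles: for x0 <= x0p and a <= ap,
# loglik(x0p, ap) + loglik(x0, a) >= loglik(x0p, a) + loglik(x0, ap)
ok = True
for x0, x0p in [(0.1, 0.3), (0.2, 0.45)]:
    for a, ap in [(0.5, 2.0), (1.0, 4.0)]:
        ok &= loglik(x0p, ap) + loglik(x0, a) >= loglik(x0p, a) + loglik(x0, ap) - 1e-9
print(bool(ok))  # mixed partial n/x0 > 0, so True
```

Working out the mixed difference by hand gives n(α′ − α)(ln x_0′ − ln x_0) ≥ 0, which is the discrete analogue of the mixed partial derivative computed in the example.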
To conclude, Corollary 2.18 will allow us to quantify and interpret the uncertainty induced by the partial knowledge of the prior. Let us suppose we have a functional H that maps the distribution of the underlying random variable X to a real number of interest. This functional obviously inherits the dependence on the parameter, and we denote that number by H_X(θ). Under the previous assumption we obtain the following result.

Corollary 2.18. Let X be the underlying random variable and let H be a functional of interest such that H_X(θ) is non-decreasing in θ. Given π, a specific MTP2 prior, and the corresponding weighted band Γ_{w_1,w_2,π} based on w_1 and w_2, if l(θ | x) is MTP2 in θ, then the univariate random variables obtained by mapping the posterior distributions through the functional H_X satisfy

H_X(π_{w_1,x}) ≤_st H_X(π_x) ≤_st H_X(π_{w_2,x}),   (2.10)

where H_X(π′_x) denotes the distribution of H_X(θ) when θ follows π′_x.

Proof. From Proposition 2.12 and using (2.4) we obtain that

π_{w_1,x} ≤_st π_x ≤_st π_{w_2,x}.

The rest of the proof follows easily from Theorem 6.B.16 in Shaked and Shanthikumar (2007).
Under the assumptions of Corollary 2.18 we can obtain bounds for characteristics of interest in the Bayesian framework. For example, using the characterization of the stochastic order given in (2.2) and taking φ as the identity function, we obtain that the posterior expectations of H_X satisfy

E_{π_{w_1,x}}[H_X(θ)] ≤ E_{π_x}[H_X(θ)] ≤ E_{π_{w_2,x}}[H_X(θ)].   (2.11)

Also, using expression (1.A.12) in Shaked and Shanthikumar (2007), we obtain that the quantiles of H_X are also ordered:

F^{−1}_{H_X(π_{w_1,x})}(p) ≤ F^{−1}_{H_X(π_x)}(p) ≤ F^{−1}_{H_X(π_{w_2,x})}(p), for all p ∈ (0, 1).   (2.12)

It is remarkable that the range of such quantities of interest can be computed by just looking at the extremal distributions generating the weighted class.
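As a one-parameter conjugate illustration of the bounds in (2.11), in the spirit of Remark 2.15, consider a hypothetical gamma prior with Poisson data and power weights of the form θ^{a−1}; the extremal posterior means then have closed forms. All numbers below are illustrative assumptions, not taken from the paper:

```python
# one-parameter illustration: gamma prior, Poisson likelihood (assumed setting)
alpha0, beta0 = 2.0, 1.0   # specific prior Gamma(2, 1)
s, n = 13, 5               # hypothetical data: total count and sample size

# weights w1(t) = t**(-0.5) (decreasing) and w2(t) = t**0.5 (increasing)
# keep conjugacy: pi_{w_i} is Gamma(alpha0 -/+ 1/2, beta0), and the
# posteriors are Gamma(. + s, beta0 + n)
mean = lambda a, b: a / b
low  = mean(alpha0 - 0.5 + s, beta0 + n)   # E under pi_{w1,x}
mid  = mean(alpha0       + s, beta0 + n)   # E under pi_x
high = mean(alpha0 + 0.5 + s, beta0 + n)   # E under pi_{w2,x}
print(low < mid < high)                    # bounds (2.11) for H_X(theta) = theta: True
print(round(low, 3), round(mid, 3), round(high, 3))
```

The interval [low, high] is exactly the range of posterior means obtainable over the weighted band for this choice of weights.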

Using metrics to measure uncertainty
In robust Bayesian analysis, probability metrics are used to incorporate uncertainty in the elicitation process by considering an error in the specification, see e.g. Basu and DasGupta (1995). An excellent review of the most common probability metrics and divergences and the relationships among them is given in Gibbs and Su (2002). For one-parameter distributions, Arias-Nicolás et al. (2016) use the Kolmogorov and Kantorovich metrics due to their mathematical tractability when comparing a specific prior and its distorted version. However, in our multiparameter context, we find in the Hellinger distance and the Kullback-Leibler divergence excellent tools to evaluate the degree of uncertainty between multivariate distributions and their weighted versions.
We first recall the definition of the Hellinger metric. Given two random vectors X and Y over Ω ⊆ R^n with PDFs f_X and f_Y, respectively, the Hellinger metric is defined by

H(X, Y) = (1/2) ∫_Ω (√(f_X(t)) − √(f_Y(t)))² dt = 1 − ∫_Ω √(f_X(t) f_Y(t)) dt,

which is used to quantify the similarity between two probability distributions. On the other hand, the Kullback-Leibler divergence (KL), also called relative entropy, is defined by

KL(X, Y) = ∫_Ω f_X(t) ln (f_X(t)/f_Y(t)) dt,

which measures how different the PDF of X is from that of Y.
Let π be a specific prior and let w, w_1 and w_2 be different weight functions. From the fact that the posteriors of the weighted versions can also be interpreted as weighted distributions of the specific posterior with the same weight function, we easily obtain the distances given in Table 1; in particular, for the priors,

H(π, π_w) = 1 − E_π[√(w(θ))] / √(E_π[w(θ)])   (3.1)

and

KL(π, π_w) = ln E_π[w(θ)] − E_π[ln w(θ)].   (3.2)

Expressions (3.1) and (3.2) have been widely used in the literature in many contexts. For example, in Bayesian experimental inference it is a common goal to maximize the expected KL divergence between the prior and the posterior; in other words, the KL divergence measures the information gained when one revises one's beliefs from the prior distribution to the posterior one. The log-score suffers from the shortcoming of focusing on the tails of the distribution by heavily penalizing observations that would have been predicted with low probability. The Hellinger metric, on the other hand, is a surrogate for the total variation distance and is more robust than the KL divergence in practice. Further information about the use of these measures in Bayesian inference can be found in Hooker and Vidyashankar (2014) and, more recently, in Jewson et al. (2018).
Example 3.1. Let us consider the prior and the weight functions defined in Example 2.11. We will compute the Hellinger metric and the KL divergence to measure the uncertainty induced by the weight function w(θ) = θ_1^{a_1−1} θ_2^{a_2−1}. From Table 1 we obtain that

H(π, π_w) = 1 − Γ((a_1 + 1)/2) Γ((a_2 + 1)/2) / √(Γ(a_1) Γ(a_2))

and

KL(π, π_w) = ln Γ(a_1) + ln Γ(a_2) + (a_1 − 1)γ + (a_2 − 1)γ,

where the equalities follow jointly from the expectation of the product of two independent random variables, from the expression of the r-th moment of an exponential distribution X ∼ Exp(λ), given by Γ(r + 1)/λ^r, and, finally, from the well-known fact that the transformation −ln(X) has a Gumbel distribution with expectation γ + ln(λ), where γ is the Euler-Mascheroni constant. It is worth noting that both the Hellinger metric and the KL divergence do not depend on the hyperparameters λ_1 and λ_2. Finally, by considering w_1 and w_2 with (a_1, a_2) = (1/4, 1/4) and (a_1, a_2) = (7/4, 7/4), respectively, we obtain that H(π, π_{w_1}) = 0.432, H(π, π_{w_2}) = 0.14, KL(π, π_{w_1}) = 1.71 and KL(π, π_{w_2}) = 0.697.
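The numerical values of Example 3.1 can be reproduced directly from gamma-function expressions. The sketch below encodes H(π, π_w) = 1 − E_π[√w]/√(E_π[w]) and KL(π, π_w) = ln E_π[w] − E_π[ln w], evaluated for the weight w(θ) = θ_1^{a_1−1} θ_2^{a_2−1} of Example 2.11:

```python
from math import gamma, log, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def hellinger(a1, a2):
    # 1 - E_pi[sqrt(w)] / sqrt(E_pi[w]); with exponential moments
    # E[theta^r] = Gamma(r + 1)/lambda^r, the hyperparameters cancel
    bc = (gamma((a1 + 1)/2) / sqrt(gamma(a1))) * (gamma((a2 + 1)/2) / sqrt(gamma(a2)))
    return 1 - bc

def kl(a1, a2):
    # ln E_pi[w] - E_pi[ln w], using E[ln theta] = -gamma - ln(lambda) (Gumbel)
    return (log(gamma(a1)) + (a1 - 1)*EULER_GAMMA +
            log(gamma(a2)) + (a2 - 1)*EULER_GAMMA)

print(round(hellinger(0.25, 0.25), 3), round(kl(0.25, 0.25), 2))   # 0.432 1.71
print(round(hellinger(1.75, 1.75), 2), round(kl(1.75, 1.75), 3))   # 0.14 0.697
```

Both quantities are free of λ_1 and λ_2, in agreement with the remark in the example.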
Of course, it is not easy in general to find closed expressions for the Hellinger metric and the KL divergence. However, as Table 1 shows, they can be estimated by simulation methods, as we will see later on. It is remarkable that they depend only on the specific prior, its posterior and the corresponding weight functions.
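As a rough sketch of such a simulation, assume that the weighted prior has density π_w(θ) = w(θ)π(θ)/E_π[w(θ)]; then the Hellinger affinity reduces to E_π[√w]/√E_π[w] and KL(π, π_w) = ln E_π[w] − E_π[ln w], both estimable from draws of π alone. The exponential prior and the identity weight used below are illustrative assumptions, not the choices of Example 2.11.

```python
import numpy as np

rng = np.random.default_rng(0)

def hellinger_kl_weighted(prior_sampler, w, n=400_000):
    """Monte Carlo estimates of H(pi, pi_w) and KL(pi, pi_w) for the
    weighted prior pi_w = w * pi / E_pi[w], using draws from pi only:
    affinity = E_pi[sqrt(w)] / sqrt(E_pi[w]),  H = sqrt(1 - affinity),
    KL(pi, pi_w) = ln E_pi[w] - E_pi[ln w]."""
    theta = prior_sampler(n)
    wv = w(theta)
    affinity = np.mean(np.sqrt(wv)) / np.sqrt(np.mean(wv))
    return np.sqrt(1.0 - affinity), np.log(np.mean(wv)) - np.mean(np.log(wv))

# Illustrative check: pi = Exp(1) and w(theta) = theta, for which the
# affinity is Gamma(3/2) and KL equals the Euler-Mascheroni constant.
H, KL = hellinger_kl_weighted(lambda n: rng.exponential(1.0, size=n), lambda t: t)
```

For power-type weights w(θ) = θ^a under an exponential prior, both quantities are free of the rate parameter, consistent with the hyperparameter-independence noted in Example 3.1.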
Finally, it is natural to wonder how to choose the weight functions and how to elicit their parameters. Obviously, both choices depend on the problem at hand and the level of uncertainty about the priors. First, we should understand how the weight function modifies the underlying prior density. In general, increasing (decreasing) weight functions give more weight to higher (smaller) values of the variable, but that can be done in many ways; note that any increasing or decreasing function is a potential weight function for our purpose. We would like to emphasize that there are particular weight functions associated with classical multivariate distributions in the literature; see, for example, Kim (2008) and references therein. To summarize, weight functions can induce asymmetry, heavy-tailed distributions, truncated distributions, etc. In order to choose the parameters, we should then fix a reasonable distance in terms of the Hellinger metric and/or the KL divergence.
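As an illustration of this last step, the following sketch fixes a target Hellinger distance and solves for the weight parameter by bisection. It assumes a hypothetical Exp(1) prior and power weight w(θ) = θ^a, for which the affinity E_π[√w]/√E_π[w] = Γ(a/2 + 1)/√Γ(a + 1) is available in closed form; these are illustrative choices, not the paper's.

```python
from math import gamma, sqrt

def hellinger_power_weight(a):
    # H(pi, pi_w) for pi = Exp(1) and w(theta) = theta^a (illustrative):
    # affinity = E[sqrt(w)] / sqrt(E[w]) = Gamma(a/2 + 1) / sqrt(Gamma(a + 1)),
    # which does not depend on the exponential rate parameter.
    affinity = gamma(a / 2 + 1) / sqrt(gamma(a + 1))
    return sqrt(1 - affinity)

def calibrate(target, lo=1e-6, hi=10.0, tol=1e-10):
    # H is increasing in a on this range, so bisection solves H(a) = target.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hellinger_power_weight(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

The same calibration idea applies with a Monte Carlo estimate of the distance when no closed form is available.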

A real example about train door reliability
In this section we illustrate our method by analyzing real reliability data. Stemming from a consulting project, we consider failure data from 40 underground trains associated with the door opening system. All trains were delivered to a European transportation company between November 1989 and March 1991, and all of them were put into service between 20th March 1990 and 20th July 1992. Failure monitoring ended on 31st December 1998. When a failure took place, both the odometer reading and the failure date were recorded, along with the code of the failed component. Those data were first analyzed in Pievatolo et al. (2003) without making any distinction among seven different failure modes. They were analyzed a second time in Pievatolo and Ruggeri (2010), where the authors determined that two failure modes (mechanical and electrical) were the most relevant ones. Those components were failing for many different causes, implying that the two point processes describing the failures were actually obtained by superposing the processes due to each cause. The regularity of the pattern and a theorem due to Grigelionis (see, e.g., Thompson (1988), p. 69) about the superposition of many processes into one justified the use of a nonhomogeneous Poisson process. For our purpose, we consider one of those two failure modes, namely the mode associated with electrical opening commands, for which 530 failures were recorded from 19th September 1991 to 31st December 1998. For further information about the data see Pievatolo and Ruggeri (2010). Related work on Bayesian inference and reliability can be found in Ríos Insua et al. (2019).
Our interest is focused on finding a model that describes the failure history of the electrical opening commands and on predicting the number of failures in future time intervals in a Bayesian framework.

The model
Based on the interpretation of the model as a complex system, it is argued in Pievatolo and Ruggeri (2010) that the total number of failures occurring in the electrical opening system in the interval (0, t], denoted by N(t), follows a nonhomogeneous Poisson process (NHPP) with a common intensity function, λ(t), and an increasing and invertible mean value function such that m(∞) = ∞. From the plot of the cumulative number of failures versus failure time, see Figure 11.2 in Pievatolo and Ruggeri (2010), the authors postulate three different forms of the intensity function λ(t), all of them assuming an improvement in the rate of occurrence of failures. For our purpose we choose one of those models, namely the popular power law process (PLP), with intensity function λ(t | θ) = M β t^(β−1), θ = (M, β), (4.1) and corresponding mean value function m(t | θ) = M t^β. (4.2) Our choice is based on the attractive physical interpretation of both parameters M and β and on the fact that the PLP is the most widely used NHPP in reliability due to its simplicity. In our model we assume 0 < β < 1, indicating reliability growth and excluding the cases of constant (β = 1) and decaying (β > 1) reliability. More details on the statistical analysis of repairable systems and NHPPs can be found in Thompson (1988), Kingman (1993), Rigdon and Basu (2000), Aven and Jensen (2000) and Ríos Insua et al. (2012), and a comprehensive catalogue of intensity functions is given in McCollin (2014).
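In code, the PLP quantities used throughout this section can be sketched as follows, assuming the parameterization λ(t | θ) = Mβt^(β−1) with m(t | θ) = Mt^β, so that m(1 | θ) = M; the function names are ours.

```python
def plp_intensity(t, M, beta):
    # PLP intensity, assuming lambda(t) = M * beta * t**(beta - 1).
    return M * beta * t ** (beta - 1)

def plp_mean_value(t, M, beta):
    # Mean value function m(t) = E[N(t)] = M * t**beta, so m(1) = M.
    return M * t ** beta

def expected_failures(T, u, M, beta):
    # Expected number of failures in (T, T + u] = m(T + u) - m(T).
    return plp_mean_value(T + u, M, beta) - plp_mean_value(T, M, beta)
```

With 0 < beta < 1 the intensity decreases over time, which is the reliability-growth regime assumed in the text.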

The likelihood function
Let N(t) be a PLP with intensity function λ(t | θ) given in (4.1). Suppose that the vector of observed failure times t* = (T_1, . . . , T_n), recorded in the interval (0, T] for a known value T, satisfies T_1 < . . . < T_n. Then, from Theorem 5.4 in Ríos Insua et al. (2012), the likelihood function is given by l(θ | t*) = M^n β^n exp(−M T^β + (β − 1) ∑_{i=1}^n ln(T_i)).
From Lemma 2.7, computing the mixed partial derivative of the logarithm of l(θ | t*), namely ∂² ln l(θ | t*)/∂M∂β = −T^β ln(T), it is apparent that l(θ | t*) is MTP2 in θ if, and only if, T ≤ 1. Then, by changing the time scale from the original records T_i to T_i/T, i.e., taking T as the unit, and denoting the new vector of observed failure times by t = (T_1/T, . . . , T_n/T), we obtain the new likelihood function l(θ | t) = M^n β^n exp(−M + (β − 1) ∑_{i=1}^n ln(T_i/T)). (4.3) It is clear from Lemma 2.4 that the likelihood function l(θ | t) given in (4.3) is MTP2 in θ.
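A numerical counterpart of this check: assuming the PLP log-likelihood ln l(θ | t*) = n ln M + n ln β − M T^β + (β − 1) ∑ ln(T_i), the sketch below estimates the mixed partial ∂² ln l/∂M∂β (analytically −T^β ln T) by finite differences; its sign decides the MTP2 property. This is an illustrative verification, not part of the original analysis.

```python
import numpy as np

def plp_loglik(M, beta, times, T):
    # log l(theta | t*) = n ln M + n ln beta - M * T**beta + (beta - 1) * sum(ln T_i),
    # the PLP likelihood for failure times observed on (0, T].
    t = np.asarray(times, dtype=float)
    n = t.size
    return n * np.log(M) + n * np.log(beta) - M * T ** beta + (beta - 1) * np.log(t).sum()

def mtp2_mixed_partial(M, beta, times, T, h=1e-5):
    # Central finite-difference estimate of d^2 log l / dM dbeta;
    # MTP2 in (M, beta) requires this to be nonnegative.
    f = lambda m, b: plp_loglik(m, b, times, T)
    return (f(M + h, beta + h) - f(M + h, beta - h)
            - f(M - h, beta + h) + f(M - h, beta - h)) / (4 * h * h)
```

After rescaling to T = 1 the mixed partial vanishes, so the MTP2 condition holds with equality.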
Remark 4.1. In order to validate our model, we first take T to be 31st December 1997. We predict the number of failures in 1998 and compare it with the actual value.

The specific prior and posterior
It is assumed that the specific prior belief π over Θ = R+ × R+ is a bivariate random vector having independent exponential marginal distributions Exp(λ) and Exp(μ), associated with the parameters M and β, respectively, with joint PDF given by π(M, β) = λ μ e^(−λM − μβ), M, β > 0, (4.4) where the hyperparameters λ and μ are assumed to be known; namely, they are obtained from the initial estimates M_0 = 495.5 and β_0 = 0.79. Those estimates are based on expert judgment combining a regression analysis of the empirical cumulative number of failures versus the failure times with the maximum likelihood estimates (MLEs) obtained using only some old trains not included in the final study. The values M_0 and β_0 are very reasonable: the latter denotes a clear, although not excessive, reliability growth, whereas the former expresses the expected number of failures in the unit interval, since m(1 | θ) = M. From the initial values M_0 and β_0, we take λ = 1/M_0 and μ = 1/β_0, due to the fact that E_π[θ] = (1/λ, 1/μ). It is apparent from Lemma 2.4 that π(M, β) is MTP2 in θ = (M, β).
On the other hand, considering the likelihood function given by (4.3) and using expression (1.2) to update the state of knowledge, we easily obtain that the posterior distribution is a bivariate random vector having independent gamma marginal distributions G(n + 1, 1 + λ) and G(n + 1, μ − ∑_{i=1}^n ln(T_i/T)), associated with the parameters M and β, respectively, with joint PDF given by π_t(M, β) = ((1 + λ)^(n+1)/Γ(n + 1)) M^n e^(−(1+λ)M) × ((μ − ∑_{i=1}^n ln(T_i/T))^(n+1)/Γ(n + 1)) β^n e^(−(μ − ∑_{i=1}^n ln(T_i/T))β), M, β > 0.
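This conjugate update admits a compact computational sketch, assuming rescaled times T_i/T and independent Exp(λ), Exp(μ) priors on M and β; the function name and parameterization (shape, rate) are ours.

```python
import numpy as np

def plp_posterior_params(times, T, lam, mu):
    # Conjugate update for the PLP under independent Exp(lam), Exp(mu)
    # priors on (M, beta), after rescaling time so that T becomes the unit:
    #   M    | t ~ Gamma(n + 1, 1 + lam)
    #   beta | t ~ Gamma(n + 1, mu - sum(ln(T_i / T)))
    # Each pair below is (shape, rate); posterior means are shape / rate.
    t = np.asarray(times, dtype=float) / T
    n = t.size
    return (n + 1, 1 + lam), (n + 1, mu - np.log(t).sum())
```

Since T_i/T ≤ 1, the log terms are nonpositive, so the rate of the gamma posterior for β is always positive.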

The weighted band
Now, we introduce a perturbation scheme on the prior distribution by considering the weighted band Γ_w1,w2,π, where w1 and w2 are defined by expression (2.8). The exact distributions associated with both weighted priors, with PDFs π_w1(θ) and π_w2(θ), are unknown. However, after a straightforward computation in (4.4), we can easily obtain the conditional distributions, as shown in Table 2. We notice that the conditional distributions of the weighted prior π_w2 are given by a mixture of gamma distributions.

The posterior weighted distributions
The joint posterior distributions do not have closed-form expressions either. However, taking into account the conditional distributions given in Table 2, we can also compute their conditional distributions, as shown in Table 3. We notice again that the conditional distributions of the weighted posterior π_w2,t are given by a mixture of gamma distributions. The conditional distributions shown in Tables 2 and 3 can be used to estimate the characteristics of the weighted priors and posteriors by Markov chain Monte Carlo (MCMC) methods, specifically Gibbs sampling. In general, w1 has more effect on π than w2, as a consequence of the positive skewness of the specific prior. Finally, it is apparent from Tables 2 and 3 that w1 and w2 induce a dependence structure in both the weighted priors and the weighted posteriors.
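A generic Gibbs sweep fitting this scheme is sketched below. The full conditionals of Table 3 are not reproduced here, so the two conditional samplers passed in are hypothetical stand-ins (simple gammas whose rate depends on the other coordinate), included only so the sketch runs end to end; they are not the paper's conditionals.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs(sample_M_given_beta, sample_beta_given_M, n_iter=11_000, burn_in=1_000):
    # Generic Gibbs sweep over theta = (M, beta): alternately draw each
    # coordinate from its full conditional given the current other one,
    # discarding the first burn_in sweeps.
    M, beta = 1.0, 0.5
    draws = []
    for i in range(n_iter):
        M = sample_M_given_beta(beta)
        beta = sample_beta_given_M(M)
        if i >= burn_in:
            draws.append((M, beta))
    return np.array(draws)

# Placeholder full conditionals (hypothetical; stand-ins for Table 3).
chain = gibbs(lambda b: rng.gamma(4.0, 1.0 / (1.0 + b)),
              lambda m: rng.gamma(4.0, 1.0 / (1.0 + 0.01 * m)))
```

Plugging in the mixture-of-gammas conditionals of Table 3 would yield draws from the weighted posteriors.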

Choosing the degree of uncertainty
At this point, a natural question is how to choose the parameters a, b and c of w1 and a′, b′ and c′ of w2. One possibility is to require that the resulting weighted priors be symmetrically close to the specified prior in terms of the Hellinger metric or the KL divergence. Due to the complexity of our real problem, we decided to consider a great amount of uncertainty in order to evaluate its effect on the final decision. Having set a goal of a symmetric Hellinger distance approximately equal to 0.7, we found, after some trials, that it can be achieved by taking a = 0.8, b = 0.4 and c = 0.17 and a′ = 3.8, b′ = 3.4 and c′ = 1.17, as we can see in Table 4. We first notice that the Hellinger metric is a symmetric measure. Its maximum value, 1, is achieved when one density assigns probability zero to every set to which the other density assigns positive probability, and vice versa. With respect to the KL divergence, we recall that it is an asymmetric measure; we can then only say that the amount of information lost when π_w1 is used to approximate π is larger than the corresponding information lost when we use π_w2. All measures in Table 4 have been estimated from a sample of the exact distribution of the specific prior, using jointly the Markov chain Monte Carlo (MCMC) method and the expressions in Table 1.

Table 4:
                   (π, π_w1)   (π, π_w2)   (π_w1, π_w2)
Hellinger metric    0.69916     0.69590      0.99982
KL divergence       66.724      7.552        27.101

From the exact distribution of the specific posterior and using again the MCMC method and Table 1, we obtain in Table 5 the Hellinger metric and the KL divergence for the posterior weighted distributions.
Table 5:
                   (π_t, π_w1,t)   (π_t, π_w2,t)   (π_w1,t, π_w2,t)
Hellinger metric      0.72191         0.00713          0.77096
KL divergence         5.1661          0.02869          4.43343

Finally, in order to see graphically the effect of the weight functions, we represent in Figure 2 (a) and (b) the histograms and the CDFs of the prior distributions π_w1 (in blue), π (in yellow) and π_w2 (in green), respectively. We also represent in Figure 2 (c) and (d) the corresponding ones for the posterior distributions. As expected, the uncertainty decreases when we incorporate experimental data. At first glance, the contribution to uncertainty of the weight function w1 is greater than that induced by w2. This result is coherent with an improvement in the failure rate greater than the one expected under the PLP model, as we will see later on.

Forecasts for the expected number of failures
As previously mentioned, we have first considered 31st December 1997 as the (ending) calendar day T. We are interested in predicting the expected number of failures in future time intervals of the form [T, T + u]. This quantity, conditional on θ, is easily computed from the mean value function of the PLP model given in expression (4.2) and takes the form H_X(θ) = m(1 + h | θ) − m(1 | θ) = M((1 + h)^β − 1), where T is represented by the unit in the above expression and h is computed proportionally in the time scale, i.e., h = u/T. A straightforward computation shows that H_X(θ) is increasing in θ. Then, recalling that both π and l(θ | t) are MTP2 in θ and using Corollary 2.18, we obtain stochastic orderings among the specific and weighted posterior distributions. From these orderings we obtain bounds for some characteristics of interest in the Bayesian framework. First, from expression (2.11), the predictive expectations of the number of failures under the weighted posterior distributions bound the corresponding expectation under the specific posterior. Second, from (2.12), the endpoints of the 95% Bayesian credible quantile-based intervals are ordered accordingly. Since the posterior weighted distributions are unknown, the bounds in these inequalities have been estimated from the conditional distributions given in Table 3 using Gibbs sampling. Table 6 shows the effect of the weight functions on the prediction of the expected number of failures in 1998, i.e., when T represents 31st December 1997 and u = 1. We ran the corresponding MCMC algorithm for 10000 iterations with a burn-in of 1000; the iteration numbers were chosen after experimentation to deliver stable results over multiple runs. We first note that weighting the prior information causes a change in the final Bayesian prediction. Therefore, the model cannot be considered robust from a Bayesian point of view. A similar conclusion is pointed out in Pievatolo and Ruggeri (2010).
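For the specific posterior, whose marginals are independent gammas, the posterior mean and a 95% quantile-based credible interval of H_X(θ) = M((1 + h)^β − 1) can be approximated by plain Monte Carlo, as in this sketch; the parameter values in any call are illustrative, not the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

def forecast_interval(shape_M, rate_M, shape_b, rate_b, h, n=100_000):
    # Draw (M, beta) from independent gamma posteriors (shape, rate) and
    # push them through H(theta) = M * ((1 + h)**beta - 1), the expected
    # number of failures in (T, T + u] with h = u / T in rescaled time.
    M = rng.gamma(shape_M, 1.0 / rate_M, size=n)
    beta = rng.gamma(shape_b, 1.0 / rate_b, size=n)
    Hs = M * ((1.0 + h) ** beta - 1.0)
    return Hs.mean(), np.quantile(Hs, [0.025, 0.975])
```

The same push-forward, applied to Gibbs draws from the weighted posteriors, yields the bounding means and intervals reported in Table 6.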
It is also interesting to observe that the weighted prior π_w1 provides a more accurate result than the PLP model, and the latter is close to the result predicted by the weighted prior π_w2. A reason for this behavior is that the PLP model overestimates the number of failures in 1998 and does not reflect the "true" failure process closely enough. It is also remarkable that the credibility interval for the decreasing weighted posterior contains the true value, which somehow solves the problem detected in Pievatolo and Ruggeri (2010) about the sharp change in slope of the curve of cumulative number of failures in the last years. Finally, we know from Proposition 2.10 that all priors of the form π_ε = (1 − ε)π_w1 + ε π_w2 (obtained as a mixture of π_w1 and π_w2) belong to the class Γ_w1,w2,π, for all 0 ≤ ε ≤ 1. Since Γ_w1,w2,π is a convex class of distributions and π is continuous (see Lemma 3.1 in Ruggeri and Sivaganesan (2005)), it follows that any value in any interval of interest obtained for the weighted priors can be expressed as a value from a particular prior π_ε for some ε.
To conclude, using a procedure similar to the one above, we present in Table 7 forecasts over horizons of one, two and three years beyond different ending periods T, each set to 31st December of the indicated year and denoted by year-i, i = 1, 2, 3.
Table 7 reports, for each ending period T, the true value together with the 95% credibility interval and the posterior mean under each of π_w1,t, π_t and π_w2,t; for instance, for 1992-1 the true value is 83. Just observing all predictions, it is clear that uncertainty decreases when the sample size increases. Moreover, in only two cases does the true value fall outside every credibility interval, namely when the ending periods are 31st December 1995 and 31st December 1996 and we estimate the expected number of failures in 1998, i.e., over forecasting periods of three and two years, respectively. As previously analyzed in Table 6, the forecasts for 1998 slightly improve when the recording period ends on 31st December 1997. Finally, Figure 3 summarizes graphically the information about posterior means contained in Table 7. In general, it is interesting to note that the weight function w1 provides more accurate results than the PLP model when the latter sufficiently overestimates the expected number of failures. On the contrary, when the PLP sufficiently underestimates the expected number of failures, the weight function w2 leads to better results.

Concluding remarks
In this paper we have provided a methodology to induce uncertainty in the choice of a specific prior by using a particular class of priors called the weighted band. We have followed the typical approach of Bayesian robustness, as outlined in Ríos Insua and Ruggeri (2000), which provides a thorough review of the motivations and methods leading to the consideration of classes of priors. In that work classes of likelihoods and sampling models are considered as well, but most of the researchers' interest (and ours) has concentrated on priors. Stemming from the practical impossibility of specifying an exact, unique prior distribution, Bayesian robustness proposes to weaken this requirement by considering a distribution deemed a good approximation to the prior beliefs and then building a neighborhood around it to express the uncertainty about such a prior. Our work goes in that direction, since it embeds a prior in a class where weight functions, with adequate properties, are used to give more or less weight to parts of the parameter space. This theoretical construction also corresponds to practical situations, like the one we have considered in our work about train door failures. Of course, many classes of priors can be considered (see Ríos Insua and Ruggeri (2000)) and the choice should be based on the problem at hand and the available expert knowledge (as an example, consider the class based on generalized moments presented in Betrò et al. (1994)). Whereas the choice of a class of priors could be motivated by practical problems where the experts are unable to specify a unique prior distribution, more theoretical studies are interested in specifying probability distributions over prior distributions, as in Bayesian nonparametrics. The idea of combining Bayesian nonparametrics and Bayesian robustness has led to some works, like Ruggeri (2010, 2014). Our methodology combines practical motivations and theoretical contributions involving stochastic orders.
Although it is inspired by the distorted band introduced by Arias-Nicolás et al. (2016), it provides a different way of inducing uncertainty in the univariate model and can be easily generalized to multiparameter distributions. The proposed prior class is also relatively easy to specify, a property inherited from the selected weight functions used to define the band. These weight functions also help when interpreting the resulting class of posterior distributions.
We have also shown that the proposed methodology is not only of theoretical interest but can also be applied to real problems. We have deliberately chosen a complex problem in reliability that had already been investigated elsewhere with not completely satisfactory results. Here we have shown how weighted priors on the parameters of the very popular power law process can lead to better forecasts of the expected number of failures.
The models considered in Pievatolo and Ruggeri (2010) pose new challenges, also from a methodological viewpoint, which could be addressed in future works. In the current paper we have considered only failure dates, whereas the available data also include odometer readings upon failure. The nonhomogeneous Poisson process considered in Pievatolo and Ruggeri (2010) is one of the few examples of a bivariate intensity function in reliability. Those authors performed an informal sensitivity analysis with respect to the model, considering three different baseline univariate intensity functions for the failure times. Sensitivity with respect to changes in the model has rarely been considered in the Bayesian framework, apart from informal approaches. The reasons are manifold: computational complexity and the major focus on the most critical aspect of the Bayesian approach, i.e., the prior choice, are the most relevant ones.
The extension of our approach to a class of models, instead of priors, is very challenging, especially when considering a class of Poisson processes, possibly with bivariate intensity functions. The very notion of a class of Poisson processes should also be considered carefully.
The train door failure data suggest another methodological development: the extension of our approach to hierarchical models. Data come from 40 trains, and each train could have its own model with exchangeable parameters, leading to a hierarchical model. Such a model has a clear physical interpretation: differences in construction, operation and maintenance make trains similar rather than identical. In this case the weighted band would be placed on the common prior for the distinct parameters of each model.