Objective Bayesian Analysis for the Student-t Linear Regression

Abstract. In this paper, objective Bayesian analysis for the Student-t linear regression model with unknown degrees of freedom is studied. The reference priors under all possible group orderings of the parameters in the model are derived. The posterior propriety under each reference prior is validated by considering a larger class of priors. Simulation studies are carried out to investigate the frequentist properties of Bayesian estimators based on the reference priors. Finally, the Bayesian approach is applied to two real data sets.


Introduction
In traditional linear regression models, the error terms are commonly assumed to follow a normal distribution. However, when the data have heavier tails than the normal distribution, the Student-t distribution represents an attractive alternative for modeling this behavior. Moreover, the Student-t regression model can significantly reduce the influence of outliers, leading to a more robust analysis; see, for example, West (1984) and Lange et al. (1989). In particular, the degrees of freedom of the t distribution, say ν, determine the degree of robustness of the analysis: the smaller ν is, the more robust the analysis tends to be. Thus, the problem of estimating the parameter ν has attracted much attention in the literature. There are several frequentist approaches to this problem; see Zellner (1976), Singh (1988), Liu and Rubin (1995), and Wang and Ip (2003), among others.
In this paper, we are concerned with Bayesian inference, in which case the choice of the prior for the parameter ν becomes very challenging. When the number of degrees of freedom is considered discrete, Jacquier et al. (2004) proposed a truncated uniform prior on ν; when ν is treated as continuous, an exponential prior π(ν) = λ exp{−λν}, ν > 0, has also been used.
The hyperparameter λ was suggested to be chosen based on prior information about the problem at hand. However, the estimate of the number of degrees of freedom in this case depends strongly on the value of λ. In order to make the prior more objective, Fonseca et al. (2008) introduced two objective priors based on formal rules: the independence Jeffreys prior

π_IJ(ν) ∝ (ν/(ν + 3))^{1/2} {ψ'(ν/2) − ψ'((ν + 1)/2) − 2(ν + 3)/[ν(ν + 1)^2]}^{1/2}, ν > 0,

and the Jeffreys-rule prior

π_J(ν) ∝ π_IJ(ν) ((ν + 1)/(ν + 3))^{1/2},

where ψ(a) = (d/da) log Γ(a) and ψ'(a) = (d/da) ψ(a) are the digamma and trigamma functions, respectively. It is shown in Fonseca et al. (2008) that the posterior under the prior π_IJ(ν) is proper, whereas Vallejos and Steel (2013) corrected a claim in that paper by showing that the posterior under the prior π_J(ν) is actually improper.
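Since π_IJ(ν) involves only trigamma evaluations, it is straightforward to compute numerically. The following is a minimal sketch (the function name `pi_ij` is ours), assuming the standard form π_IJ(ν) ∝ (ν/(ν+3))^{1/2} {ψ'(ν/2) − ψ'((ν+1)/2) − 2(ν+3)/[ν(ν+1)^2]}^{1/2} and using SciPy's polygamma for the trigamma function:

```python
import numpy as np
from scipy.special import polygamma


def trigamma(x):
    """Trigamma function psi'(x)."""
    return polygamma(1, x)


def pi_ij(nu):
    """Unnormalized independence Jeffreys prior pi_IJ(nu) of
    Fonseca et al. (2008), evaluated elementwise."""
    nu = np.asarray(nu, dtype=float)
    inner = (trigamma(nu / 2.0) - trigamma((nu + 1.0) / 2.0)
             - 2.0 * (nu + 3.0) / (nu * (nu + 1.0) ** 2))
    return np.sqrt(nu / (nu + 3.0)) * np.sqrt(inner)
```

Numerically, the prior decays quickly in ν, placing most of its mass on small degrees of freedom, which is consistent with the robustness motivation above.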
In addition to the Jeffreys priors, another important class of objective priors is the reference priors proposed by Bernardo (1979) and Berger and Bernardo (1992). A formal definition for one block of parameters is given in Berger et al. (2009). Following this, Wang and Yang (2016) showed that only two distinct reference priors arise from the six one-at-a-time orderings of (β, σ, ν) for the linear model with Student-t errors and unknown degrees of freedom ν. They concluded that the posteriors under these reference priors are all improper. In this paper, we systematically investigate the reference priors for all possible group orderings of the parameters. By considering a larger class of priors, we establish the posterior propriety for each reference prior. As a byproduct, we draw a different conclusion regarding the posterior propriety of the priors in Wang and Yang (2016). In addition, we compare all reference priors in terms of frequentist coverage and mean squared error.
The rest of the paper is organized as follows. In Section 2, the reference priors under different group orderings for the parameters in the Student-t linear regression model are derived. In Section 3, the posterior propriety for each reference prior is validated by considering a larger class of priors. In Section 4, the frequentist properties of Bayesian estimators of ν based on the objective priors are presented. In Section 5, the proposed Bayesian approach is applied to two real data sets. Some concluding remarks are given in Section 6.

The model and priors
Consider the following linear regression model:

y = Xβ + ε,    (2.1)

where y = (y_1, · · · , y_n)' is the n × 1 vector of response variables, X = [x_1, · · · , x_n] is the n × p matrix of explanatory variables with full column rank, ε = (ε_1, · · · , ε_n)' is the error vector, and the ε_i's are independent and identically distributed according to the Student-t distribution with location zero, scale parameter σ and degrees of freedom ν.
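Because the Student-t distribution is a scale mixture of normals, data from model (2.1) can be generated by drawing ε_i | λ_i ~ N(0, σ²/λ_i) with λ_i ~ Ga(ν/2, ν/2); the same representation underlies the posterior sampling used later. A minimal simulation sketch (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_t_regression(n, beta, sigma, nu, rng):
    """Draw (y, X) from model (2.1): y = X beta + eps, where the eps_i
    are i.i.d. Student-t(0, sigma, nu), generated via the scale-mixture
    representation eps_i | lam_i ~ N(0, sigma^2/lam_i), lam_i ~ Ga(nu/2, nu/2)."""
    p = len(beta)
    X = rng.standard_normal((n, p))
    # numpy's gamma is parameterized by shape and scale, so rate nu/2 -> scale 2/nu
    lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    eps = rng.standard_normal(n) * sigma / np.sqrt(lam)
    return X @ np.asarray(beta) + eps, X


y, X = simulate_t_regression(50_000, beta=[1.0, -2.0], sigma=1.0, nu=8.0, rng=rng)
```

For ν > 2 the error variance is σ²ν/(ν − 2), which provides a quick sanity check on the simulated residuals.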
Recently, Wang and Yang (2016) derived the one-at-a-time reference priors for all six orderings of (β, σ, ν). Interestingly, only two distinct priors arise. For the orderings {β, σ, ν}, {σ, β, ν} and {σ, ν, β}, the reference priors are identical; we denote this common prior by π_R1. For the orderings {β, ν, σ}, {ν, β, σ} and {ν, σ, β}, the reference priors again coincide; we denote this prior by π_R2. In addition to the one-at-a-time group orderings, it is possible to consider other orderings of the parameters (β, σ, ν). As Ghosh and Mukerjee (1992) noted, changing the group ordering of the parameters may yield priors with different properties. We summarize the reference priors for the other orderings in the following theorem, whose proofs are straightforward and thus omitted here.
(c) The reference prior for the ordering {(β, ν), σ} is π_R3; (d) the reference prior for the ordering {ν, (β, σ)} is π_R4; (e) the reference prior for the ordering {(β, σ), ν} is π_R5. Clearly, all of the reference priors aforementioned can be written in the unified form π(β, σ, ν) ∝ π(ν)/σ^α, for some constant α > 0 and some function π(ν).

Propriety of posterior distributions
In this section, we investigate the important problem of whether the posterior distributions are proper under the reference priors obtained in Section 2. To facilitate the proofs, we consider priors for the parameters (β, σ², ν) instead of (β, σ, ν). The prior for (β, σ², ν) is taken to be of the form

π(β, σ², ν) ∝ π(ν)/(σ²)^a,    (3.1)

where a > 0 is a constant and π(ν) is the 'marginal' prior of ν.
The following lemma gives a detailed characterization of the order of the marginal likelihood of ν, which plays an important role in establishing posterior propriety.
Lemma 3.1. For the Student-t linear model (2.1) and the prior in (3.1), assume that n ≥ p + 1 and 0 < a ≤ 1. Let f(· | ν) denote the density function of the Gamma distribution Ga(ν/2, ν/2). Then the marginal likelihood g(ν) of ν admits upper and lower bounds of the same order as ν → 0. In order to prove Lemma 3.1, we first present an auxiliary lemma, which can be found in Fernández and Steel (1999).
Proof of Lemma 3.1. Similar to Fernández and Steel (1999), the lower and upper bounds of f_Y(y) are both proportional to an integral over λ_1, . . . , λ_n, where λ_1, . . . , λ_n are independent and identically distributed as the Gamma distribution Ga(ν/2, ν/2), whose density function is f(· | ν).
It can be shown that, up to a set of measure zero, the marginal likelihood can be expressed through the integral in (3.4). First, we derive an upper bound for this integral. Using the inequality in (3.2), when integrating with respect to λ(1), we obtain a bound depending only on λ(2), . . . , λ(n).
From this, when integrating with respect to λ(2), the upper bound is proportional to the analogous expression. Inductively, after integrating with respect to λ(3), . . . , λ(n−p−1), the upper bound is again proportional to the corresponding expression, and we then integrate with respect to λ(n−p). Note that n ≥ p + 1 and 0 < a ≤ 1, hence (n − p)ν/2 + 1 − a > 0. It follows that the upper bound for the integral in (3.5) can be obtained. Inductively, after integrating with respect to λ(n−p+1), . . . , λ(n−1), and finally with respect to λ(n) on (0, ∞), the upper bound g_u(ν) of g(ν) is proportional to (3.6) multiplied by an additional factor. Second, we find a lower bound for the integral in (3.4). By the inequality on the left-hand side of (3.2), when integrating with respect to λ(1), we obtain a corresponding lower bound; consequently, integrating with respect to λ(2) gives a lower bound of the analogous form. The subsequent derivations are similar to those for the upper bound g_u(ν), and we can thus obtain a lower bound g_l(ν) for g(ν). If a = 1, the upper and lower bounds are of the same order. Thus far, we have obtained the order of g(ν) as ν → 0.
By Lemma 3.1, we can obtain the following result.
In order to study the propriety of the posterior under the priors π_R5 and π_R6, we give the following theorem (Theorem 3.2), whose proof is analogous to that of Theorem 1 in Vallejos and Steel (2013). According to Theorem 3.2, we have the following result: for the Student-t linear model (2.1), assuming n ≥ p + 1, the posterior distributions under the reference priors π_R5 and π_R6 are both improper.

Frequentist properties
We now investigate the frequentist performance of Bayesian estimators of ν based on the reference priors of Section 2, as well as the discrete prior (denoted by π_VW) proposed by Villa and Walker (2014). Two simulation studies are carried out.

Independent and identically distributed Student-t sample
In the first simulation study, as in Villa and Walker (2014), we draw independent and identically distributed samples from a standard Student-t distribution with location parameter μ = 0 and scale parameter σ = 1. In this case, the available reference priors are only π_R1, π_R2 and π_IJ. Two sample sizes are considered, n = 30 and n = 100. The simulations are performed for ν = 1, 2, . . . , 25, with 5000 replications. The frequentist mean squared error and the frequentist coverage probability of the 95% credible interval are used to compare the estimators of ν under the different priors.
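Both comparison criteria can be computed directly from the matrix of posterior draws produced across replications. A hedged sketch (function and variable names are ours; the toy input below merely mimics a well-calibrated posterior, not the actual simulation output):

```python
import numpy as np


def relative_rmse_and_coverage(post_draws, nu_true, level=0.95):
    """post_draws: (replications x draws) array of posterior samples of nu.
    Returns sqrt(MSE)/nu for the posterior medians, and the frequentist
    coverage of the equal-tailed credible interval at the given level."""
    med = np.median(post_draws, axis=1)            # one point estimate per replication
    rel_rmse = np.sqrt(np.mean((med - nu_true) ** 2)) / nu_true
    lo, hi = np.quantile(post_draws,
                         [(1.0 - level) / 2.0, (1.0 + level) / 2.0], axis=1)
    coverage = np.mean((lo <= nu_true) & (nu_true <= hi))
    return rel_rmse, coverage


# toy input mimicking a calibrated posterior: per-replication centres are
# drawn around the truth, and draws are scattered around each centre
rng = np.random.default_rng(1)
centres = 5.0 + rng.normal(size=(2000, 1))
draws = centres + rng.normal(size=(2000, 500))
rel, cov = relative_rmse_and_coverage(draws, nu_true=5.0)
```

With calibrated input like this, the empirical coverage comes out close to the nominal 0.95, which is the behavior the figures below examine for each prior.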
The square root of the relative mean squared error of ν, √MSE(ν)/ν, for the posterior medians under the four priors when n = 30 and n = 100 is shown in Figures 1(a) and 1(b), respectively. For n = 30, when the true value of ν is not greater than 13, the priors π_R1, π_R2 and π_IJ clearly outperform π_VW; in addition, π_R2 behaves slightly better than the others in this range. As ν gets large, however, the relative mean squared errors based on π_VW become the smallest. For n = 100, the situation is analogous. Figure 2 shows the frequentist coverage probabilities of the 95% credible intervals for ν. In terms of coverage, the priors π_R1, π_R2 and π_IJ are relatively robust: the coverage probabilities are close to the nominal level 0.95 for both n = 30 and n = 100. For the prior π_VW, by contrast, there is a drop for small values of ν and, as ν gets larger, the coverage probabilities undesirably approach 1.

Figure 1: Square root of the relative mean squared error of estimators of ν based on the independence Jeffreys prior π_IJ (solid), the reference prior π_R1 (dotted), the reference prior π_R2 (dashed), and Villa and Walker's prior π_VW (dash-dotted). Panel (a) is for n = 30, panel (b) is for n = 100.

Figure 2: Frequentist coverage of the 95% credible intervals for ν based on the independence Jeffreys prior π_IJ (solid), the reference prior π_R1 (dotted), the reference prior π_R2 (dashed), and Villa and Walker's prior π_VW (dash-dotted). Panel (a) is for n = 30, panel (b) is for n = 100.
Student-t linear regression

In the second simulation study, data are generated from the Student-t regression model (2.1). It is also assumed that the x_ij's are independent and identically distributed as standard normal.
We employ Metropolis-Hastings (M-H) within Gibbs sampling to generate samples from the posterior distributions. Since the t distribution can be represented as a scale mixture of normals, the full conditional distributions for the Gibbs sampler are available in closed form, except for the conditional distribution of ν in (4.4), for which the M-H algorithm is used. Following the idea of Kang et al. (2018), the proposal distribution is taken as the truncated normal distribution N_(0<ν<300)(μ_ν, τ²_ν), where μ_ν = x − q'(x)/q''(x), τ²_ν = −1/q''(x), and q(ν) = log π(ν | λ_1, . . . , λ_n). Specifically, at each step a candidate value ν_cand is generated from N_(0<ν<300)(μ_ν, τ²_ν), where p(·) denotes the density of N_(0<ν<300)(μ_ν, τ²_ν), and the candidate value ν_cand is accepted with the usual M-H probability α.
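A minimal sketch of such an M-H-within-Gibbs sampler is given below. It uses the scale-mixture conditionals described above, but, as illustrative simplifications that are ours rather than the paper's, it replaces the truncated-normal proposal of Kang et al. (2018) with a random-walk proposal on log ν (still kept inside (0, 300)) and uses a simple heavy-tailed prior π(ν) ∝ (1 + ν)^{-2} in place of a reference prior; all function and variable names are ours:

```python
import numpy as np
from math import lgamma


def gibbs_t_regression(y, X, n_iter=1500, burn=500, a=1.0,
                       log_prior_nu=lambda nu: -2.0 * np.log1p(nu),
                       seed=0):
    """M-H-within-Gibbs for the scale-mixture form of model (2.1):
    y_i | lam_i ~ N(x_i' beta, sigma^2/lam_i), lam_i ~ Ga(nu/2, nu/2),
    with prior pi(beta, sigma^2, nu) proportional to pi(nu)/(sigma^2)^a."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # least-squares start
    sigma2, nu = 1.0, 5.0
    lam = np.ones(n)
    keep_beta, keep_sigma2, keep_nu = [], [], []

    def log_cond_nu(v, lam):
        # log pi(nu | lam) up to a constant: prior plus the
        # Ga(nu/2, nu/2) likelihood of the mixing weights
        return (log_prior_nu(v)
                + n * (0.5 * v * np.log(v / 2.0) - lgamma(v / 2.0))
                + (0.5 * v - 1.0) * np.sum(np.log(lam))
                - 0.5 * v * np.sum(lam))

    for it in range(n_iter):
        # beta | rest: weighted least-squares posterior
        XtL = X.T * lam
        V = np.linalg.inv(XtL @ X)
        V = 0.5 * (V + V.T)                          # symmetrize for sampling
        m = V @ (XtL @ y)
        beta = rng.multivariate_normal(m, sigma2 * V)
        resid = y - X @ beta
        # sigma^2 | rest: inverse gamma
        sigma2 = 1.0 / rng.gamma(0.5 * n + a - 1.0,
                                 2.0 / np.sum(lam * resid ** 2))
        # lam_i | rest: conjugate gamma updates
        lam = rng.gamma(0.5 * (nu + 1.0), 2.0 / (nu + resid ** 2 / sigma2))
        # nu | lam: random-walk M-H on log(nu), kept inside (0, 300)
        prop = nu * np.exp(0.3 * rng.standard_normal())
        if prop < 300.0:
            log_acc = (log_cond_nu(prop, lam) - log_cond_nu(nu, lam)
                       + np.log(prop / nu))          # Jacobian of the log-walk
            if np.log(rng.uniform()) < log_acc:
                nu = prop
        if it >= burn:
            keep_beta.append(beta)
            keep_sigma2.append(sigma2)
            keep_nu.append(nu)
    return np.asarray(keep_beta), np.asarray(keep_sigma2), np.asarray(keep_nu)


# illustrative run on simulated data with beta = (1, -1), sigma = 1, nu = 5
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 2))
lam0 = rng.gamma(2.5, scale=1.0 / 2.5, size=200)
y = X @ np.array([1.0, -1.0]) + rng.standard_normal(200) / np.sqrt(lam0)
b_draws, s2_draws, nu_draws = gibbs_t_regression(y, X)
```

The conjugate updates for β, σ² and the λ_i follow directly from the normal-gamma structure of the mixture representation; only the ν-step requires a Metropolis-Hastings move.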
In the simulation, the Gibbs sampler is run for 5000 iterations after 1000 burn-in samples. The square roots of the relative mean squared errors of ν for the posterior medians based on the priors π_IJ, π_R1, π_R2, π_R3 and π_VW are shown in Figure 3. The simulation results for the reference prior π_R4 are similar to those for π_R3 and are therefore not reported here. For n = 30 and n = 100 alike, the priors π_IJ, π_R1, π_R2 and π_R3 generally perform better than π_VW when the true value of ν is not greater than 11; furthermore, π_R2 behaves best in this case, while for large values of ν the prior π_VW shows its superiority. Figures 4(a) and 4(b) show the frequentist coverage probabilities of the 95% credible intervals for ν based on the five priors for n = 30 and n = 100, respectively. The performance of the priors π_IJ, π_R1, π_R2 and π_R3 is similar, and the corresponding coverage probabilities are relatively close to the nominal level. The behavior of π_VW is analogous to that in the first simulation study: there is a drop for small values of ν, and the coverage probabilities are undesirably close to 1 for large values of ν.

Figure 3: Square root of the relative mean squared error of estimators of ν under Student-t regression based on the independence Jeffreys prior π_IJ (solid), the reference priors π_R1 (dotted), π_R2 (dashed), π_R3 (long-dashed), and Villa and Walker's prior π_VW (dash-dotted). Panel (a) is for n = 30, panel (b) is for n = 100.
Based on the two simulation studies, in light of the relative mean squared error, the prior π_R2 behaves best for small values of ν, while the prior π_VW outperforms the others for large values of ν. Therefore, we suggest using the reference priors whenever there is evidence that the model is Student-t.

Real data analysis
To show how the objective priors discussed above work in practice, we consider two data sets from financial markets: the first concerns U.S. Treasury bond prices, and the second concerns the Brazilian stock market index IBOVESPA.

Application to the U.S. Treasury bond prices
The first data set is from Siegel (1977) and has been previously analyzed by Sheather (2009) and Yang and Yuan (2017). As in those previous analyses, we take the bid price as the dependent variable and the coupon rate as the regressor. It is shown in Yang and Yuan (2017) that the normal model does not fit the data well and that heavy-tailed distributions may be more appropriate. Yang and Yuan (2017) then analyzed the data using scale mixtures of normal regression models, including the Student-t regression model. However, it should be pointed out that the degrees of freedom in their analysis were assumed known and equal to 1. Now, if we set ν = 1 in the Student-t regression model (2.1), then the independence Jeffreys prior and the corresponding reference priors π_R1, π_R2, π_R3 and π_R4 for (β, σ) coincide; we denote this common prior by (5.1). Using the prior (5.1), the Bayesian estimator (i.e., the posterior median) of σ² is 0.2650.

Table 1: Posterior summaries based on π_R2, π_IJ and π_VW for the real data in Section 5.1.
Next, we analyze the data using model (2.1) with unknown degrees of freedom. Table 1 displays posterior summaries based on the reference prior π_R2 and the prior π_IJ, as well as the prior π_VW. The priors π_R2 and π_IJ lead to similar results for the four parameters, and the posterior medians of ν, 0.7940 and 0.8254, are close to the value 1 assumed in Yang and Yuan (2017). In addition, the Bayesian estimators of σ² under π_R2 and π_IJ are 0.2049 and 0.2124, respectively. As expected, these estimators of σ² are smaller than the one obtained when the degrees of freedom are fixed at 1. For the discrete prior π_VW, the Bayesian estimator of ν is 1, and the corresponding 95% credible interval degenerates to the single point 1. The estimator of σ² under π_VW is 0.2785, the largest among the three priors.

Application to the Brazilian IBOVESPA
To further compare the performance of the objective priors on real data when the "true" degrees of freedom may be slightly larger, in this subsection we consider the daily closing prices of the IBOVESPA in the Brazilian stock market. Specifically, the data set contains 100 observations from April 20 to September 13, 2001, which is part of a wider sample used in Abanto-Valle et al. (2012). Throughout, we work with the daily returns as a percentage, y_d = 100 × (log(P_d) − log(P_{d−1})), where P_d is the closing price on day d. For the transformed data, a direct calculation yields a kurtosis much larger than 3. Thus, we use the following t-regression model to fit the data:

y_d = μ + ε_d,

where the ε_d's are independent and identically distributed according to the t distribution with location zero, scale parameter σ and degrees of freedom ν.
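The return transformation and the kurtosis diagnostic mentioned above are easy to reproduce. A minimal sketch (function names are ours; the heavy-tailed series is synthetic, for illustration only):

```python
import numpy as np


def pct_log_returns(prices):
    """Percentage log-returns y_d = 100 * (log(P_d) - log(P_{d-1}))."""
    p = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(p))


def sample_kurtosis(x):
    """Plain (non-excess) sample kurtosis; values well above 3 indicate
    tails heavier than the normal distribution."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    return np.mean(z ** 4) / np.mean(z ** 2) ** 2


# illustrative check on a synthetic heavy-tailed return series
rng = np.random.default_rng(0)
heavy = rng.standard_t(3, size=5000)
```

A sample kurtosis well above 3, as observed for the IBOVESPA returns, is exactly what motivates replacing the normal error model with the t model here.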
The results based on the priors π_R2 and π_IJ, as well as the prior π_VW, are listed in Table 2. It can be seen from Table 2 that the Bayesian estimators of the parameters μ, ν and σ² are close to each other under π_R2 and π_IJ, and somewhat different from those based on π_VW. Further, the credible intervals based on π_R2 are relatively shorter than those under the other priors, especially π_VW.

Concluding remarks
Following the seminal work of Fonseca et al. (2008), in this paper we have derived all the reference priors for the Student-t linear regression model with unknown degrees of freedom. The posterior propriety under a larger class of priors has then been validated. The frequentist properties of Bayesian estimators based on the reference priors have been investigated through simulation studies. The findings of this paper indicate that reference analysis for estimating the unknown degrees of freedom ν is feasible from both theoretical and practical viewpoints.
To conclude, it should be pointed out that the posterior propriety established in this paper relies on the condition n ≥ p + 1. Nowadays, there is a growing need to handle large amounts of data (in terms of covariates). For high-dimensional cases, there is some related work in the literature; for example, Clarke and Ghosal (2010) considered posterior properties based on reference priors for exponential families with increasing dimension. The problem of high-dimensional Student-t regression deserves deeper study, which we leave for future work.