Bayesian modelling of time-varying conditional heteroscedasticity

Conditional heteroscedastic (CH) models are routinely used to analyze financial datasets. Classical models such as ARCH and GARCH with time-invariant coefficients are often inadequate to describe the frequent changes induced by market variability, and considering time-varying analogues of these models can yield significantly better insight. In this paper, we propose a Bayesian approach to the estimation of such models and develop a computationally efficient MCMC algorithm based on Hamiltonian Monte Carlo (HMC) sampling. We also establish posterior contraction rates with increasing sample size in terms of the average Hellinger metric. The performance of our method is compared with frequentist estimates and with estimates from the time-constant analogues. To conclude the paper, we obtain time-varying parameter estimates for some popular Forex (currency conversion rate) and stock market datasets.


Introduction
For datasets observed over a long time period, stationarity turns out to be an over-simplified assumption that ignores systematic deviations of parameters from constancy. Thus time-varying parameter models have been studied extensively in the literature of statistics, economics and related fields. For example, financial datasets, due to external factors such as wars, terrorist attacks, economic crises, political events etc., exhibit deviations from time-constant stationary models. Accounting for such changes is crucial, as otherwise time-constant models can lead to incorrect policy implications, as pointed out by Bai (1997). Thus functional estimation of unknown parameter curves using time-varying models has recently become an important research topic. In this paper, we analyze popular conditional heteroscedastic models such as AutoRegressive Conditional Heteroscedasticity (ARCH) and Generalized ARCH (GARCH) from a Bayesian perspective in a time-varying setup. Before discussing our new contributions, we provide a brief overview of some previous works in these areas.
In the regression regime, time-varying models have garnered a lot of recent attention; see, for instance, Fan and Zhang (1999), Fan and Zhang (2000), Hoover et al. (1998), Lin and Ying (2001), Ramsay and Silverman (2005) and Zhang et al. (2002), among others. These models capture a time-heterogeneous relationship between response and predictors. Consider the following two regression models

Model I: y_i = x_i^T θ_i + e_i,    Model II: y_i = x_i^T θ_0 + e_i,    i = 1, . . . , n,

where x_i ∈ R^d (i = 1, . . . , n) are the covariates, T denotes transpose, and θ_0 and θ_i = θ(i/n) are the regression coefficients. Here, θ_0 is a constant parameter and θ : [0, 1] → R^d is a smooth function. Estimation of θ(·) has been considered by Hoover et al. (1998), Cai (2007) and Zhou and Wu (2010), among others. One popular way to decide whether there is evidence to favor time-varying models over the time-constant analogue is to perform hypothesis testing; see, for instance, Zhang and Wu (2012), Zhang and Wu (2015), Chow (1960), Brown et al. (1975), Nabeya and Tanaka (1988), Leybourne and McCabe (1989), Nyblom (1989), Ploberger et al. (1989), Andrews (1993) and Lin and Teräsvirta (1999). Zhou and Wu (2010) discussed obtaining simultaneous confidence bands (SCB) in Model I, i.e. with additive errors. However, their treatment is heavily based on the closed-form solution and does not extend to processes defined by a more general recursion. For time-varying AR, MA or ARMA processes, the results from time-varying linear regression can be naturally extended. However, such an extension is not obvious for conditional heteroscedastic (CH hereafter) models. By the very definition of their recursive evolution, these models are difficult to estimate even in the time-constant case, yet one cannot ignore their usefulness in analyzing and predicting financial datasets.
These models (even the simple time-constant ones) have remained primary tools for analyzing and forecasting certain trends in stock market datasets since Engle (1982) introduced the classical ARCH model and Bollerslev (1986) extended it to the more general GARCH model. But with the rapid dynamics of market vulnerability, the simple classical time-constant models fail in terms of estimation or prediction due to over-compensating the past data. Several references point out the necessity of extending these classical models to a setup where the parameters can change across time; see, for example, Stȃricȃ and Granger (2005), Engle and Rangel (2005) and Fryzlewicz et al. (2008a). Consider the simple tvARCH(1) model

X_i = σ_i ζ_i,  ζ_i ∼ N(0, 1),  σ_i^2 = µ_0(i/n) + a_1(i/n) X_{i−1}^2.

A similar model can be defined for tvGARCH(1,1), where σ_i^2 has an additional recursive term involving σ_{i−1}^2:

σ_i^2 = µ_0(i/n) + a_1(i/n) X_{i−1}^2 + b_1(i/n) σ_{i−1}^2.
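A short simulation may make the recursion concrete. The following sketch (illustrative only, not the paper's code; the coefficient curves are made-up examples) simulates a tvGARCH(1,1) path, with b_1 ≡ 0 recovering tvARCH(1):

```python
import numpy as np

def simulate_tvgarch(n, mu, a1, b1, seed=0):
    """Simulate X_i = sigma_i * zeta_i with
    sigma_i^2 = mu(i/n) + a1(i/n) X_{i-1}^2 + b1(i/n) sigma_{i-1}^2.
    mu, a1, b1 are functions on [0, 1]; b1 = 0 gives tvARCH(1)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    sig2 = np.zeros(n)
    x_prev, sig2_prev = 0.0, 0.0          # pre-sample values taken as zero
    for i in range(n):
        t = (i + 1) / n
        sig2[i] = mu(t) + a1(t) * x_prev**2 + b1(t) * sig2_prev
        x[i] = np.sqrt(sig2[i]) * rng.standard_normal()
        x_prev, sig2_prev = x[i], sig2[i]
    return x, sig2

# Example: smoothly varying coefficients with sup_t (a1(t) + b1(t)) < 1.
x, sig2 = simulate_tvgarch(
    1000,
    mu=lambda t: 0.1 + 0.05 * np.sin(2 * np.pi * t),
    a1=lambda t: 0.1 + 0.1 * t,
    b1=lambda t: 0.5 - 0.2 * t,
)
```

Keeping the pointwise sum a_1(t) + b_1(t) below one in the example keeps the simulated variance recursion stable.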
When the two recursive parameters in a GARCH model sum to 1, i.e. a_1 + b_1 = 1, the process is usually called an integrated GARCH (iGARCH; or bubble GARCH/explosive GARCH by some authors) process, which by means of the above display can also be extended to a time-varying analogue, i.e. b_1(·) = 1 − a_1(·). A wide range of financial datasets exhibit the iGARCH phenomenon.
In the parlance of time-varying parameter models in the CH setting, numerous works discussed CUSUM-type procedures; for instance, Kim et al. (2000) tested for changes in the parameters of GARCH(1,1). Kulperger et al. (2005) studied the high-moment partial sum process based on residuals and applied it to residual CUSUM tests in GARCH models. Interested readers can find more change-point detection results in the context of CH models in James Chu (1995), Chen and Gupta (1997), Kokoszka et al. (2000) or Andreou and Ghysels (2006).
A time-varying framework and pointwise curve estimation using M-estimators for locally stationary ARCH models were provided by Dahlhaus and Subba Rao (2006). Since then, while several pointwise approaches have been discussed in the tvARMA and tvARCH cases (cf. Dahlhaus and Polonik (2009), Dahlhaus and Subba Rao (2006), Fryzlewicz et al. (2008a)), pointwise theoretical results for estimation in tvGARCH processes were discussed in Rohan and Ramanathan (2013) and Rohan (2013) for GARCH(1,1) and GARCH(p,q) models respectively. In a series of recent works, Karmakar et al. (2020+) and Karmakar (2018) discussed such models in wide generality, but the focus there remained frequentist and the main goal accomplished was to build simultaneous inference. One strong criticism of CH-type models has been that one needs a relatively large sample size (n ∼ 2000) to achieve nominal coverage levels. The recursive definition of the models and the subsequent kernel-based method of estimation make it difficult to achieve satisfying results for relatively small sample sizes. This motivated us to explore a Bayesian way of building and estimating these models and to use the posteriors to construct estimates of the coefficient curves θ(·). In this paper, we develop a Bayesian estimation method for time-varying analogues of ARCH, GARCH and iGARCH models. We model the time-varying functional parameters using cubic B-splines. In the literature on general varying coefficient modeling, spline bases are a popular choice for their convenience and flexibility (Hastie and Tibshirani, 1993; Gu and Wahba, 1993; Cai et al., 2000; Biller and Fahrmeir, 2001; Huang et al., 2002; Huang and Shen, 2004; Amorim et al., 2008; Fan and Zhang, 2008; Yue et al., 2014; Franco-Villoria et al., 2019). Specific to the literature on time-varying volatility modeling, B-spline based models have also been explored (Engle and Rangel, 2008; Audrino and Bühlmann, 2009; Liu and Yang, 2016).
Our contributions in this paper are twofold. Towards the methodological development, note that the tvARCH, tvGARCH and tviGARCH models require complex shape constraints on the coefficient functions. We achieve those by imposing different hierarchical structures on the B-spline coefficients. The imposed structures are designed so that we can develop an efficient sampling algorithm based on gradient-based Hamiltonian Monte Carlo (HMC) (Neal et al., 2011). A strong motivation for implementing such a Bayesian methodology was to circumvent the requirement of a huge sample size, which is almost essential for frequentist kernel-based estimation, as pointed out frequently in the ARCH/GARCH literature.
From a theoretical point of view, the literature on posterior concentration rates for dependent data is very sparse, even for very simple models. To the best of our knowledge, ours is the first such attempt for these models under a Gaussian link: we establish the posterior contraction rate with respect to the average Hellinger metric. The main challenge is to construct exponentially consistent tests for this class of models; our construction of such tests relies on recently developed tools from Jeong et al. (2019) and Ning et al. (2020). In a nutshell, we first establish posterior contraction rates with respect to the average log-affinity and then show that the same rate holds for the average Hellinger metric. The frequentist literature on inference for time-varying models needs very stringent moment assumptions and local stationarity assumptions, which are often difficult to verify. Moreover, for econometric datasets the existence of even the fourth moment is often questionable.
The rest of the paper is organized as follows. Section 2 describes the proposed Bayesian model in detail. Section 3 discusses an efficient computational scheme for the proposed method. We calculate posterior contraction rates in Section 4. In Section 5 we study the performance of our proposed method through simulation studies. Section 6 deals with real data applications of the proposed methods for the three separate models and concludes with a brief interpretation of the results. We wrap up the paper with discussions, concluding remarks and possible future directions in Section 7. The supplementary materials contain the theoretical proofs and some additional results.

Modeling
We elaborate on the models and our Bayesian framework for time-varying analogues of three specific cases that are popularly used to analyze econometric datasets.

tvARCH Model
Let {X_i} satisfy the following time-varying ARCH(p) model:

X_i = σ_i ζ_i,  σ_i^2 = µ(i/n) + Σ_{j=1}^p a_j(i/n) X_{i−j}^2,

where the parameter functions µ(·) and a_j(·) satisfy the shape constraints defining the space P. In a Bayesian regime we put priors on µ(·) and the a_j(·). To respect the shape constraints imposed by P, we reformulate the problem. With B_j as the B-spline basis functions, the priors are constructed as follows. The priors induced by the construction below are supported in P, and the verification is straightforward. In the construction, Σ_{j=1}^p M_j = 1 if and only if δ_0 = −∞, which has probability zero, so Σ_{j=1}^p M_j < 1 almost surely. On the other hand, we also have µ(·) ≥ 0 since exp(β_j) > 0. Thus, the induced priors described above are well supported in P.
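To make the construction tangible, here is one way such shape constraints can be enforced through transformed B-spline coefficients (a Python sketch using scipy; the exponential and logistic maps below are illustrative stand-ins for the paper's hierarchical prior, whose exact form is given in the displays above):

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(t, n_basis, degree=3):
    """Design matrix of cubic B-splines on [0, 1] with equidistant knots.
    Rows sum to one (partition of unity)."""
    interior = np.linspace(0, 1, n_basis - degree + 1)[1:-1]
    knots = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    tt = np.clip(np.asarray(t, float), 0.0, 1.0 - 1e-12)  # right-open at t = 1
    B = np.empty((tt.size, n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        B[:, j] = BSpline(knots, c, degree)(tt)
    return B

t = np.linspace(0, 1, 200)
B = bspline_design(t, n_basis=6)

rng = np.random.default_rng(0)
beta = rng.standard_normal(6)        # unconstrained spline coefficients
delta = rng.standard_normal(6)       # unconstrained spline coefficients

mu_t = B @ np.exp(beta)              # exp(beta_j) > 0  =>  mu(t) > 0
M = 1.0 / (1.0 + np.exp(-delta))     # illustrative map into (0, 1)
a1_t = B @ M                         # partition of unity => a1(t) in (0, 1)
```

Because the B-spline basis functions are nonnegative and sum to one pointwise, any constraint imposed on the transformed coefficients transfers directly to the whole curve.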

tvGARCH Model
Let {X_i} satisfy the following time-varying GARCH(p, q) model:

X_i = σ_i ζ_i,  σ_i^2 = µ(i/n) + Σ_{j=1}^p a_j(i/n) X_{i−j}^2 + Σ_{k=1}^q b_k(i/n) σ_{i−k}^2.

Additionally we impose the following constraints on the parameter space for the time-varying parameters. Note that the condition on the AR parameters imposed by (2.6) is a somewhat popular condition in the time-varying AR literature; see Dahlhaus and Subba Rao (2006), Fryzlewicz et al. (2008b) and Karmakar et al. (2020+) for example. Different from these references, we do not additionally assume the existence of any unobserved locally stationary process that is close to the observed process.
To proceed with the Bayesian computation, we again put priors on the unknown functions µ(·), a_i(·) and b_j(·) such that they are supported in P_1, so that the restrictions imposed by (2.6) are respected. The complete description of the prior is given below, where the B_j's are the B-spline basis functions and the parameters δ_j's are unbounded. The verification of the support condition (2.6) for the proposed prior is similar.

tviGARCH Model
Although GARCH(1,1) remains one of the most popular models for analyzing econometric datasets, empirical evidence shows that these datasets regularly cast suspicion on the parameter space restriction Σ_i a_i + Σ_j b_j < 1. Note that we used a time-varying analogue of this restriction for the tvGARCH modelling in Section 2.2. This often creates a very stringent condition, as the validity of Σ_i a_i(t) + Σ_j b_j(t) < 1 is questionable. The special case of a time-constant GARCH model where this restriction fails is called an iGARCH model in the literature. We consider the following time-varying analogue of iGARCH.
We impose the following constraints on the parameter space for the time-varying parameters. The prior functions that allow us to reformulate the problem while keeping it consistent with (2.9) are described below.

Posterior computation and Implementation

tvARCH structure
The complete likelihood L of the proposed Bayesian method is given below. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm to sample the parameters β, θ and δ from this likelihood. The availability of derivatives allows us to develop an efficient gradient-based MCMC algorithm to sample these parameters. We calculate the gradients of the negative log-likelihood (− log L) with respect to the parameters β, θ and δ; they are given below, where 1_{j=k} stands for the indicator function which takes the value one when j = k.
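To illustrate the kind of gradient computation involved, the sketch below codes the Gaussian negative log-likelihood of a tvARCH(1) model under a B-spline parameterization, together with its analytic gradient via the chain rule. This is an illustrative reading of the scheme, not the paper's exact parameterization: the logistic map for the ARCH coefficients and the function name are our own choices.

```python
import numpy as np

def tvarch1_nll_and_grad(beta, delta, B, x):
    """Negative log-likelihood of a tvARCH(1) model with
    sigma_i^2 = B_i . exp(beta) + (B_i . logistic(delta)) * x_{i-1}^2,
    and its gradient with respect to (beta, delta).
    B is the (n, K) B-spline design matrix; x the observations."""
    m = 1.0 / (1.0 + np.exp(-delta))        # coefficients mapped into (0, 1)
    mu_t = B @ np.exp(beta)                 # positive intercept curve
    a_t = B @ m                             # ARCH(1) curve in (0, 1)
    x_prev2 = np.r_[0.0, x[:-1] ** 2]       # pre-sample X_0^2 taken as 0
    sig2 = mu_t + a_t * x_prev2
    nll = 0.5 * np.sum(np.log(2 * np.pi * sig2) + x ** 2 / sig2)
    # d(nll)/d(sigma_i^2), then chain rule through the two curves
    dsig2 = 0.5 * (1.0 / sig2 - x ** 2 / sig2 ** 2)
    g_beta = (B * dsig2[:, None]).sum(axis=0) * np.exp(beta)
    g_delta = (B * (dsig2 * x_prev2)[:, None]).sum(axis=0) * m * (1.0 - m)
    return nll, g_beta, g_delta
```

In practice, such analytic gradients can be checked against finite differences before being handed to the HMC sampler.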

tvGARCH / tviGARCH structure
The complete likelihood L_2 of the proposed Bayesian method of (2.5) is given below. We calculate the gradients of the negative log-likelihood (− log L_2) with respect to the parameters β, θ, η and δ; they are given below. While fitting tvGARCH(p, q), we assume X_t^2 = 0 and σ_t^2 = 0 for any t < 0. Thus we additionally need to estimate the parameter σ_0^2. The derivative of the likelihood with respect to σ_0^2 is computed numerically using the jacobian function from the R package pracma. For tviGARCH, the derivatives are similar, so we omit them for the sake of brevity.
Based on these gradient functions, we develop gradient-based Hamiltonian Monte Carlo (HMC) sampling. Note that the parameter spaces of the θ_kj's have bounded support. We circumvent this by mapping any Metropolis candidate falling outside the parameter space back to the nearest boundary. HMC has two parameters that must be specified: the number of leapfrog steps and the step size. It is difficult to tune both simultaneously, so we choose to tune the step size to maintain an acceptance rate between 0.6 and 0.8: after every 100 iterations, the step length is increased or reduced accordingly if the rate falls outside this range. Neal et al. (2011) showed that a higher number of leapfrog steps is better for estimation accuracy at the expense of greater computation. To balance accuracy and computational complexity, we keep it fixed at 30 and obtain good results.
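The sampler's skeleton can be sketched as follows: one HMC transition with leapfrog integration, plus the acceptance-band step-size adjustment described above. This is a generic sketch on an arbitrary target, not the paper's implementation; the boundary-reflection detail for bounded parameters is omitted.

```python
import numpy as np

def hmc_step(q, nlogp_grad, step, n_leap=30, rng=None):
    """One HMC transition: momentum refresh, n_leap leapfrog steps,
    Metropolis accept/reject. nlogp_grad(q) -> (neg log posterior, gradient)."""
    if rng is None:
        rng = np.random.default_rng()
    p = rng.standard_normal(q.size)
    q_new, p_new = q.copy(), p.copy()
    nlp, g = nlogp_grad(q_new)
    h0 = nlp + 0.5 * (p @ p)                 # initial Hamiltonian
    for _ in range(n_leap):                  # leapfrog integration
        p_new = p_new - 0.5 * step * g
        q_new = q_new + step * p_new
        nlp, g = nlogp_grad(q_new)
        p_new = p_new - 0.5 * step * g
    h1 = nlp + 0.5 * (p_new @ p_new)
    if np.log(rng.random()) < h0 - h1:       # Metropolis correction
        return q_new, True
    return q, False

def run_hmc(q0, nlogp_grad, n_iter=1000, step=0.05, seed=0):
    """Run HMC, adjusting the step size every 100 iterations so the
    acceptance rate stays in the 0.6-0.8 band, as described in the text."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    chain, acc = [], 0
    for it in range(1, n_iter + 1):
        q, ok = hmc_step(q, nlogp_grad, step, rng=rng)
        acc += ok
        chain.append(q.copy())
        if it % 100 == 0:
            rate, acc = acc / 100.0, 0
            if rate < 0.6:
                step *= 0.9
            elif rate > 0.8:
                step *= 1.1
    return np.array(chain), step
```

Run on a standard normal target (neg log posterior 0.5 q'q, gradient q), the chain recovers the correct mean and variance.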

Large-sample properties
We now focus on obtaining posterior contraction rates for our proposed Bayesian models. Posterior consistency is studied in the asymptotic regime of an increasing number of time points n, with respect to the average Hellinger distance on the coefficient functions, where f_1 = ∏_{i=1}^n P_{κ_1}(X_i | X_{i−1}) and f_2 denotes the corresponding likelihood.
If Π_n(d(f, f_0) ≥ M_n ε_n | X^{(n)}) → 0 in F^{(n)}_{κ_0}-probability for every sequence M_n → ∞, then the sequence ε_n is called the posterior contraction rate.
All the proofs are postponed to the supplementary materials. The proof is based on the general contraction rate result for independent non-i.i.d. observations (Ghosal and Van der Vaart, 2017) and some results on B-spline based finite random series. The exponentially consistent tests are constructed by leveraging the Neyman–Pearson lemma, as in Ning et al. (2020). Thus the first step is to calculate the posterior contraction rate with respect to the average negative log-affinity. We also consider the following simplified priors for α_j and τ_i to get better control over the tail probabilities; see (4.1).

tvARCH model
Let κ = (µ, a_1) stand for the complete set of parameters. For the sake of generality of the method, we put priors on K_1 and K_2 with probability mass functions given below, for i = 1, 2. These priors were not used while fitting the model, as that would require a computationally expensive reversible-jump MCMC strategy. The contraction rate will depend on the smoothness of the true coefficient functions µ and a_1 and on the parameters b_13 and b_23 from the prior distributions of K_1 and K_2. Let κ_0 = (µ_0, a_01) be the truth of κ.

Assumptions (A): There exist constants
Theorem 1. Under assumptions (A.1)–(A.3), let the true functions µ_0(·) and a_10(·) be Hölder smooth with regularity levels ι_1 and ι_2 respectively. Then the posterior contraction rate with respect to the distance d^2_{1,n} is as displayed below, where the b_ij are specified in (4.2).

tvGARCH model
Let κ = (µ, a_1, b_1) stand for the complete set of parameters. For the sake of generality of the method, we put priors on K_1, K_2 and K_3 with probability mass functions given below, for i = 1, 2, 3. These priors were not used while fitting the model, as that would require a computationally expensive reversible-jump MCMC strategy. The contraction rate will depend on the smoothness of the true coefficient functions µ, a_1 and b_1 and on the parameters b_i3 from the prior distributions of K_1, K_2 and K_3. Let κ_0 = (µ_0, a_01, b_01) be the truth of κ.

tviGARCH model
Let κ = (µ, a_1) stand for the complete set of parameters. For the sake of generality of the method, we put priors on K_1 and K_2 with probability mass functions given below, for i = 1, 2. These priors were not used while fitting the model, as that would require a computationally expensive reversible-jump MCMC strategy. The contraction rate will depend on the smoothness of the true coefficient functions µ and a_1 and on the parameters b_13 and b_23 from the prior distributions of K_1 and K_2. Let κ_0 = (µ_0, a_01) be the truth of κ.

Simulation
We run simulations to study the performance of our proposed Bayesian method in capturing the true coefficient functions under different true models. The hyperparameters c_1 and c_2 of the normal prior are all set to 100, which makes the prior weakly informative. We consider 4, 5 and 6 equidistant knots for the B-splines when n = 200, 500 and 1000 respectively. We collect 10000 MCMC samples and treat the last 5000 as post burn-in samples for inference. We compare the estimated functions with the true functions in terms of the posterior estimates along with their 95% pointwise credible bands, calculated from the MCMC samples at each point t = 1/T, 2/T, . . . , 1. The AMSE is the average squared error between the fitted and true conditional variances, where the fitted σ̂_i^2 is computed as per the model considered. For example, for a tvGARCH(1,1) model we have σ̂_i^2 = µ̂(i/n) + â(i/n) X_{i−1}^2 + b̂(i/n) σ̂_{i−1}^2, where µ̂(·), â(·) and b̂(·) are the estimated curves from the posterior. We report everything for the mean of the posterior curves, but the same can be done with the posterior median.
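Our reading of this criterion can be sketched in a few lines: rebuild the fitted conditional variances from the estimated curves via the model's own recursion, then average the squared errors (Python; function names are ours, and pre-sample values are set to zero as in the fitting convention described in Section 3):

```python
import numpy as np

def fitted_sig2_tvgarch(x, mu_hat, a_hat, b_hat):
    """Fitted conditional variances of a tvGARCH(1,1) fit:
    sig2_i = mu_hat(i/n) + a_hat(i/n) x_{i-1}^2 + b_hat(i/n) sig2_{i-1},
    with pre-sample values x_0^2 and sig2_0 taken as zero."""
    n = len(x)
    sig2 = np.empty(n)
    x_prev2, s_prev = 0.0, 0.0
    for i in range(n):
        t = (i + 1) / n
        sig2[i] = mu_hat(t) + a_hat(t) * x_prev2 + b_hat(t) * s_prev
        x_prev2, s_prev = x[i] ** 2, sig2[i]
    return sig2

def amse(sig2_hat, sig2_true):
    """Average squared error between fitted and true conditional variances."""
    return float(np.mean((np.asarray(sig2_hat) - np.asarray(sig2_true)) ** 2))
```

Setting b_hat to zero reduces the recursion to the tvARCH(1) case.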

tvARCH case
We start by considering the following tvARCH(1) model from (2.2). Three different choices of n have been considered: n = 200, 500 and 1000. The true functions are given below. We compare the estimated functions with the truth for sample size 1000 in Figure 1. Table 1 illustrates the performance of our method with respect to other competing methods.

tvGARCH case

Note that estimation of tvGARCH, due to the additional b_i(·) parameter curves, is a significantly more challenging problem and often requires a much larger sample size for reasonable estimation. We show by means of Figure 2 that the estimation looks reasonable even for smaller sample sizes. The AMSE comparisons are shown in Table 2, where the performance of our method is also contrasted with other competing methods.

tviGARCH case
Finally we consider the tviGARCH(1,1) model (cf. (2.8)), a special case of tvGARCH. Note that due to the constraint a_1(·) + b_1(·) = 1, we only consider the mean function and the AR(1) function for plotting. For this case, our true functions are as follows. The frequentist computation for the tviGARCH method is carried out based on a kernel-based estimation scheme along the same lines as Karmakar et al. (2020+). The estimated plots along with the 95% credible intervals are shown in Figure 3 for the three sample sizes n = 200, 500, 1000, and the AMSE scores in Table 3. To summarize, our estimated functions approximate the true functions well in all cases. We also find that the credible bands get tighter with increasing sample size; thus the estimates improve as the sample size increases, as shown in Figures 1 to 3. The AMSEs of our Bayesian estimates are better for all the cases, as shown in Tables 1 to 3. For tviGARCH, AMSE* is considered instead, due to the huge and somewhat incomparable values of AMSE caused by the non-existence of the variance.

Real data application
To apply our methods on real-life datasets, we stick to econometric data over varying time horizons. These datasets show considerable time-variation, justifying our models as suitable for understanding how the parameter functions have evolved. Typically we model the log-return of the daily closing price to avoid a unit-root scenario. The log-return is defined as r_i = log(P_i/P_{i−1}), which is close to the relative return (P_i − P_{i−1})/P_{i−1}, where P_i is the closing price on the i-th day. Conditional heteroscedastic models are popularly used for model building, analysis and forecasting; here we extend such models to a more sophisticated and general scenario by allowing the coefficient functions to vary. In this section we analyze two datasets: the USD to JPY conversion rate and NASDAQ, a popular US stock market index. We analyze the NASDAQ data through tvGARCH(1,1) and tviGARCH(1,1) models and the USDJPY conversion rate data through a tvARCH(1) model. We fit only one lag for these models, as multiple-lag fits are similar and larger lags appear insignificant. This is consistent with the findings of Karmakar et al. (2020+), Fryzlewicz et al. (2008b) and others. Moreover, as Fryzlewicz et al. (2008b) claim, stock indices and Forex rates are better suited to GARCH and ARCH type models respectively in terms of predictive performance. Each of these datasets was collected up to 31 July 2020. We exhibit our results for the last 200, 500 and 1000 days, which capture the last 6 months, around 1.5 years and around 3 years of data respectively. All the datasets were collected from www.investing.com. Note that these data are usually available for weekdays barring holidays, so there are typically about 260 data points per year.
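The log-return transformation itself takes only a couple of lines (the price series below is a made-up example):

```python
import numpy as np

def log_returns(prices):
    """Log-returns r_i = log(P_i / P_{i-1}) of a closing-price series.
    For small moves this is close to the relative return
    (P_i - P_{i-1}) / P_{i-1}."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))

r = log_returns([100.0, 101.0, 99.5])
```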

USDJPY data: tvARCH(1) model
Figure 4 shows our estimation from fitting a tvARCH(1) model on the USD to JPY conversion data for the last 200, 500 and 1000 days ending 31 July 2020. The AMSEs are also computed and contrasted with other competing methods in Table 4. Figure 4 depicts the estimated functions with 95% credible bands for the different sample sizes; the bands become much shorter for larger sample sizes. The mean coefficient function µ(·) is genuinely time-varying in all three cases, as one cannot fit a horizontal line through the 95% posterior bands. There is a rise in the mean value around 100 days before July 31, 2020, which is right around the time the COVID-19 pandemic hit the world. From the analysis of n = 1000 days, we see that the volatility is quite high around October 2016, which coincides with the 2016 presidential election. The AR(1) coefficient does not show a strong time-varying property. We also tabulate the AMSE for the three sample sizes in Table 4; for a smaller sample size such as n = 200, the proposed Bayesian tvARCH achieves a significantly lower score, but as the sample size grows the performances become similar.

NASDAQ data: tvGARCH(1,1) model

As has become standard, we use a time-varying version of a low-order GARCH. Figure 5 shows the fit of a tvGARCH(1,1) model on the NASDAQ data for the last 200, 500 and 1000 days ending 31 July 2020. One can see that the a_1(·) values are generally low and the b_1(·) values are higher, which is consistent with how these outcomes turn out for time-constant estimates on econometric datasets. One can also see the role sample size plays in curating these time-varying estimates: for n = 200, b_1(·) achieves a high value of about 0.6 around mid-March 2020, but for higher sample sizes it reaches values as high as 0.8.
One can also note the striking similarity between the analyses of the last 500 and 1000 days, which is fairly consistent with the idea that estimation of such CH-type models is more stable at higher sample sizes. Nonetheless, the estimates for n = 200 are quite smooth as well, which can be seen as a benefit of our methodology. Table 5 compares the AMSE scores across the three methods for the three sample sizes. The Bayesian tvGARCH(1,1) performs relatively better than the other methods, and the estimated curves have narrower credible bands with growing sample size. The behaviour of the mean function again shows higher volatility around the pandemic.

NASDAQ data: tviGARCH(1,1) model

In Figure 5, the sum of the estimated coefficient functions a(·) + b(·) is close to 1 over a significant part of the time horizon. This motivates us to also fit a tviGARCH(1,1) model to the same NASDAQ data. The estimated functions are presented in Figure 6 for the last n = 200, 500 and 1000 days, and Table 6 compares the AMSE scores for the same three methods with varying sample sizes. The estimated mean and AR(1) functions in Figure 6 change little from the estimated functions of the tvGARCH(1,1) fit in Figure 5. Moreover, the effect of the sample size is clear here, with n = 1000 showing very precise bands and revealing an interesting time-varying pattern.
In terms of AMSE, one can see in Table 6 that the frequentist methods did worse than even the time-constant versions; the time-constant estimates were computed using the rugarch package in R. The Bayesian tviGARCH method provides significantly better AMSE uniformly over all sample sizes. Here also the mean function shows higher volatility around the time the pandemic struck us. Volatility due to the 2016 presidential election can also be observed here.

Discussion and Conclusion
In this paper we considered a Bayesian estimation framework for time-varying conditional heteroscedastic models. Our prior specifications are amenable to Hamiltonian Monte Carlo for efficient computation. One of the key motivations for going to a Bayesian regime was to achieve reasonable estimation even for small sample sizes. Our simulations show good performance of all three models, tvARCH, tvGARCH and tviGARCH, for both small and large sample sizes. Importantly, in all three cases we were able to establish posterior contraction rates. These calculations are, to the best of our knowledge, the first such work even for simple dependent models, let alone the complicated recursions that these conditional heteroscedastic models demand. Moreover, the assumptions on the true functions and the number of moments needed are very minimal. An interesting future theoretical direction would be to calculate the posterior contraction rate with respect to the empirical ℓ_2-distance, which is a more desirable metric for function estimation. While analyzing real data, we see that the parameter curves vary significantly for the intercept terms but not as much for the AR or CH parameters. The code to fit the three models will be made available at https://github.com/royarkaprava. As future work, it will be interesting to explore multivariate time-varying volatility modeling (Tse and Tsui, 2002; Kwan et al., 2005) through a Bayesian framework similar to ours. Another interesting form of time-heterogeneity that we plan to explore through the lens of a Bayesian framework is regime-switching CH models, where instead of smooth time-varying functions the parameters change abruptly. Finally, note that the focus of this paper was not on forecasting. Forecasting for time-varying models is extremely tricky, since it requires 'estimation' of the future parameter values.
Although in-filled asymptotics can help in this regard, the literature in this direction is still very sparse in both the Bayesian and frequentist regimes.

Proof of Theorems
We study the frequentist properties of the posterior distribution in the increasing-n regime, assuming the observations come from a true density f_0 characterized by the parameter κ_0. We follow the general theory of Ghosal et al. (2000) to study the posterior contraction rate for our problem. In the Bayesian framework, the density f is itself a random measure with distribution Π, the prior induced by the assumed prior distribution on κ. The posterior probability of a neighborhood U_n = {f : d(f, f_0) < ε_n} around f_0 given the observations X^{(n)} = {X_0, X_1, . . . , X_n} is

Π(U_n | X^{(n)}) = ∫_{U_n} f(X^{(n)}) dΠ(κ) / ∫ f(X^{(n)}) dΠ(κ).

General proof strategy
Posterior consistency holds if the posterior probability of the complement of the above neighborhood goes to zero in F^{(n)}_{κ_0}-probability as n goes to infinity, where F^{(n)}_{κ_0} is the true distribution of X^{(n)}. Recall the definition of the posterior contraction rate: for a sequence ε_n, if Π_n(d(f, f_0) ≥ M_n ε_n | X^{(n)}) → 0 in F^{(n)}_{κ_0}-probability for every sequence M_n → ∞, then ε_n is called the posterior contraction rate. If the assertion is true for a constant M_n = M, then the corresponding statement is slightly stronger.
Note that for two densities f_0 and f characterized by κ_0 and κ respectively, the Kullback–Leibler divergences are given below.
Assume that there exists a sieve W_n in the parameter space such that Π(W_n^c) ≤ exp(−(C_n + 2)nε_n^2), and that we have tests χ_n with exponentially small errors. Define the event S_n = {∫ (f/f_0)(X^{(n)}) dΠ(κ) ≥ Π_n((1/n)KL(κ_0, κ) < ε_n^2) exp(−C_n nε_n^2)}. We can then bound the posterior probability from above by a sum of three terms. Taking expectation with respect to κ_0, the first term goes to zero by the construction of χ_n. The second term, E_{κ_0} 1{S_n^c}, goes to zero by Lemma 8.21 of Ghosal and Van der Vaart (2017) for any sequence C_n → ∞. For the third term, we require Π_n{(1/n)KL(κ_0, κ) < ε_n^2} ≥ exp(−nε_n^2). Thus we adhere to the following plan. The proof has three major parts: (i) (Prior mass condition) show that Π_n{(1/n)KL(κ_0, κ) < ε_n^2} ≥ exp(−nε_n^2); (ii) (Sieve) construct the sieve W_n such that Π(W_n^c) ≤ exp(−(C_n + 2)nε_n^2); and (iii) (Test construction) construct exponentially consistent tests χ_n.
We first study the contraction properties with respect to d^2(f, f_0) = r_n^2(f, f_0) = −(1/n) log ∫ √(f f_0), and then show that the same rate holds for the average Hellinger distance (1/n) d_H^2(f, f_0). Note that L_n can be taken as L_n = M_n^2.
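The reduction from log-affinity to average Hellinger can be spelled out in one line, using $d_H^2(f, f_0) = 2\left(1 - \int \sqrt{f f_0}\right)$ together with the inequality $-\log x \ge 1 - x$:

```latex
n\, r_n^2(f, f_0) \;=\; -\log \int \sqrt{f f_0}
\;\ge\; 1 - \int \sqrt{f f_0}
\;=\; \tfrac{1}{2}\, d_H^2(f, f_0),
\qquad\text{so}\qquad
\frac{1}{n}\, d_H^2(f, f_0) \;\le\; 2\, r_n^2(f, f_0).
```

Hence any contraction rate established for $r_n^2$ transfers immediately to the average Hellinger metric.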

Proof of Theorem 1
For the sake of technical convenience, we present our proof with the lag order fixed at p = 1; however, the results are easily generalizable to any fixed order p, and all the proofs go through for higher lags with the same technical tools.

KL Support
The likelihood based on the parameter κ is given by P_κ(X_0) ∏_{i=1}^n P_κ(X_i | X_{i−1}). Let Q_{κ,t}(X_i) be the distribution of X_i with parameter κ. We have the decomposition displayed below, where σ_i^2 = µ(i/n) + a_1(i/n) X_{i−1}^2 and σ_{0i}^2 = µ_0(i/n) + a_{01}(i/n) X_{i−1}^2. Our first goal is to estimate these two quantities in terms of the distances between µ(·), µ_0(·) and a_1(·), a_01(·). In light of E_{κ_0}(I + II) = E_{κ_0}(I) + E_{κ_0}(II), we bound the expectations of I and II separately.
Bounding the first term I: by the mean value theorem, for a random variable σ_{*i}^2 between σ_i^2 and σ_{0i}^2, we obtain the displayed bound, where σ_{*i}^2 > ρ is used for the first term and σ_{*i}^2 > ρX_{i−1}^2 for the second term, due to assumption (A.2). Thus the first term I satisfies the stated bound in light of assumption (A.3). Bounding the second term II: for the second term we proceed as displayed, using the fact that µ(i/n) > ρ and a(i/n) > ρ due to the closeness of (µ, a) and (µ_0, a_0). Consequently, we have the deterministic inequality shown. Taking expectation under the truth, we arrive at a similar upper bound for E_{κ_0}(II) as for E_{κ_0}(I). Thus we conclude the stated bound.

Posterior contraction in terms of average negative log-affinity
In this section, we focus on the requirements for calculating the posterior contraction rate as in Section 4. We first show posterior consistency in terms of the average negative log-affinity, defined as $r_n^2(f_1, f_2) = -\frac{1}{n}\log\int\sqrt{f_1 f_2}$ between $f_1$ and $f_2$; here $f_1 = \prod_{i=1}^n P_{\kappa_1}(X_i \mid X_{i-1})$. We then show that $r_n^2(f_1, f_0) \lesssim \epsilon_n^2$ implies the same rate in our distance metric $d_{2,n}^2$. Proceeding with the rest of the proof of Theorem 1, we use standard results on B-splines: a Hölder-smooth function with regularity $\iota$ can be approximated uniformly up to order $K^{-\iota}$ by $K$ B-splines. Thus $\epsilon_n \gtrsim \max\{K_{1n}^{-\iota_1}, K_{2n}^{-\iota_2}\}$. We start by providing a lower bound for the prior probability as required by (i) in Plan 4. We also have the result (8.6) and the prior probability bound $\Pi(\|\alpha-\alpha_0\|_\infty \le \epsilon_n, \|\gamma-\gamma_0\|_\infty \le \epsilon_n) \gtrsim \epsilon_n^{K_{1n}+K_{2n}}$, based on the discussion of assumption A2 in Shen and Ghosal (2015). The rate of contraction cannot be better than the parametric rate $n^{-1/2}$, and so $\log(1/\epsilon_n) \lesssim \log n$. Thus (i) requires, in terms of the pre-rate $\bar\epsilon_n$, that $(K_{1n} + K_{2n})\log n \lesssim n\bar\epsilon_n^2$. Next we construct a sequence of test functions with exponentially decaying Type I and Type II errors. For this we consider the following sieve, where $A_n, B_n$ are at least polynomial in $n$, $\sigma_0^2$ is the variance of $X_0$, and $K_n = \max\{K_{1n}, K_{2n}\}$. We take $\rho_n = O(n^{-a})$ with $a < 1$, $A_n = O(n^{a_1})$ and $B_n = O(n^{a_2})$ with $a_2 > a_1$ for technical reasons.
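Balancing the B-spline approximation error $K_{jn}^{-\iota_j}$ against the prior-mass requirement $(K_{1n}+K_{2n})\log n \lesssim n\bar\epsilon_n^2$ yields the familiar nonparametric rate. The following sketch makes the arithmetic explicit, with the exact powers $t_j$ of the logarithm left unspecified as an assumption:

```latex
K_{jn}^{-\iota_j} \asymp \sqrt{\frac{K_{jn}\log n}{n}}
\quad\Longrightarrow\quad
K_{jn} \asymp \Bigl(\frac{n}{\log n}\Bigr)^{1/(2\iota_j+1)},
\qquad
\bar\epsilon_n \asymp \max_{j=1,2}\; n^{-\iota_j/(2\iota_j+1)}\,(\log n)^{t_j}.
```

Solving the balance equation for $K_{jn}$ and substituting back gives the displayed order of $\bar\epsilon_n$ for each smoothness level, with the maximum over $j$ determined by the rougher of the two parameter curves.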
We need to choose these bounds carefully so that $\Pi(\mathcal{W}_n^c) \le \exp\{-(1+C_1)n\epsilon_n^2\}$, which depends on the tail properties of the prior. We have a decomposition of $\Pi(\mathcal{W}_n^c)$ into tail terms, where $\alpha_{K_{1n}}$ is the full vector of coefficients of length $K_{1n}$ and $\gamma_{K_{2n}}$ is the vector of coefficients of length $K_{2n}$. The quantity $\Pi[\alpha_{K_{1n}} \notin [\rho_n, A_n)^{K_{1n}}]$ can be further upper bounded by $K_{1n}\Pi(\alpha_1 \notin [\rho_n, A_n)) \le K_{1n}\exp\{-R_1 n^{a_3}\}$ for some constants $R_1, a_3 > 0$, which can be verified from the discussion of assumption A.2 of Shen and Ghosal (2015) for our choice of prior, which has exponential tails. An analogous bound holds with constants $R_2, a_4 > 0$, which can be verified from the proof of Roy et al. (2018). Hence, $\Pi(\mathcal{W}_n^c) \lesssim F_1(K_{1n}) + F_2(K_{2n}) + (K_{1n} + K_{2n})\exp\{-Rn^{a_5}\}$. The two functions $F_1$ and $F_2$ in the last expression stand for the tail probabilities of the priors on $K_1$ and $K_2$, and we can calculate their asymptotic orders. Hence, we calculate the pre-rate from the following conditions, for some sequence $H_n \to \infty$:
$$K_{1n}(\log n)^{b_{13}} + K_{2n}(\log n)^{b_{23}} \lesssim H_n n\bar\epsilon_n^2, \qquad \log(K_{1n} + K_{2n}) + H_n n\bar\epsilon_n^2 \lesssim n^{a_5}. \quad (8.8)$$
Now, we construct tests $\chi_n$ such that $E_{\kappa_0}\chi_n \le e^{-L_n n\epsilon_n^2/2}$ and $\sup_{\kappa\in\mathcal{W}_n : r_n^2(\kappa,\kappa_0) > L_n\epsilon_n^2} E_\kappa(1-\chi_n) \le e^{-L_n n\epsilon_n^2}$ for some $L_n > C_n + 2$. To construct the test, we first treat the point alternative $H_0: \kappa = \kappa_0$ versus $H_1: \kappa = \kappa_1$. The most powerful test for this problem is the Neyman–Pearson test $\phi_{1n} = \mathbb{1}\{f_1/f_0 \ge 1\}$, whose errors decay exponentially when $r_n^2 > L_n\epsilon_n^2$. It is natural to seek a neighborhood around $\kappa_1$ such that the Type II error remains exponentially small for all alternatives in that neighborhood under the test $\phi_{1n}$. By the Cauchy–Schwarz inequality, we can write
$$E_\kappa(1-\phi_{1n}) \le \bigl\{E_{\kappa_1}(1-\phi_{1n})\bigr\}^{1/2}\Bigl\{E_{\kappa_1}\bigl(f_\kappa/f_{\kappa_1}\bigr)^2\Bigr\}^{1/2}.$$
In the above expression, the first factor is already exponentially decaying. The second factor can be allowed to grow at most at order $e^{cn\epsilon_n^2}$ for some small positive constant $c$. We in fact establish that the second factor is controlled by our choice of the sieve through the following claim.
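The Cauchy–Schwarz step can be checked exactly on a toy discrete model (the pmfs below are illustrative, not the paper's model):

```python
# Exact check of the Cauchy-Schwarz bound on the Type II error of the
# Neyman-Pearson test:
#   E_k(1 - phi) <= sqrt(E_{k1}(1 - phi)) * sqrt(E_{k1}((f_k / f_{k1})^2)).
f0 = [0.70, 0.20, 0.10]   # null pmf (kappa_0)
f1 = [0.10, 0.30, 0.60]   # point alternative kappa_1
fk = [0.15, 0.25, 0.60]   # nearby alternative kappa

# Neyman-Pearson test: reject when the likelihood ratio f1/f0 is >= 1.
phi = [1.0 if f1[x] / f0[x] >= 1.0 else 0.0 for x in range(3)]

lhs = sum(fk[x] * (1.0 - phi[x]) for x in range(3))          # Type II error under kappa
first = sum(f1[x] * (1.0 - phi[x]) for x in range(3)) ** 0.5  # decays exponentially in n
second = sum(f1[x] * (fk[x] / f1[x]) ** 2 for x in range(3)) ** 0.5  # controlled on the sieve
assert lhs <= first * second + 1e-12
```

In the proof the same inequality is applied with $n$-fold product densities, so the first factor inherits the exponential decay of the point test while the second factor is exactly the quantity controlled by the sieve.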

Posterior contraction in terms of average Hellinger
We write the Rényi divergence in a form that splits into the two terms I and II. Combining the bounds for I and II we arrive at (8.15). In the next part we modify the construction of the sieve to suit the tvGARCH structure.
Proof. Within the sieve, we again use a variant of the above inequality.
where $Q_n = \frac{1 - A_n/B_n}{\rho_n} > 1$. The right-hand side is increasing in $i$, and thus we only need a bound for $i = n$. If $A_n$, $B_n$ and $\rho_n$ are chosen in such a way that $Q_n \lesssim n^{1/n}$, then $r_1, r_2, r_3$ and $r_4$ can be chosen accordingly. We also choose $A_n = B_n(1 - n^{1/n}\rho_n)$. Note that, with the choice $\rho_n = n^{-a}$ made above, $n^{1/n}\rho_n < 1$ for $n > 1/a$, which means that for large $n$ such choices of $A_n, B_n$ are valid. Finally we choose the radii $r_i \asymp \rho_n/n^2$ for $1 \le i \le 4$.
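A quick numerical check of these choices (the polynomial order of $B_n$ below is a placeholder): with $A_n = B_n(1 - n^{1/n}\rho_n)$ one gets $Q_n = n^{1/n}$ exactly, so $Q_n^n = n$ grows only polynomially.

```python
a = 0.5  # rho_n = n^{-a} with a < 1, as in the text
for n in [10, 100, 1000]:
    rho_n = n ** (-a)
    B_n = float(n) ** 3                       # B_n polynomial in n (placeholder exponent)
    A_n = B_n * (1.0 - n ** (1.0 / n) * rho_n)  # the choice made in the text
    Q_n = (1.0 - A_n / B_n) / rho_n           # = n^{1/n} by construction
    assert n ** (1.0 / n) * rho_n < 1.0       # valid once n > 1/a
    assert abs(Q_n - n ** (1.0 / n)) < 1e-9
    assert abs(Q_n ** n - n) < 1e-6 * n       # Q_n^n = n up to rounding
```

This confirms that the bound at $i = n$ stays polynomial in $n$, which is all the sieve entropy calculation requires.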

Proof of Theorem 3
We prove Theorem 3 along very similar lines to those outlined for Theorem 2. The only difference arises in the KL bounding step, where existence of the second moment is necessary; unfortunately, in the tviGARCH case the variance is infinite. Thus we handle the KL bound in a different way.