Time-varying auto-regressive models for count time-series

Count-valued time series data are routinely collected in many application areas. We are particularly motivated to study the count time series of daily new cases arising from the spread of COVID-19. We first propose a Bayesian framework for a time-varying semiparametric AR(p) model for counts, and then extend it to a time-varying INGARCH model to accommodate rapid changes in the spread. We calculate posterior contraction rates of the proposed Bayesian methods with respect to the average Hellinger metric. The proposed model structures are amenable to Hamiltonian Monte Carlo (HMC) sampling for efficient computation. We substantiate our methods with simulations that show superiority over some closely related existing methods. Finally, we analyze the daily time series of newly confirmed cases to study the spread under different government interventions.


Introduction
Modeling count time series is important in many applications such as disease incidence, accident rates, and integer-valued financial data such as price movements. This relatively new research stream was introduced by Zeger (1988), who, interestingly, analyzed another outbreak: the 1970 Polio incidence rate in the US. The stream was furthered by Chan and Ledolter (1995), who discussed Poisson generalized linear models (GLM) with an autoregressive latent process in the mean.
A wide range of dependence was explored in Davis et al. (2003) for a simple autoregressive (AR) structure with external covariates. A different stream explored integer-valued time series structures such as ARMA (Brandt and Williams, 2001; Biswas and Song, 2009) or INGARCH (Zhu, 2011, 2012c). From a Bayesian perspective, the only work to the best of our knowledge is that of Silveira de Andrade et al. (2015), where the authors discussed ARMA models for different count series parameters. However, their treatment remains questionable in that they ignore zero-valued data and build the MA structure from demeaned Poisson random variables. None of these works focused on the time-varying nature of the coefficients except for a brief mention in .
Our goals are motivated by both the application and methodological development. To the best of our knowledge, ours is the first attempt to model autoregressive count time series with time-varying coefficients, which can be regarded as the time-varying analog of Fokianos et al. (2009). We avoid the usual GLM route with an exponential link, as in Fokianos and Tjøstheim (2011), since it loses the interpretability of the coefficient functions. Instead, we impose shape restrictions on these functions: the mean function stands for the overall spread, and the autoregressive coefficients stand for the effects of the different lags. We are particularly motivated to study the spread of COVID-19 in New York City (NYC) from 23rd January to 14th July using the daily count data of new cases. In terms of our motivating data application, we wish to identify which lags are significant in our model, which can be directly linked to the period during which symptoms did not show up. We find that some higher-order lags such as 6, 7, and 8 are also significant. These findings are in line with several research articles discussing the incubation length for the novel coronavirus, with a median of 6-7 days and 98% below 11 days; see, for example, Lauer et al. (2020b). We also find that after the lockdown or stay-at-home orders it takes about 12-16 days to reach the peak, after which the intercept coefficient function starts decreasing. This is another interesting finding, consistent with the number of infected but asymptomatic cases being large compared to the new cases reported. In addition to the time-varying AR model, we also offer an analysis via a TVBINGARCH model that assumes an additional recursive term in the conditional expectation (cf. (2.7)). This extension adds comprehensiveness to the modeling, as even an INGARCH specification with small orders can spare us the choice of an appropriate maximum lag value.
Since for a Poisson model the mean equals the variance, this can also be thought of as an extension of the GARCH model in the context of count data. First introduced by Ferland et al. (2006), these models were thoroughly analyzed in Zhu (2011, 2012a,b,c) and Ahmad and Francq (2016). Our TVBINGARCH proposal adapts this class to the non-stationary setting and can thus be viewed as a new contribution. Finally, we contrast the time-varying AR and GARCH models in both simulations and the real-data application under different metrics of evaluation.
In our present context, the number of affected individuals could also be modeled by the popular SIR framework; however, such models assume additional structure on how these numbers evolve and then estimate the rate. Instead, we do not assume any such specific evolution and offer a general perspective on how the trend behaves. This can shed light on how external factors played a role.
Given the rapidly evolving nature of the pandemic, the patterns and numbers of newly affected cases are changing rapidly over different geographical regions. The rapid change in the observed counts makes all earlier time-constant analyses inappropriate and opens a path for methodological and inferential development in tracking the trajectory of this spread. We therefore propose a novel semiparametric time-varying autoregressive model for counts to study the spread and to examine the effects of interventions through the time-varying coefficient functions. A time-varying AR(p) process consists of a time-varying mean/intercept function along with time-varying autoregressive coefficient functions. We further generalize it to a time-varying Bayesian integer-valued generalized autoregressive conditional heteroscedasticity (TVBINGARCH) model in which the conditional mean also depends on past conditional means.
Regression models with varying coefficients were introduced by Hastie and Tibshirani (1993).
They modeled the varying coefficients using cubic B-splines. These models have since been explored in various directions (Gu and Wahba, 1993; Biller and Fahrmeir, 2001; Fan and Zhang, 2008; Franco-Villoria et al., 2019; Yue et al., 2014). Spline bases have been routinely used to model time-varying coefficients within non-linear time series models (Cai et al., 2000; Huang et al., 2002; Huang and Shen, 2004; Amorim et al., 2008). We also consider B-spline series based priors for the time-varying coefficient functions and develop efficient computational algorithms for the proposed models. Apart from developing a computationally tractable hierarchical model, we establish posterior contraction rates for the proposed models. To the best of our knowledge, the posterior contraction rate result of this paper is the first for a time-varying Markov model under a Poisson link based on minimal assumptions. Our posterior contraction rate is with respect to the average Hellinger metric. The primary theoretical hurdle is to construct exponentially consistent tests in a time-varying Markov setup. Our test construction is inspired by Jeong et al. (2019) and Ning et al. (2020): we construct the test via the Neyman-Pearson lemma with respect to the negative average log-affinity distance, calculate contraction rates, and then show that the same rate holds for the average Hellinger metric as well. We also discuss a pointwise inferential tool based on credible intervals. Such tools are important for keeping an objective perspective on the evolution of the time-varying coefficients without restricting them to specific trend models. See (Karmakar (2018) for an earlier version) for a comprehensive discussion of time-varying models and their applications.
The rest of the paper is organized as follows. Section 2 describes the proposed Bayesian models in detail. Section 3 discusses an efficient computational scheme for the proposed method.
We calculate posterior contraction rates in Section 4. The performance of our proposed method in capturing the true coefficient functions is studied in Section 5, where we show excellent performance relative to other existing methods. Section 6 deals with an application of the proposed method to the COVID-19 spread in NYC. We end with discussions and possible future directions in Section 7.
Section 8 contains detailed theoretical proofs.

Modeling
In this paper, our motivating data is the daily count of newly confirmed COVID-19 cases in New York City (NYC). Figure 1 illustrates the logarithm of the absolute values of fitted residuals for different methods on the COVID-19 spread data. We find that the time-varying AR (tvAR) model fits the data much better than other routinely used time-constant methods. Given the current knowledge of the incubation period of the virus (Lauer et al., 2020a), we fit models up to lag 10 for the autoregressive part. For the conditional heteroscedastic models, we consider order (1,1), a standard choice for such models. This motivates us to consider non-stationary time-series models such as the usual tvAR regression. However, we emphasize that these existing methods are suitable only for continuous-valued variables and not for count-valued series.
This motivation sets up the stage to discuss Poisson autoregression with time-varying coefficients. Let {X t } be a count-valued time series. In this paper, we consider two different structures for the conditional mean of {X t } given the history of the process. The first modeling framework is in the spirit of time-varying auto-regressive models. Next, we consider a more general modeling structure adding an additional recursive term in the intensity parameter.

Time-varying auto-regressive model for counts
The linear Poisson autoregressive model (Zeger, 1988; Brandt and Williams, 2001) is popular for analyzing count-valued time series. Due to the assumed non-stationary nature of the data, we propose a time-varying version of this model. The conditional distribution for the count-valued time series X_t given F_{t-1} = {X_i : i ≤ t - 1} is

X_t | F_{t-1} ∼ Poisson(λ_t),   λ_t = µ(t/T) + Σ_{i=1}^p a_i(t/T) X_{t-i}.   (2.1)

We call our method the time-varying Bayesian Auto-Regressive model for Counts (TVBARC). The rescaling of the time-varying parameters to the support [0,1] is usual for in-filled asymptotics.
Due to the Poisson link in (2.1), both the conditional mean and conditional variance depend on the past observations. The conditional expectation of X_t in model (2.1) is E(X_t | F_{t-1}) = µ(t/T) + Σ_{i=1}^p a_i(t/T) X_{t-i}, which needs to be positive-valued. To ensure that, we impose the following constraints on the parameter space for the time-varying parameters,

P_1 = {(µ, a_1, . . . , a_p) : µ(·) ≥ 0, a_i(·) ≥ 0, Σ_{i=1}^p sup_t a_i(t) < 1}.   (2.2)

Note that the conditions (2.2) are somewhat motivated by the stationarity conditions imposed in all the literature mentioned above. Additionally, the above references depend heavily on local stationarity: namely, for every rescaled time 0 < t < 1, they assume the existence of an approximating process X̃_t which is close to the observed process. One key advantage of our proposal is that it is free of any such assumption. Our assumption of only a first moment is also very mild for the theoretical exploration in Section 4. Moreover, except for a very general linear model discussed in , to the best of our knowledge this is the first analysis of time-varying parameters for count time series modeled by Poisson regression. Thus we choose to focus on the methodological development rather than proving the optimality of these conditions. When p = 0, our proposed model reduces to the routinely used nonparametric Poisson regression model as in Shen and Ghosal (2015).
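To make the generative mechanism of (2.1) concrete, it can be simulated in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names and the particular choices of µ(·) and a_1(·) are ours, chosen to satisfy the constraints above.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's method (adequate for the moderate rates here)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def simulate_tvbarc1(T, mu, a1, seed=1):
    """Draw X_0,...,X_T with X_t | past ~ Poisson(mu(t/T) + a1(t/T) * X_{t-1})."""
    rng = random.Random(seed)
    X = [rpois(mu(0.0), rng)]          # X_0 from the intercept alone
    for t in range(1, T + 1):
        lam = mu(t / T) + a1(t / T) * X[-1]
        X.append(rpois(lam, rng))
    return X

# Illustrative coefficient functions satisfying mu >= 0 and sup a1 < 1
mu = lambda u: 10.0 * math.exp(-2.0 * u) + 1.0
a1 = lambda u: 0.4 + 0.2 * math.sin(2.0 * math.pi * u)
path = simulate_tvbarc1(500, mu, a1)
print(len(path), min(path))
```

Rescaled time t/T enters the coefficient functions, matching the in-filled asymptotics convention of the model.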
To proceed with Bayesian computation, we put priors on the unknown functions µ(·) and the a_i(·)'s such that they are supported in P_1. The prior distributions on these functions are induced through B-spline basis expansions, with suitable constraints on the coefficients to ensure the shape restrictions of P_1. A detailed description of the priors is given in (2.3)-(2.6). Here the B_j's are the B-spline basis functions and the parameters δ_j are unbounded. Based on the constraints on the parameter space, we consider corresponding priors for the α_j's and τ_i's. The priors induced by the above construction are supported in P_1, and the verification is straightforward: we have Σ_j M_j = 1 if and only if δ_0 = -∞, which has probability zero, and µ(·) ≥ 0 since α_j ≥ 0. Thus the induced priors described in (2.3)-(2.6) are well supported in P_1.
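One standard way to obtain weights that are positive and sum to strictly less than one, consistent with the role of δ_0 and the M_j's above, is to pass unconstrained reals through a softmax with one extra slot. This is a hedged sketch of that construction (the exact transformation used by the authors is in (2.3)-(2.6); the variable names only loosely mirror the paper's notation):

```python
import math

def bounded_simplex(delta):
    """Map unconstrained reals delta_1..delta_p to weights M_1..M_p with
    M_j > 0 and sum_j M_j < 1. The extra slot (here exp(0) = 1, playing the
    role of exp(delta_0)) absorbs the remaining mass, so sum_j M_j = 1 would
    require delta_0 = -inf, an event of probability zero under a continuous
    prior -- matching the verification argument in the text."""
    exps = [math.exp(d) for d in delta] + [1.0]
    Z = sum(exps)
    return [e / Z for e in exps[:-1]]

M = bounded_simplex([0.5, -1.0, 0.2])
print(sum(M))  # strictly below 1
```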

Time-varying generalized autoregressive conditional heteroscedasticity model for counts
In the previous model, both the conditional mean and conditional variance depend on the past observations. Ferland et al. (2006) proposed an integer-valued analogue of the generalized autoregressive conditional heteroscedasticity (GARCH) model after observing that the variability in the number of cases of campylobacteriosis infections also changes with the level. Given the complexity of lag selection in TVBARC, we also introduce the following time-varying version of the integer-valued GARCH (INGARCH) model for counts.
The conditional distribution for the count-valued time series X_t given F_{t-1} = {X_i : i ≤ t - 1} is

X_t | F_{t-1} ∼ Poisson(λ_t),   λ_t = µ(t/T) + Σ_{i=1}^p a_i(t/T) X_{t-i} + Σ_{j=1}^q b_j(t/T) λ_{t-j}.   (2.7)

We call our method the time-varying Bayesian Integer-valued Generalized Auto-Regressive Conditional Heteroscedastic (TVBINGARCH) model. We impose constraints on the parameter space similar to Ferreira et al. (2017); this constraint ensures a unique solution of the time-varying GARCH process, as discussed in Ferreira et al. (2017). We then modify the prior from the previous subsection so that the functions µ(·), a_i(·) and b_j(·) are supported in P_2. Using the B-spline bases, we put a hierarchical prior on the unknown functions, where λ_0 is the rate parameter for X_0. We primarily focus on the special case p = 1, q = 1.
Based on the constraints on the parameter space, we consider priors for the α_j's and τ_i's analogous to those of the previous subsection; similar calculations show that the resulting hierarchical prior is well supported in P_2. We only consider TVBINGARCH(1,1), the order most commonly used within the GARCH class of models. A major drawback of TVBARC is proper lag selection; the TVBINGARCH framework alleviates this. As in the stationary case, TVBINGARCH(1,1) can be viewed as a TVBARC of infinite order, so higher values of b_1(·) indicate that there might be important higher lags in TVBARC.
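The recursion in (2.7) can likewise be sketched in simulation form. As before, this is our own illustration under the model's positivity constraints, with the coefficient functions chosen so that sup a_1 + sup b_1 < 1:

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def simulate_tvbingarch11(T, mu, a1, b1, lam0=1.0, seed=1):
    """lambda_t = mu(t/T) + a1(t/T) X_{t-1} + b1(t/T) lambda_{t-1};
    X_t | past ~ Poisson(lambda_t), with X_0 drawn at rate lam0."""
    rng = random.Random(seed)
    X = [rpois(lam0, rng)]
    lam_prev = lam0
    lams = []
    for t in range(1, T + 1):
        lam = mu(t / T) + a1(t / T) * X[-1] + b1(t / T) * lam_prev
        X.append(rpois(lam, rng))
        lam_prev = lam
        lams.append(lam)
    return X, lams

mu = lambda u: 2.0 + math.sin(math.pi * u)
a1 = lambda u: 0.3 + 0.1 * u          # sup a1 + sup b1 = 0.7 < 1
b1 = lambda u: 0.2 + 0.1 * (1.0 - u)
X, lams = simulate_tvbingarch11(300, mu, a1, b1)
print(len(X), min(lams) > 0)
```

The extra λ_{t-1} term gives every past observation geometrically decaying influence, which is why TVBINGARCH(1,1) behaves like an infinite-order TVBARC.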

Posterior computation
In this section, we discuss the Markov chain Monte Carlo (MCMC) sampling method for posterior computation. Our sampler relies on the gradient-based Hamiltonian Monte Carlo (HMC) algorithm (Neal et al., 2011). Hence, we show the gradient computations of the likelihood with respect to the different parameters for TVBARC(p) and TVBINGARCH(p, q) in the following two subsections.

TVBARC structure
The complete likelihood L_1 of the proposed Bayesian method in (2.1) is given by L_1 = Π_{t=1}^T exp(-λ_t) λ_t^{X_t} / X_t!, with λ_t = µ(t/T) + Σ_{i=1}^p a_i(t/T) X_{t-i}. We develop an efficient MCMC algorithm to sample the parameters β, θ and δ from the above likelihood.
The derivatives of the above likelihood with respect to the parameters are easily computable, which lets us develop an efficient gradient-based MCMC algorithm to sample these parameters. We calculate the gradients of the negative log-likelihood (- log L_1) with respect to the parameters β, θ and δ; in the expressions below, 1_{j=k} stands for the indicator function, which takes the value one when j = k.
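The key identity behind all these gradients is that, for a Poisson likelihood, d(- log L)/dp = Σ_t (1 - X_t/λ_t) ∂λ_t/∂p for any parameter p entering the intensity. A quick finite-difference check on a time-constant toy version (our own illustration, not the paper's full parameterization):

```python
import math

def neg_loglik(X, mu_val, a_val):
    """-log L for the toy model lambda_t = mu + a * X_{t-1} (Poisson link)."""
    nll = 0.0
    for t in range(1, len(X)):
        lam = mu_val + a_val * X[t - 1]
        nll += lam - X[t] * math.log(lam) + math.lgamma(X[t] + 1)
    return nll

def grad_a(X, mu_val, a_val):
    """Analytic d(-log L)/da = sum_t (1 - X_t/lambda_t) * X_{t-1}."""
    return sum((1.0 - X[t] / (mu_val + a_val * X[t - 1])) * X[t - 1]
               for t in range(1, len(X)))

# Small synthetic count series; compare analytic and central-difference gradients
X = [3, 5, 2, 4, 6, 1, 3]
h = 1e-6
numeric = (neg_loglik(X, 2.0, 0.4 + h) - neg_loglik(X, 2.0, 0.4 - h)) / (2 * h)
print(abs(numeric - grad_a(X, 2.0, 0.4)) < 1e-4)
```

In the paper's setting the chain rule additionally carries ∂λ_t/∂p through the B-spline basis functions, but the (1 - X_t/λ_t) factor is the same.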

TVBINGARCH structure
The complete likelihood L_2 of the proposed Bayesian method of (2.7) is given analogously. We calculate the gradients of the negative log-likelihood (- log L_2) with respect to the parameters β, θ, η and δ. While fitting TVBINGARCH(p, q), we set X_t = 0 and λ_t = 0 for any t < 0. Thus, we additionally need to estimate the parameter λ_0, the Poisson rate parameter for X_0. The derivative of the likelihood with respect to λ_0 is computed numerically by differentiating from first principles.
Hence, it is sampled using the HMC algorithm too.
As the parameter spaces of the θ_ij's and η_kj's have bounded support, we map any Metropolis candidate falling outside of the parameter space back to the nearest boundary point. To obtain a good acceptance rate, we tune our HMC sampler periodically. There are two tuning parameters in HMC, namely the number of leapfrog steps and the step size. The step size is tuned to maintain an acceptance rate within the range of 0.6 to 0.8: it is reduced if the acceptance rate falls below 0.6 and increased if the rate exceeds 0.8. This adjustment is done automatically after every 100 iterations. We pre-specify the number of leapfrog steps at 30, which yields good results. Due to the increasing complexity of the parameter space in TVBINGARCH, we update all the parameters involved in the a_i(·)'s, b_k(·)'s, and λ_0 together.
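The periodic step-size adjustment described above can be sketched as follows. The acceptance band (0.6, 0.8) and the 100-iteration window are from the text; the multiplicative factor of 1.1 is our illustrative choice.

```python
def adapt_step_size(step, recent_accepts, lo=0.6, hi=0.8, factor=1.1):
    """Shrink the HMC step size if the recent acceptance rate fell below `lo`,
    grow it if it rose above `hi`, and leave it unchanged inside the band.
    In the scheme described in the text, this is called every 100 iterations
    on the accept/reject flags of that window."""
    rate = sum(recent_accepts) / len(recent_accepts)
    if rate < lo:
        return step / factor
    if rate > hi:
        return step * factor
    return step

# 50 accepts out of 100 -> rate 0.5 < 0.6, so the step shrinks
step = adapt_step_size(0.01, [1] * 50 + [0] * 50)
print(step < 0.01)
```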

Large-sample properties
In this section we obtain posterior contraction rates for the two proposed models. For clarity in presenting the assumptions under which these results hold, we make the conditions in (2.2) and (2.8) more specific. To study posterior contraction properties, the priors for α_j and τ_i are modified slightly to ensure better control over the tail probabilities.

TVBARC structure
We start by studying large-sample properties of the simpler AR model in (2.1). For simplicity, we fix the order at p = 1 in this section; the results generalize easily to any fixed order p. Posterior consistency is studied in the asymptotic regime of increasing sample size T. Let κ = (µ, a_1) stand for the complete set of parameters. For the sake of generality of the method, we put a prior on K_1 and K_2 with the probability mass function given in (4.2); Poisson and geometric probability mass functions appear as special cases of this prior for b_i3 = 1 or 0 respectively. These priors are not used while fitting the model, as that would require a computationally expensive reversible jump MCMC strategy. We study posterior consistency with respect to the average Hellinger distance d_{1,T} on the coefficient functions, where P stands for the conditional Poisson density defined in (2.1). The contraction rate depends on the smoothness of the true coefficient functions µ and a_1 and on the parameters b_13 and b_23 from the prior distributions of K_1 and K_2. Let κ_0 = (µ_0, a_10) be the true value of κ.
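The average Hellinger distance referred to above has, in our reading, the following standard form, stated for the conditional densities of model (2.1); this is a hedged reconstruction, since the display is garbled in this version:

```latex
d_{1,T}^2(f, f_0) \;=\; \frac{1}{T}\sum_{t=1}^{T}
\int \Bigl(\sqrt{f_t(x \mid X_{t-1})} - \sqrt{f_{0,t}(x \mid X_{t-1})}\Bigr)^2 \, d\nu(x),
```

where ν is the counting measure and f_t, f_{0,t} are the conditional Poisson pmfs under κ and κ_0 respectively.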
Assumptions (A): There exist constants 0 < M_µ < M_X such that conditions (A.1)-(A.3) hold. Assumption (A.3) is imposed to ensure strict positivity of the parameters and is standard in the time-varying literature dealing with such constrained parameters. The posterior contraction rate at κ_0 ∈ A with respect to the metric d_{1,T} on A is a sequence ε_T → 0 such that Π(κ : d_{1,T}(κ, κ_0) ≥ M_T ε_T | X^{(T)}) → 0 in P_{κ_0}-probability, where A denotes the space of functions (µ, a_1) satisfying Assumptions (A) and X^{(T)} = {X_0, X_1, . . . , X_T} denotes the observations.
Theorem 1. Under Assumptions (A.1)-(A.3), let the true functions µ_0(·) and a_10(·) be Hölder smooth with regularity levels ι_1 and ι_2 respectively. Then the posterior contraction rate with respect to the distance d²_{1,T} is as displayed, where the b_ij are specified in (4.2). For the proof, the first step is to calculate the posterior contraction rate with respect to the average log-affinity r²_T and then to show that the same rate holds for d²_{1,T}. The average log-affinity provides a unique advantage: it allows the construction of exponentially consistent tests leveraging the famous Neyman-Pearson lemma, as also used in Ning et al. (2020) for a multivariate linear regression setup under group sparsity.
The proof is postponed to Section 8. The proof is based on the general contraction rate result from Ghosal and Van der Vaart (2017) and some results on B-splines based finite random series.

TVBINGARCH structure
Next, we discuss the more comprehensive TVBINGARCH model (2.7). To keep the proof simple, we again assume p = 1, q = 1. As in the previous subsection, we put a prior on the number of B-spline bases K_i with probability mass function as in (4.2), with b_i1, b_i2 > 0 and 0 ≤ b_i3 ≤ 1 for i = 1, 2, 3. Let ψ = (µ, a_1, b_1) denote the complete set of parameters. We study posterior consistency with respect to the average Hellinger distance d_{2,T} on the coefficient functions, where P stands for the conditional Poisson density defined in (2.7).
For this structure, we modify the assumptions to (B.1)-(B.3). Assumption (B.3) is imposed to ensure strict positivity of the parameters and is standard in the time-varying literature dealing with such constrained parameters.
Theorem 2. Under Assumptions (B.1)-(B.3), let the true functions µ_0(·), a_10(·) and b_10(·) be Hölder smooth with regularity levels ι_1, ι_2 and ι_3 respectively. Then the posterior contraction rate with respect to the distance d²_{2,T} is as displayed. The proof follows a similar strategy to that of Theorem 1; an outline can be found in Section 8.

Simulation studies
In this section, we study the performance of our proposed Bayesian methods in capturing the true coefficient functions. We compare both TVBARC and TVBINGARCH with some other competing models. It is important to note that, to the best of our knowledge, this is the first work on Poisson autoregression with a time-varying link. Thus, we compare our methods with existing time-series models with time-constant coefficients for count data and with time-varying AR with Gaussian errors. We also examine the estimation accuracy of the coefficient functions.
The hyperparameters c_1 and c_2 of the normal prior are all set to 100, which makes the prior weakly informative. The hyperparameter of the inverse-gamma prior is d_1 = 0.1, which is also weakly informative. We consider 6 equidistant knots for the B-splines based on comparing AMSE scores, choosing the number of knots after which the AMSE score does not change significantly. The AMSE is computed from the post-burn-in samples, where λ̂^S_t is the posterior estimate of λ_t at the S-th post-burn-in sample.
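Our reading of the AMSE criterion (the display is garbled above): average, over the post-burn-in draws, of the mean squared error between the fitted intensity and the observed counts. A hedged sketch:

```python
def amse(lambda_samples, X):
    """AMSE = (1/S) sum_s (1/T) sum_t (lambda_hat^(s)_t - X_t)^2, where
    lambda_samples[s][t] is the fitted intensity at time t for the s-th
    post-burn-in draw. This is our reading of the paper's criterion."""
    S, T = len(lambda_samples), len(X)
    return sum(
        sum((lam - x) ** 2 for lam, x in zip(draw, X)) / T
        for draw in lambda_samples
    ) / S

# Two draws over three time points: the first matches X exactly (error 0),
# the second is off by 1 everywhere (error 1), so the AMSE is 0.5
print(amse([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]], [1, 2, 3]))
```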

Case 1: TVBARC structure
Here, we consider two model settings: p = 1, with X_t ∼ Poisson(µ(t/T) + a_1(t/T) X_{t-1}), and p = 2, with X_t ∼ Poisson(µ(t/T) + a_1(t/T) X_{t-1} + a_2(t/T) X_{t-2}), for t = 1, . . . , T. Three choices of T have been considered: T = 100, 500 and 1000. The true functions are given below. We compare the estimated functions with the truth for sample size 1000 in Figures 2 and 3 for the models with p = 1 and p = 2 respectively. Tables 1 and 2 illustrate the performance of our method relative to other competing methods.

Case 2: TVBINGARCH structure
For the TVBINGARCH case, we consider one simulation setting: p = 1, q = 1, with X_t ∼ Poisson(µ(t/T) + a_1(t/T) X_{t-1} + b_1(t/T) λ_{t-1}). Two choices of T have been considered, T = 100 and 200. The performance of our method is compared with other competing methods in Table 3. Figures 2 to 4 show that our proposed Bayesian method captures the true functions quite well in both simulation experiments. We find that the estimation accuracy improves as the sample size increases. As the sample size grows, the 95% credible bands also become tighter, implying lower uncertainty in estimation. This gives empirical evidence in favor of estimation consistency, which is also verified theoretically in Section 4. The average mean squared error (AMSE) is always the lowest for our method in Tables 1, 2 and 3.

COVID-19 spread at NYC
We collect the data on daily new confirmed cases from 23rd January to 14th July from an open-source platform: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset.
The end date of 14th July is chosen because around that time NYC started the process of re-opening. The data on daily new cases are illustrated in Figure 5. We are particularly interested in the NYC data as this city remained an epicenter in the US for about a month. With the help of government interventions and a sustained lockdown, the recovery was significant in about 3 months. Such a time-varying nature of the data motivates us to examine how the mean trend and the AR trend behave, which can also shed light on the effects of the lockdown and the contagious spread.

Based on the findings on the incubation period of the virus in Lauer et al. (2020a) and others, it is understood that symptoms often appear some time after infection through contagion. Our idea is to consider different models with varying numbers of lags. We consider TVBARC(1), TVBARC(10) and TVBINGARCH(1,1) here. The results for TVBARC(1) are illustrated in Figure 6: we see that during the spike in daily new cases the function a_1(·) is at its highest. Figure 7 depicts the estimated mean and coefficient functions from a TVBARC(10) model. We find that the estimated a_1(·) function shows a similar trend; on top of that, a_6(·), a_7(·) and a_8(·) also have some effect. Finally, we fit TVBINGARCH(1,1), which may be considered a TVBARC of infinite order. Figure 8 depicts the estimated mean {µ(·)}, AR(1) {a_1(·)} and CH(1) {b_1(·)} coefficient functions. In Table 4, we compare the AMSE scores across the different models. For all the models, we consider 12 equidistant knots based on the AMSE scores, as discussed in Section 5. These findings complement our unpublished preprint (Roy and Karmakar, 2020), which provides an empirical early-stage analysis of the spread in different cities and countries.
The effect of lags 6, 7, and 8 can be attributed to the incubation period of the virus. It may also point to a weekly periodicity, probably due to reduced testing/administrative facilities during the weekend. Note that fitting a TVBARC(10) model is more general than separately fitting a seasonal/periodic time-series model. Another important finding comes from the overall trend of a_1(·): it starts to decrease when the number of cases starts going down. Later it varies around 0.6, which can be attributed to the fact that the number of new cases did not vary much and remained around the same level from the middle of May. The credible bands are very narrow around the mean function, probably due to the large magnitude of the estimated function. Our analysis of the NYC data thus shows a time-varying effect of lags 6, 7, and 8. Some preliminary analysis of COVID data using our model, based on the data until April 24, is archived in our unpublished preprint Roy and Karmakar (2020), which contains some further interesting findings related to significant lags for different countries.
As future work, it will be interesting to include country-specific information such as demographics, geographical area, and the effects of environmental time series in the model, since these are usually important factors for the spread of any infectious disease. We can also categorize the different types of government interventions to elaborate on their specific impacts. In the future we wish to analyze the numbers of deaths, recovered cases, and severe/critical cases for these diseases, as those will likely have different dynamics from the ones considered here and can provide useful insights about the spread and the measures required. For computational ease, we have considered the same level of smoothness for all the coefficient functions; fitting the model with different levels of smoothness might provide more insight. Lag selection is a difficult task for time-varying autoregressive models, and one potential future direction is to put sparsity-inducing priors on the time-varying coefficient functions in TVBARC for automatic lag detection. Beyond building time-varying autoregressive models for count-valued data using the hierarchical structure of this article, an interesting future direction is to extend the model to vector-valued count data. In general, it is difficult to model multivariate count data, and only a limited number of methods exist (Besag, 1974; Yang et al., 2013; Roy and Dunson, 2019). Building on these, one could extend our time-varying univariate AR(p) to a time-varying vector-valued AR(p). On the same note, even though we imposed the Poisson assumption for better model interpretation, in light of the upper bounds for the KL distance it is not a necessary requirement, and the approach can be applied to general multiple non-stationary count time series.
Extending some of the continuous time-series invariance results for nonlinear non-stationary and multiple series from  to the count-series regime will be an interesting challenge. Finally, we wish to undertake an autoregressive estimation of the basic reproduction number using time-varying versions of compartmental models in epidemiology.

Proof of Theorems
We study the frequentist properties of the posterior distribution in the increasing-T regime, assuming that the observations come from a true density f_0 characterized by the parameter κ_0. We follow the general theory of Ghosal et al. (2000) to study the posterior contraction rate for our problem.
In the Bayesian framework, the density f is itself a random measure with distribution Π, the prior induced by the assumed prior distribution on κ. The posterior distribution is given by Π(A | X^{(T)}) = ∫_A f(X^{(T)}) dΠ(κ) / ∫ f(X^{(T)}) dΠ(κ).

General proof strategy
Posterior consistency holds if the above posterior probability goes to zero almost surely in F^{(T)}_{κ_0}-probability as T goes to infinity, where F^{(T)}_{κ_0} is the true distribution of X^{(T)}. Recall the definition of the posterior contraction rate: if Π(κ : d(κ, κ_0) ≥ M_T ε_T | X^{(T)}) → 0 in P_{κ_0}-probability for every sequence M_T → ∞, then the sequence ε_T is called the posterior contraction rate. If the assertion holds for a constant M_T = M, the corresponding contraction statement is slightly stronger.
Note that for two densities f_0, f characterized by κ_0 and κ respectively, the Kullback-Leibler divergences admit closed forms under the Poisson link.
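For the conditional Poisson densities of (2.1), the per-step Kullback-Leibler divergence has the familiar closed form; a hedged reconstruction of the garbled display, writing λ_{0,t} and λ_t for the true and fitted intensities:

```latex
\mathrm{KL}\bigl(\mathrm{Poisson}(\lambda_{0,t}) \,\|\, \mathrm{Poisson}(\lambda_t)\bigr)
= \lambda_{0,t}\log\frac{\lambda_{0,t}}{\lambda_t} - \lambda_{0,t} + \lambda_t,
\qquad
\mathrm{KL}(\kappa_0, \kappa) = \sum_{t=1}^{T} \mathbb{E}_{\kappa_0}
\Bigl[\lambda_{0,t}\log\frac{\lambda_{0,t}}{\lambda_t} - \lambda_{0,t} + \lambda_t\Bigr].
```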
Assume that there exists a sieve W_T in the parameter space such that Π(W_T^c) ≤ exp(-(C_T + 2) T ε_T²), and that we have tests χ_T with exponentially small error probabilities. We can then bound the posterior probability from above by three terms. Taking expectation with respect to κ_0, the first term goes to zero by the construction of χ_T. The second term, E_{κ_0} 1{S_T^c}, goes to zero by Lemma 8.21 of Ghosal and Van der Vaart (2017) for any sequence C_T → ∞. For the third term, we require that Π{(1/T) KL(κ_0, κ) < ε_T²} ≥ exp(-T ε_T²). Thus we need three things to calculate the posterior contraction rate.
We first study the contraction properties with respect to the average negative log-affinity and then show that the same rate holds for the average Hellinger distance (1/T) d_H²(f, f_0). Note that L_T can be taken as L_T = M_T². With the above general structure, we now proceed to prove the individual theorems for the TVBARC and TVBINGARCH cases.

Proof of Theorem 1
For the sake of technical convenience, we present the proof for the time-varying AR model with one lag only; all arguments go through for higher lags with the same technical tools.

KL Support
The likelihood based on the parameter κ is given by P_κ(X_0) Π_{t=1}^T P_κ(X_t | X_{t-1}). Let Q_{κ,t}(X_t) be the distribution of X_t under the parameter κ.

Posterior contraction in terms of average negative log-affinity
In this section, we address the requirements for calculating the posterior contraction rate listed in Section 8.1. We first show posterior consistency in terms of the average negative log-affinity r²_T(f_1, f_2) between f_1 and f_2, where f_1 = Π_{i=1}^T P_{κ_1}(X_i | X_{i-1}). We then show that r²_T(f_1, f_0) ≲ ε_n² implies d²_{1,T}(f_1, f_0) ≲ ε_n² for our distance metric. Proceeding with the rest of the proof of Theorem 1, we use the B-spline results ‖µ - µ_0‖_∞ ≲ ‖α - α_0‖_∞, where α = {α_j}, and ‖a_1 - a_10‖_∞ ≲ ‖γ - γ_0‖_∞, where γ_j = θ_1j M_1 with γ_j < 1.
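The step from log-affinity to Hellinger rests on the identity ∫√(f_1 f_0) = 1 - ½ d_H²(f_1, f_0) together with -log(1 - x) ≥ x; a hedged sketch of this standard argument, since the displays are garbled in this version:

```latex
r_T^2(f_1, f_0) \;=\; -\frac{1}{T}\log\!\int\!\sqrt{f_1 f_0}
\;=\; -\frac{1}{T}\log\Bigl(1 - \tfrac{1}{2}\, d_H^2(f_1, f_0)\Bigr)
\;\ge\; \frac{d_H^2(f_1, f_0)}{2T},
```

so a contraction rate in r²_T transfers directly to the average Hellinger metric.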
Hölder smooth functions with regularity $\iota$ can be approximated uniformly up to order $K^{-\iota}$ with $K$ many B-splines. Thus we have $\epsilon_T \gtrsim \max\{K_{1T}^{-\iota_1}, K_{2T}^{-\iota_2}\}$. We need to lower bound the prior probability as required by (i); this follows from (8.6) and the discussion of assumption A2 in Shen and Ghosal (2015). The rate of contraction cannot be better than the parametric rate $T^{-1/2}$, and so $\log(1/\epsilon_T) \asymp \log T$. Thus (i) requires, in terms of the pre-rate $\bar\epsilon_T$, that the prior puts sufficient mass on a Kullback--Leibler neighborhood of $\kappa_0$ of radius $\bar\epsilon_T$. For (ii), we consider the sieve $W_T = \{\kappa : \alpha_j \in [\rho_T, A_T],\ \gamma_j \in [\rho_T, 1 - A_T/B_T],\ \lambda_0 \leq B_T,\ K_1 \leq K_{1T},\ K_2 \leq K_{2T}\}$, where $A_T, B_T$ are at least polynomial in $T$, $\lambda_0$ is the mean of $X_0$, and $K_T = \max\{K_{1T}, K_{2T}\}$.
We take $\rho_T \asymp T^{-a}$ with $a < 1$, $A_T \asymp T^{a_1}$, and $B_T \asymp T^{a_2}$ with $a_2 > a_1$ for technical reasons. Note that for $\kappa \in W_T$ we have $E_\kappa(X_t) < B_T$. We need to choose these bounds carefully so that $\Pi(W_T^c)$ decays fast enough, which depends on the tail properties of the priors. Here $\alpha_{K_{1T}}$ is the vector of the full set of coefficients of length $K_{1T}$ and $\gamma_{K_{2T}}$ is the vector of coefficients of length $K_{2T}$. The prior probability that some coordinate of $\alpha$ escapes the sieve can be upper bounded by $K_{1T}\,\Pi(\alpha_1 \notin [\rho_T, A_T]) \leq K_{1T}\exp\{-R_1T^{a_3}\}$ for some constants $R_1, a_3 > 0$, which can be verified from the discussion of assumption A.2 of Shen and Ghosal (2015) for our choice of prior, which has exponential tails. Similarly, the corresponding probability for $\gamma$ is bounded by $K_{2T}\exp\{-R_2T^{a_4}\}$ for some constants $R_2, a_4 > 0$, which can be verified from the proof of Roy et al. (2018). The gamma prior on $\lambda_0$ has an exponential tail similar to that of $\alpha_1$ and can thus be ignored as $K_{1T}$ grows with $T$; since $B_T > A_T$, the tail of $\lambda_0$ can be upper bounded by the tail of $\alpha_1$. Hence, $\Pi(W_T^c) \lesssim F_1(K_{1T}) + F_2(K_{2T}) + (K_{1T} + K_{2T})\exp\{-RT^{a_5}\}$, where the two functions $F_1$ and $F_2$ stand for the tail probabilities of the priors on $K_1$ and $K_2$, whose asymptotic orders can be calculated directly. The pre-rate is then obtained from the resulting balance equation for some sequence $H_T \to \infty$. Next we construct the test $\chi_T$ required by (iii). We first construct a test for the point alternative $H_0: \kappa = \kappa_0$ vs $H_1: \kappa = \kappa_1$; the most powerful test for this problem is the Neyman--Pearson test $\phi_{1T}$ based on the likelihood ratio $f_1/f_0$. It is natural to take a neighborhood around $\kappa_1$ such that the Type II error remains exponentially small for all alternatives in that neighborhood under the test function $\phi_{1T}$. By the Cauchy--Schwarz inequality, we can write, for any $\kappa$ in that neighborhood, $E_\kappa(1 - \phi_{1T}) \leq \{E_{\kappa_1}(1 - \phi_{1T})\}^{1/2}\{E_{\kappa_1}(f/f_1)^2\}^{1/2}$. In this expression, the first factor is already exponentially decaying; the second factor can be allowed to grow at most at the order $e^{cT\epsilon_T^2}$ for some small positive constant $c$.
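To make the Neyman--Pearson step concrete, the following sketch computes the exact Type I and Type II error probabilities of the likelihood-ratio test $\phi = 1\{f_1/f_0 > 1\}$ for a single Poisson observation under two fixed rates (our illustrative numbers, not from the paper). Both errors are strictly below $1/2$; under $T$ independent observations they decay exponentially in $T$, which is what the proof exploits:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def np_test_errors(lam0, lam1, kmax=500):
    # Neyman-Pearson test phi = 1{f1(x)/f0(x) > 1} for one observation:
    # Type I = P_{lam0}(reject), Type II = P_{lam1}(accept)
    type1 = sum(poisson_pmf(k, lam0) for k in range(kmax + 1)
                if poisson_pmf(k, lam1) > poisson_pmf(k, lam0))
    type2 = sum(poisson_pmf(k, lam1) for k in range(kmax + 1)
                if poisson_pmf(k, lam1) <= poisson_pmf(k, lam0))
    return type1, type2

t1, t2 = np_test_errors(2.0, 4.0)
```

For rates $2$ vs $4$ the test rejects when $x \geq 3$ (since $f_1 > f_0$ iff $x > (\lambda_1 - \lambda_0)/\log(\lambda_1/\lambda_0) \approx 2.885$), giving Type I error $1 - 5e^{-2}$ and Type II error $13e^{-4}$.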
We show that $E_{\kappa_1}(f/f_1)^2$ is bounded for every $\kappa$ in a small sup-norm neighborhood of $\kappa_1$. In light of the AM--GM inequality, the product over $t = 1, \ldots, T$ of the per-observation terms can be bounded by the average of their $T$-th powers. Towards uniformly bounding the summand, write $\lambda = \mu(t/T) + a_1(t/T)X_{t-1}$ and $\lambda_1 = \mu_1(t/T) + a_{11}(t/T)X_{t-1}$, where $E_{X_{t-1},\kappa}$ denotes the unconditional expectation over $X_{t-1}$ under the density $f$ with parameter $\kappa$. Let us define $r_1 = \|\mu - \mu_1\|_\infty$ and $r_2 = \|a_1 - a_{11}\|_\infty$. Assuming $\mu(t/T) - \mu_1(t/T)$ and $a_1(t/T) - a_{11}(t/T)$ are very small, we can write
$$\frac{(\mu(t/T) + a_1(t/T)X_{t-1})^{T+1}}{(\mu_1(t/T) + a_{11}(t/T)X_{t-1})^{T}} = \left(1 + \frac{\mu(t/T) - \mu_1(t/T) + (a_1(t/T) - a_{11}(t/T))X_{t-1}}{\mu_1(t/T) + a_{11}(t/T)X_{t-1}}\right)^{T}\!\!(\mu(t/T) + a_1(t/T)X_{t-1}) \approx \left(1 + T\,\frac{\mu(t/T) - \mu_1(t/T) + (a_1(t/T) - a_{11}(t/T))X_{t-1}}{\mu_1(t/T) + a_{11}(t/T)X_{t-1}}\right)(\mu(t/T) + a_1(t/T)X_{t-1}). \quad (8.10)$$
For the above approximation to hold, we need $\frac{\mu(t/T) - \mu_1(t/T) + (a_1(t/T) - a_{11}(t/T))X_{t-1}}{\mu_1(t/T) + a_{11}(t/T)X_{t-1}}$ to be small; since $\rho_T = T^{-a}$ with $a < 1$, this follows directly. Applying $E_{X_{t-1},\kappa}$ to (8.10) yields the bound (8.11), which is obtained by applying a combination of the following inequalities: $\mu(t/T) + a_1(t/T)X_{t-1} > \rho_T$ or $> \rho_T X_{t-1}$, $|\mu(t/T) - \mu_1(t/T)| < r_1$, and $|a_1(t/T) - a_{11}(t/T)| < r_2$. Taking $q = Tr_2^2/\rho_T$, the last part becomes $E(e^{qX_{t-1}})$ after taking expectation over (8.11). We have $E(e^{qX_0}) = e^{\lambda_0(e^q - 1)} < e^{B_T(e^q - 1)} = e^{Q}$ for $Q = B_T(e^q - 1)$, i.e., $e^q - 1 = Q/B_T$ ($B_T$ is the upper bound for $\lambda_0$ in the sieve). We will show $E(e^{qX_1}) < e^{Q}$ under the above choices of $r_1$ and $r_2$; then by recursion it holds for all $t$. We use the result $e^q - 1 \leq 2q$ for $q < 1$.
Hence, $E(e^{qX_1}) < e^{Q}$. Recursively, for all $t$, we can show $E(e^{qX_t}) < e^{Q}$.
Our primary goal of showing $E_{\kappa_1}(f/f_1)^2 < \infty$ can be fulfilled if $Q$ is a constant independent of $T$. To ensure $Q$ is independent of $T$ we need $B_T(e^q - 1)$ to be constant; it suffices to make $qB_T$ constant, as $qB_T < B_T(e^q - 1) < 2qB_T$. This holds for $r_2 \leq \sqrt{\rho_T}/\sqrt{TB_T}$ and, in light of (8.11), a correspondingly small radius $r_1$. The test function $\chi_T$ with exponentially decaying Type I and Type II error probabilities is then obtained by taking the maximum over the tests $\phi_{jT}$ for a covering of the sieve by balls of the above radii: $\chi_T = \max_j \phi_{jT}$. The Type I and Type II error probabilities are given by $P_0(\chi_T) \leq \sum_j P_0\phi_{jT} \leq D_T\max_j P_0\phi_{jT}$ and $\sup_{\kappa \in W_T : r_T^2(\kappa,\kappa_0) > L_T\epsilon_T^2} P_\kappa(1 - \chi_T) \leq \exp(-TL_T\epsilon_T^2)$. Hence, we need to show that $\log D_T \lesssim T\epsilon_T^2$, where $D_T$ is the required number of balls of the above radii needed to cover our sieve $W_T$. We have
$$\log D_T \leq \log D\big(r_1, \{\|\alpha\|_\infty \leq A_T, \min(\alpha) > \rho_T\}, \|\cdot\|_\infty\big) + \log D\big(r_2, \{\|\gamma\|_\infty \leq 1 - A_T/B_T, \min(\gamma) > \rho_T\}, \|\cdot\|_\infty\big) \leq K_{1T}\log(3K_{1T}A_T/r_1) + K_{2T}\log(3K_{2T}/r_2). \quad (8.12)$$
Given our choices of $A_T$, $B_T$, and $\rho_T$, the two radii $r_1$ and $r_2$ are fractional polynomials in $T$.
Thus $\log D_T \lesssim (K_{1T} + K_{2T})\log T$, which is required to be $\lesssim T\epsilon_T^2$, matching the prior mass condition from (i).
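The covering bound (8.12) behaves like $K\log T$ when $A_T$ and $1/r$ are polynomial in $T$: a sup-norm ball in $K$ coordinates is covered by a product grid, so the log-covering number scales linearly in $K$ but only logarithmically in the range-to-radius ratio. A small numerical sketch with our own illustrative constants:

```python
import math

def cover_count_1d(A, r):
    # Centers spaced 2r apart cover the interval [0, A] with radius-r balls
    return math.ceil(A / (2 * r))

# Sup-norm covering of the box [0, A]^K via the product grid:
# D <= cover_count_1d(A, r)**K, hence log D <= K * log(3A/r) for r < A
A, r, K = 100.0, 0.1, 5
log_D = K * math.log(cover_count_1d(A, r))
assert log_D <= K * math.log(3 * A / r)

# With A ~ T^{a1} and 1/r polynomial in T, log D grows like K * log T
T, a1 = 10**6, 0.5
log_D_T = K * math.log(cover_count_1d(T ** a1, T ** (-a1)))
assert log_D_T <= 3 * K * math.log(T)
```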
Combining all these conditions, we obtain the requirements on $K_{1T}$ and $K_{2T}$. For the INGARCH case, we can upper bound $\max_t b_{11}(t)$ in terms of $M_\mu$, and we also need a lower bound for $A_T$ for technical reasons. We take $\rho_T \approx T^{-a}$ with $a < 1$, $A_T = B_T(1 - \exp(\log T/T)\rho_T)$, and $B_T \approx T^{a_2}$ for sufficiently large $T$ such that $\exp(\log T/T)\rho_T < 1$. Within the sieve we again use a variant of the above inequality; note that within the sieve $E(X_t) \leq B_T$ and $E(\lambda_t) \leq B_T$.