Dynamic quantile linear models: a Bayesian approach

A new class of models, named dynamic quantile linear models, is presented. It combines dynamic linear models with distribution free quantile regression producing a robust statistical method. Bayesian inference for dynamic quantile linear models can be performed using an efficient Markov chain Monte Carlo algorithm. A fast sequential procedure suited for high-dimensional predictive modeling applications with massive data, in which the generating process is itself changing overtime, is also proposed. The proposed model is evaluated using synthetic and well-known time series data. The model is also applied to predict annual incidence of tuberculosis in Rio de Janeiro state for future years and compared with global strategy targets set by the World Health Organization.


Introduction
The paper proposes a broad new class of models by combining two innovative areas developed during the last quarter of the twentieth century, namely dynamic linear models and quantile regression. In the 1970's, a new idea to model time series data was proposed by Harrison and Stevens (1976), which can be viewed as the use of regression models with parameters varying across time. Around the same time, Koenker and Bassett (1978) introduced the quantile regression model, a robust statistical procedure which generalizes the L 1 regression. Quantile regression is advantageous compared to standard mean regression since it provides richer information about the covariate effects, is robust to heteroscedasticity and outliers, and can accommodate the non-normal errors often encountered in practical applications.
The inferential approaches of dynamic linear models and quantile regression are, nevertheless, completely distinct. While the first contribution follows the Bayesian paradigm, the second resorts to optimization techniques to solve the stated minimization problem, and its inference is theoretically based on large sample theory. In the next paragraphs, we state the main novelties in both consolidated areas.
A simple minimization problem yielding the ordinary sample quantiles in the location model was shown to generalize naturally to the linear model, generating a broad new class of models introduced in the 1970s by Koenker and Bassett (1978), named "quantile regression models". Bayesian inference for quantile regression proceeds by forming the likelihood function based on the asymmetric Laplace distribution (Yu and Moyeed, 2001). Kozumi and Kobayashi (2011) proposed a Gibbs sampling to sample from the posterior distribution of the unknown quantities using the normal-exponential mixture representation for the asymmetric Laplace distribution. The recent literature has introduced Bayesian quantile regression in: Tobit models (Yu and Stander, 2007), censored models (Reich and Smith, 2013), binary models (Benoit and Van den Poel, 2012), ordinal models (Rahman, 2016), hierarchical linear models (Yu, 2015), longitudinal data models (Luo et al., 2012;Alhamzawi and Ali, 2018;Rahman and Vossmeyer, 2018), time series models (Cai et al., 2012), spatial models (Reich et al., 2011) and spatiotemporal models (Reich, 2012;Neelon et al., 2015).
Dynamic linear models (DLMs) were introduced by Harrison and Stevens (1976) and extended to generalized linear models by West et al. (1985). DLMs are part of a broad class of models with time varying parameters, useful for modeling and forecasting time series and regression data (West and Harrison, 1997;Durbin and Koopman, 2002;Migon et al., 2005;Petris et al., 2009;Prado and West, 2010). The advent of stochastic simulation techniques stimulated applications of the state space method to model complex stochastic structures, like dynamic spatiotemporal models (Gamerman et al., 2003), dynamic survival models (Bastos and Gamerman, 2006), dynamic latent factor models (Lopes et al., 2008), and multiscale models (Ferreira et al., 2011;Fonseca and Ferreira, 2017). Real applications of the method have been also appeared in hydrology (Ravines et al., 2008;Fernandes et al., 2009), intraday electricity load (Migon and Alves, 2013), finance (Zhao et al., 2016), and many other areas.
We extend the dynamic linear models to a new class, named dynamic quantile linear models, where a linear function of the state parameters is set equal to a quantile of the response variable at time t, y t , similar to the quantile regression of Koenker (2005). This method is suited to high-dimensional predictive modeling applications with massive data in which the generating process itself changes over time. Our proposal keeps the most relevant characteristics of DLMs such as: (i) all relevant information sources are used: history, factual or subjective experiences, including knowledge of forthcoming events; (ii) everyday forecasting is generated by a statistical model and exceptions can be taken into account as an anticipative or retrospective base; (iii) what happened and what if analyses are easily accommodated; and (iv) the model can be decomposed into independent components describing particular features of the process under study.
The relative ease with which Markov chain Monte Carlo (MCMC) methods can be used to obtain the posterior distributions, even in complex situations, has made Bayesian inference very useful and attractive, but at the expense of losing the sequential analysis of both states and parameters. Therefore, some fast computing alternatives exploring analytical approximations have been welcome, as in Da-Silva et al. (2011) andSouza et al. (2018). In this paper, we introduce the inference via MCMC methods, and also via an alternative approach based on normal approximations and Bayes linear estimation (West et al., 1985), which besides being computationally faster than MCMC, recovers the sequential analysis of the data. Our approximation procedure provides the marginal likelihood sequentially as new data arrive, which is essential to perform sequential model monitoring and model selection (West, 1981).
The remainder of the paper is organized as follows. Section 2 explores in more details the dynamic quantile linear model. Section 3 presents our efficient MCMC algorithm and the sequential approach for dynamic quantile linear modeling. Section 4 illustrates the proposed method with synthetic data, and also presents some results of the wellknown time series of the annual flow of the Nile River at Aswan from 1871 to 1970. The model is also applied to time series data on tuberculosis in Rio de Janeiro state, Brazil, from 2001 to 2015. We also predict the incidence for future years and compare the results with those of the global tuberculosis strategy targets established. Finally, Section 5 concludes with discussion of the paper and some possible extensions.

Dynamic quantile linear model
The τ -th quantile of a random variable y t at time t can be represented as a linear combination of explanatory variables, where Q τ (y t ) is the τ -quantile of y t , formally defined as Q τ (y t ) = inf{y * : P (y t < y * ) ≥ τ }, for 0 < τ < 1. F t is a p × 1 vector of explanatory variables at time t, and θ (τ ) t is a p × 1 vector of coefficients depending on τ and t. From now on, the superscript τ will be omitted in order to keep the notation as simple as possible.
For a given time t, Koenker and Bassett (1978) defined the τ -th quantile regression estimator of θ t as any solution of the quantile minimization problem where ρ τ (.) is the loss (or check) function defined by ρ τ (u) = u(τ − I(u < 0)), with I(·) denoting the indicator function. Minimizing the loss function ρ τ (·) is equivalent to maximizing the likelihood function of an asymmetric Laplace (AL) distribution, as pointed out by Yu and Moyeed (2001). However, instead of maximizing the likelihood, as in Yu and Moyeed (2001), we derive the posterior distribution of the τ -th quantile regression coefficients at time t using the AL distribution. Therefore, regardless of the distribution of y t , it is enough to assume that: where μ t = F t θ t ∈ R is a location parameter, φ −1/2 > 0 is a scale parameter, and τ ∈ (0, 1) is a skewness parameter representing the quantile of interest. In dynamic modeling, one goal is also to obtain the predictive distribution. This can be done in a robust fashion using a grid of values of τ ∈ (0, 1) to describe the full predictive distribution of y t . Nevertheless, in this paper, we focus on providing precise inference about the linear predictor μ t for each τ -th quantile.
In a dynamic linear model, the states at time t depend on the states at time t − 1 according to an evolution equation θ t = G t θ t−1 + w t , where G t is a (p × p) matrix describing the evolution parameters and w t is a zero mean Gaussian error with variance matrix W t . Therefore the proposed dynamic quantile linear model (DQLM) is defined as: where N p (μ, Σ) denotes a multivariate Gaussian distribution with p-dimensional mean vector μ and p × p covariance matrix Σ.
In the next section, we present two methods of inference for the proposed model. In the first one, we extend the algorithm proposed by Kozumi and Kobayashi (2011) to Bayesian (static) quantile regression and propose an efficient MCMC algorithm to sample from the posterior distribution of the unknown quantities of the dynamic quantile linear model in equation (2.2). The second one is a computationally cheaper alternative that explores analytical approximations and Bayes linear optimality. An advantage of the last one is that it provides the marginal likelihood sequentially as new data arrive.

Posterior inference for the DQLM
The main tasks involved in state space model inference are estimation of the states and prediction of future values based on the current information. The conditional densities π(θ s | D t ), where D t = {D t−1 ∪ I t ∪ y t } represents all information until time t, are calculated for different values of s and t. The filtering corresponds to the case s = t, state prediction to s > t and smoothing to s < t, for t = 1, 2, . . . , T.
The inference in DLM assumes the data come sequentially in time. The expressions for updating from the filtering density π(θ t−1 | D t−1 ) to π(θ t | D t ) are easily obtained following the Bayesian argument. Given the state posterior distribution at time t − 1, denoted by π(θ t−1 |D t−1 ), we obtain the prior distribution π(θ t |D t−1 ) using the state parameters' time evolution equation. As soon as we get new information, we can obtain by Bayes' theorem the posterior distribution at time t. The one step ahead predictive distribution is obtained by π(y t | D t−1 ) = π(y t , θ t |D t−1 )dθ t .
The retrospective analysis instead consists of estimating the state sequence at times 1, . . . , t, given D t . It is solved by computing the conditional distribution of (θ 1 , . . . , θ t ) given D t . As for filtering, smoothing can be implemented via a recursive algorithm.
Moreover, for a DLM including a possibly multidimensional unknown parameter in its specification, the joint posterior distribution of the parameter and unobservable states, in general, is not available in closed form. The inclusion of the states in the posterior distribution usually simplifies the design of an efficient sampler. In fact, drawing the posterior distribution of the parameter given the states is almost invariably easier than drawing it from the marginal. In addition, efficient algorithms to generate the states conditionally on the data and the unknown parameter are available, such as forward filtering backward sampling (FFBS) in the normal case (Frühwirth-Schnatter, 1994).
However, if one needs to update the posterior distribution after one or more new observations become available, then one has to run MCMC all over again, which can be extremely inefficient (Petris et al., 2009, chapter 4, p. 149). On-line analysis and simulation-based sequential updating of the posterior distribution of states and unknown parameters are best dealt with by employing sequential techniques, such as Sequential Monte Carlo (SMC) methods (Doucet et al., 2001).
Based on those ideas, we present two different algorithms to make inferences in the proposed dynamic quantile linear model in equation (2.2). The first method consists of an efficient MCMC algorithm to sample from the posterior distribution of the unknown quantities of the model. The second one is a faster alternative which performs a sequential analysis as new data arrive. Kotz et al. (2001) presented a location-scale mixture representation of the AL distribution that allows finding analytical expressions for the conditional posterior densities of the model. In this way, if a random variable follows an AL distribution, i.e. y t | μ t , φ, τ ∼ AL μ t , φ −1/2 , τ , then we can write y t using the following mixture representation:

Efficient MCMC algorithm
where a τ = 1−2τ τ (1−τ ) and b τ = 2 τ (1−τ ) are constants that depend only on τ , with Ga(α, β) denoting the gamma distribution with mean α/β and variance α/β 2 . Therefore, the dynamic quantile linear model in equation (2.2) can be rewritten as the following hierarchical model: for t = 1, . . . , T . To allow for some flexibility in the model in equation (3.2), we can even assume that φ −1/2 changes with time, which can be done in logarithmic scale according to a random walk or by using discounted variance learning through the multiplicative gamma-beta-gamma model (West and Harrison, 1997, chapter 10, p. 357).
The posterior distribution of the parameters in the model in equation (3.2) is given by The quantity I t represents all external information at time t. If there is no external information at time t, then I t = ∅. All prior information is summarized in D 0 = I 0 , containing all hyperparameters associated with the prior distributions. The unknown quantities are defined as follows: We can sample from the posterior distribution in equation (3.3) through a MCMC algorithm. Our starting point is the efficient Gibbs sampler for Bayesian (static) quantile regression proposed by Kozumi and Kobayashi (2011). The dynamic coefficients are then sampled using a FFBS algorithm (Carter and Kohn, 1994;Frühwirth-Schnatter, 1994;Shephard, 1994). The FFBS is a two-step efficient block sampler that draws the states jointly, given the parameters for linear Gaussian state-space models. The data are sequentially processed to update numerical summaries of the filtering densities π(θ t |D t ) at time t (forward filtering) and then the joint distribution is simulated via the implied backward compositional form (backward sampling). In this case, using standard results about the multivariate Gaussian distribution, it is easily proved that the random vector (Θ, D T ) has a Gaussian distribution so, the marginal and conditional distributions are also Gaussian distributions.
2. Next, use the FFBS method to sample from π(θ | ·): (i) Forward filtering: for t = 1, . . . , T calculate In place of assuming a Wishart prior for W −1 t it is also possible to use a discount factor δ ∈ (0, 1) subjectively assessed, controlling the loss of information. In this case, the only difference is that R t is recalculated according to a discount factor δ such as The full conditional distribution of U t is obtained using Lemma 3.1, which shows that the generalized inverse Gaussian distribution (GIG) is conjugate to the normal distribution in a normal location-scale mixture model. Lemma 3.1. Let y = (y 1 , . . . , y n ) be a normal random sample with likelihood function π(y|U ) = n i=1 N (y i |a + bU, cU ) and suppose that the prior distribution for U is The proof of this result is immediate and is omitted in the text. Moreover, note that a Ga(α, β) distribution is a particular case of a GIG with χ = 0, κ = 2β and λ = α.

Approximate dynamic quantile linear model
On the other hand, considering that data arrive sequentially, we propose an efficient and fast sequential inference procedure obtained with a closed-form solution, in order to update inference about unknown parameters online. This is useful, for example, in financial applications where one has to estimate the term structure of interest rates hourly as new data continuously arrive. In particular, the main interest in the approximate method proposed is that besides being faster than MCMC algorithms, it does not require assessing chain convergence.
The approach also explores the mixture representation of the AL distribution described in equation (3.1). Hence, posterior computation can be conveniently carried out using conventional Bayesian updating, conditional on the gamma random variable U t . We also use a normal approximation to U t 's distribution in the logarithm scale, introducing explicit dynamic behavior, once again, generalizing the model presented in equation (3.2).
The normal approximation of the gamma distribution, described in Bernardo (1981), is presented below and the proof is presented in Appendix A (Gonçalves et al., 2019).
Lemma 3.2. Using the Kullback-Leibler divergence, in a large class of transformations, we have: Therefore, an approximate conditional normal dynamic quantile regression is obtained as: Ga(a, b). The model in equation (3.4) is more flexible than the original one in equation (3.2) because it allows u t to change with time according to a random walk. Furthermore, the scale parameter here is introduced in all the model equations.
The model is completed assuming the following independent initial information: The model in equation (3.2) can be viewed as a particular case of the one proposed in equation (3.4) assuming that W u,t = 0, ∀t, m u,0 = −1/2 and C u,0 = 1 and ignoring the multiplicative factor included here just to facilitate analytical expressions. The inference procedure is described below. First we present all results conditional on both u t and φ and later we integrate out those quantities.

Normal conditional model
We start the inference procedure by exploiting the advantage that, conditional on u t and φ, we have normal distributions in one-step forecasting and posterior distributions of θ t at time t, so all the properties of a normal model can be used here. The dependence on u t appears first due to the one-step forecast distribution at time t and it will appear in the posterior distribution as time passes. Let us define u 1:t = (u 1 , . . . , u t ). Theorem 3.2 presents the main steps in the inference procedure conditional on u t .

Theorem 3.2. Assuming that the states' posterior distribution at time
and the conditions defining the model in equation (3.4), it follows that the prior distribution of θ t and the conditional predictive distribution for any time t, given u t and φ, are respectively: The conditional joint covariance between y t and θ t , given D t−1 , u 1:t , φ, is easily obtained as R t F t completing the joint normal prior for θ t and y t . Therefore, the posterior density of θ t follows as: It is worth pointing out that the mean and variance of the predictive and the state posterior distributions are functions of u t , as reinforced by the notation used. However, since u t is unknown for all t, we must find those distributions marginal on them. We will do this sequentially in the one-step forecast distribution in equation (3.5) for each time t.

Marginalizing on u t
From now on, we will rewrite the time evolution equation in (3.4) as Assuming the posterior distribution at time t − 1, In particular, we have that u t | D t−1 , W u,t , φ ∼ N (a u,t , φ −1 R u,t ) and the result (ii) of Lemma 3.2 leads to this distribution in the original (gamma) scale as: where α t = φR −1 u,t and β t = exp(−a u,t )φR −1 u,t . Thus, the one-step forecast distribution in equation (3.5) can be seen as a normalgamma mean-variance mixture, with the following different features from those stated in equation (3.1): (i) U t in this case is gamma distributed with shape parameter different from 1; and (ii) q t (u t ) is a linear function of u t with non-null linear coefficient. These comments lead us to a recent class of distributions, a variant of the AL, as described in Theorem 3.3. Theorem 3.3. The one-step ahead forecast distribution, conditional on φ and marginalized on u t arises as the convolution of an independent normal and a generalized asymmetric Laplace distribution (GAL). It can be represented as: We will refer to this as the N GAL distribution.
A brief presentation of the N GAL distribution, its moments, the characteristic function, and the proof of Theorem 3.3 are presented in Appendix B. Although, the N GAL distribution suffers from a lack of closed-form expressions for its probability density and cumulative distribution functions, they can be efficiently determined using numerical integration as discussed in Appendix B. In particular, the one-step ahead forecast mean and variance marginal on u t , can be easily obtained through properties of conditional mean and variance as: The recurrences for posterior mean and variance may also be derived using approaches that do not invoke the normal assumption, since they possess strong optimality properties that are derived when the distributions are only partially specified in terms of means and variances. The Bayes linear estimation procedure, presented in West and Harrison (1997, Chap. 4), provides an alternative estimate that can be viewed as an approximation of the optimal procedure. Theorem 3.4 presents the main steps in the inference procedure, now marginal on u t .
Theorem 3.4. The joint distribution of θ * t and y t is partially described using its first and second moments, as follows: The joint covariance between y t and θ * t , given D t−1 and φ, is obtained using the first-order Taylor series approximation exp(u t ) ≈ exp(a u,t )[1 + u t − a u,t ].
Through the Bayes linear estimation procedure, we get:

t A t and we can easily return to the normality assumption.
The discount factor strategy can be used in place of W * t . Although one of the attractive features of this approach is that estimation and forecasting can be applied sequentially as new data become available, one can use the backward-recursive algorithm and get the smoothed estimates: (3.9)

Estimating φ
The steps of the method described above are conditional on φ. In the case where φ is unknown, a practical solution is to use a plug-in estimator for φ obtained from the maximum a posteriori estimation.
The posterior distribution of φ given the observed data is given by: The density and the cumulative distribution function can be obtained numerically using the convolution form or the inversion of the characteristic function. In particular, using the convolution to represent the density function, we get: where p ζ (.) is the density of the GAL distribution, and p (.) is the density of a normal distribution with mean 0 and variance c. Also, using the inversion of the characteristic function, we get: where ϕ(.) is the characteristic function of the N GAL distribution.
The integrals in equations (3.11) and (3.12) can be evaluated numerically using current quadrature methods. For example, Kuonen (2003) discussed numerical integration using Gaussian quadrature and adaptive rules, which dynamically concentrate the computational work in the sub regions where the integrand is most irregular, and the Monte Carlo method. They concluded that adaptive Gauss-Kronrod quadrature performed best in their examples.

Applications
To illustrate the performance of the proposed model and inference procedures, we apply the method to some synthetic data and well-known time series data. Then, we apply it to the incidence of tuberculosis in Rio de Janeiro, in which it is important to assess if public health policies are effective, not only in reducing the trend in the number of cases, but also the variability of total cases. Moreover, the upper quantiles may be useful to detect an epidemic.
Although the approximate method presents less computing burden and keeps the relevant sequential analysis of the data, we interchange the use of the MCMC approach with the approximate method in the following applications.
The choice of the quantile to be tracked depends on the specific aims of the problem. In each application we arbitrarily fixed a small quantile (10%, 25%), the median, and a large quantile (75%, 90%) to be monitored, with the purpose of illustrating the method for different scenarios. However, in some specific contexts, there is a practical rationale behind this choice. For example, in the Value-at-Risk (VaR) estimation, used by financial institutions and their regulators as the standard measure of market risk, the τ -th quantile is often set to 0.01 or 0.05. Or in survival analysis, since the mean residual life function of a survival time S, given by the expected remaining lifetime given survival up to time t τ , is E(S − t τ | S > t τ ), this is especially useful when the tail behavior of the distribution is of interest.

Artificial data examples
In order to assess the efficiency of the proposed sequential procedure and the convergence of the MCMC estimation, two artificial datasets were generated. The proposed model was fitted to these datasets and its estimates were then compared to the true values used in the dataset generation process. In the first study, we generated a time series with temporal trend and seasonal component with one harmonic from a Gaussian DLM and estimated the linear predictor for 3 different quantiles and compared the results obtained from each inferential method. In the second one, the focus was to empirically evaluate the ability of our proposed model to estimate the parameters with a non-normal artificial dataset, for which a data transformation can be considered.
In both studies a non-informative prior distribution was assumed for the parametric vector with: m 0 = 0, C 0 = 10 5 , n φ = 0.001, s φ = 0.001. We took two approaches to deal with the evolution variance: (i) we set a Wishart prior distribution for W −1 t ; and (ii) we applied a discount factor δ = 0.95, setting W t = C t (1 − δ)/δ (West and Harrison, 1997, p. 51).
The results shown hereafter for the MCMC algorithm correspond to 110,000 MCMC sweeps, after a burn-in of 10,000 iterations and chain thinning by taking every 10th sample value. As the default FFBS-based approach produces results that use the posterior based on the full dataset, for comparison purposes the results for the approximate method are based in the smoothing equations in (3.9).
The DQLM was fitted for τ = 0.1, 0.5 and 0.9 and both inference approaches proposed in the paper were considered. In the MCMC algorithm, we assumed an inverse Wishart prior distribution for W with hyperparameters n w = 8 and S w = 0.1I 4 . Convergence for all the parameters was achieved and Figures 2 and 3 in Appendix C show, respectively, the trace plots with the posterior distribution of parameters θ t 's and the histograms of the posterior densities of some elements of the covariance matrix W.
Moreover, Figure 1 reports the inefficiency factor (INEFF) for some of the parameters, defined as the ratio between the numerical variance of the posterior sample mean and the variance of the sample mean, assuming uncorrelated draws from the target distribution. The larger INEFF is, the less efficient the sampling scheme will be (Gamerman and Lopes, 2006, p. 126). The INEFF values are below 10 for each parameter, which indicates that the sampler is mixing well, mainly for the lower quantile and the median. Figure 2 present the posterior summary of the level and seasonal components and the linear predictor F t θ t for each quantile (from left to right τ = 0.1, 0.5, 0.9), with the generated time series (points). The posterior mean is represented by the solid line and the 95% credible region by the shaded area.  Figure 3 presents the plot of the estimated values of the linear predictor for each quantile in the MCMC approach versus the proposed approximate algorithm. The lengths of the segments represent the 95% credibility interval obtained by the MCMC approach. We conclude that both methods produce similar results, but while MCMC takes about 5 minutes for 5,000 sweeps for a specific quantile, the approximate method takes seconds. Both algorithms were implemented in the R programming language, version 3.4.1 (R Core Team, 2017), in a computer with an Intel(R) Core(TM) i7-7700 processor with 3.60 GHz. This is first evidence of the relevance of the approximate DQLM to deal with a scalable dataset.

Non-Gaussian artificial dataset
To illustrate how the method works with a non-Gaussian dataset, we generated an artificial dataset from a first order dynamic gamma regression with mean μ t , scale parameter φ and a canonical link function η t = log(μ t ) = F t θ t . In particular, we generated T = 100 observations assuming F t = (1, x t ), where x t is an auxiliary variable at time t, G t = I 2×2 , W = 0.01I 2×2 , for all t = 1, . . . , T and φ = 50. Each auxiliary variable x t was generated independently from a uniform distribution defined in the interval (2,4).
The model in equation (3.2) was fitted to the original data and to the log transformed data, for the quantiles 0.10, 0.50, and 0.90. We particularly choose here to perform the inference using only the MCMC algorithm. Although quantile regression is invariant to monotonic transformations, that is, the quantiles of the transformed variable are the transformed quantiles of the original variable (Koenker, 2005, chapter 2, p. 34), the estimates of all the involved quantities were noticeably better when the data were transformed.
In order to validate the former result, some simulation studies were developed. Several samples were generated from the gamma model. A simple static model is proposed in this exercise, with F t = 1 and W t = 0, for all t = 1, . . . , T . Fifty replications of  samples of sizes T = 100 and T = 250 were generated using three different levels of skewness. Figure 4 reports the empirical nominal coverage of the 95% credibility intervals measured in percentages and the relative mean absolute error (RMAE) for the posterior mean of the quantiles for each case. The RMAE decreases when we apply the AL to the logarithm of the sample observations and also as the sample size increases or the distribution becomes more symmetrically distributed. The desired nominal level of 95% is best achieved when the logarithm of the sample is considered, mainly as the level of asymmetry decreases. The improvement in the results when using the log transformation is more noticeable as the skewness decreases.
Those results encourage us, as future research, to explore in this context the idea of using the AL distribution for random effects in the link function, instead of applying it to the transformed response variable.

Real data example: Nile River flow
In this section, we revisit a classic univariate time series dataset previously analyzed in the literature. We apply the proposal to the annual flow of the Nile River at Aswan from 1871 to 1970 (Cobb, 1978). This series shows level shifts, so we considered here a model which includes change points or structural breaks. The dataset is available in the R software (R Core Team, 2017).
The dataset in Figure 5 corresponds to the measurements of the annual flow of Nile River at Aswan (Egypt) from 1871 to 1970. The time series shows level shifts. The construction of the first dam at Aswan started in 1898 and the second big dam was completed in 1971, which caused enormous changes in the Nile flow and in the vast surrounding area. The application of a quantile model for this kind of dataset could be interesting to determine, for example, the return period of a flood. The return period R(y) is defined to be the mean value of a geometrically distributed random variable with probability parameter P (y > y * ) = 1 − τ . Then, R(y) = 1/(1 − τ ). In order to capture these possible level changes, we consider here a model that does not assume a regular pattern and stability of the underlying system, but can include change points or structural breaks.
A simple way to account for observations that are unusually far from their one stepahead predicted value is to describe the evolution error using a heavy-tailed distribution. The Student-t distribution family is particularly appealing in this respect for two reasons: (i) the Student-t distribution admits a simple representation as a scale mixture of normal distributions, which allows one to treat a DLM with t-distributed observation errors as a Gaussian DLM, conditional on the scale parameters; and (ii) the FFBS algorithm can still be used. Thus, we consider the DQLM with evolution characterized by the Student-t distribution, given by: (4.1) Through its degree-of-freedom parameter ν, which may also vary on time, different degrees of heaviness in the tails can be attached. This class of models is discussed in Petris et al. (2009).
In this example, we fitted a first-order polynomial dynamic model, thus we assumed F t = G t = 1 in model in equation (3.2). We fitted the model in equation (4.1) and the one with normal evolution, for the 0.25, 0.5, and 0.75-quantiles.
In both cases, for the variance of the states W , we considered a half-Cauchy prior distribution, as discussed by Gelman (2006), with scale 25, set as a weakly informative prior distribution. Although we could even assume a prior distribution for this, as described in Petris et al. (2009), we assumed ν known a priori and fixed at 2.5. Figure 6 shows the linear predictor for each quantile, with its 95% credibility interval represented by the shaded area, obtained from the normal (first column) and Student-t fits (second column). The model in equation (4.1) results in smoother linear predictors with more accurate credibility intervals. Table 1 presents the posterior mean of the linear predictor for the 0.25, 0.5, and 0.75quantiles for the model with normal and Student-t evolution around the year 1899. The abrupt regime change is better captured in the Student-t than in the normal model for all quantiles.

Tuberculosis cases in Rio de Janeiro
According to the World Health Organization (WHO), tuberculosis (TB) is one of the top 10 causes of death worldwide. Brazil is one of the countries with the highest number of cases in the world and since 2003 the disease has been considered a priority by the Brazilian Ministry of Health. As part of the overall effort to reduce the incidence and mortality rate, the Ministry of Health, through the General Office to coordinate the National Tuberculosis Control Program (CGPNCT), prepared a national plan to end tuberculosis as a public health problem in Brazil, reaching a target of less than 10 cases per 100,000 inhabitants by 2035. The plan to end the TB epidemic is a target under the Sustainable Development Goals that requires implementing a mix of biomedical, public health and socioeconomic interventions along with research and innovation. The World Health Assembly passed a resolution approving with full support the new post-2015 End TB Strategy with its ambitious targets (WHO, 2014).
The state of Rio de Janeiro is located in southeastern Brazil, with over 16 million residents in 2017. The state has one of the highest TB rates in the country. In 2015, there were 13,094 new notified cases, representing 15% of new cases for the whole country. We fitted the proposed DQLM with a trend component for monthly incidence in Rio de Janeiro from January 2001 to December 2015. Figure 8 (a) presents the posterior mean (dashed line) and the 95% credibility interval (region in gray) for the linear predictor for the 0.1, 0.5, and 0.9-quantiles, in the MCMC procedure. Figure 8 (b) presents the posterior summary of the interquantile range (IQR) between 10% and 90%. It is possible to observe a decreasing pattern for the IQR, mainly after 2008. Figure 8: Monthly incidence of TB in Rio de Janeiro state: (a) Posterior mean (dashed line) of the linear predictor for the 0.1, 0.5, and 0.9-quantiles, and their 95% credibility intervals for the 0.1 and 0.9-quantiles; (b) Posterior summary for the interquantile range between the 0.1 and 0.9-quantiles.
One of the targets of the post-2015 global tuberculosis strategy is a 20% reduction in tuberculosis incidence by 2020, compared with 2015, a 50% reduction by 2025, an 80% reduction by 2030 and a 90% reduction in tuberculosis incidence by 2035. In particular, the interest here is to predict the incidence from January 2016 to December 2020, in order to detect in terms of the median, if the target can be achieved.
In the MCMC inference procedure, samples from θ T +k , with k a non-negative integer, are obtained by propagating the samples from the posterior distribution through the evolution equation in (3.2). In the approximate method, this is done after estimating φ through maximum posterior estimation. Hence, we get: where a * T +k = G T +k a * T +k−1 and C * T +k = G T +k R * T +k−1 G T +k can be recursively calculated.
In order to capture the steep decline of new cases that represents the government target, we considered beside a linear forecast function in the original scale, a linear forecast for the log-transformation to the dataset and two higher-order models. First, we considered a quadratic growth DLM (West and Harrison, 1997, chapter 7, p. 223), which is obtained by specifying in equation (2.2) F t = (1, 0, 0) , G t = ⎛ ⎝ 1 1 1 0 1 1 0 0 1 ⎞ ⎠ and W t a diagonal matrix.
Then, we considered a local approximation by a second-order Taylor series expansion of the nonlinear Gompertz-type forecast functions (Harrison et al., 1977). The Gompertz function involves three parameters and is defined by the equation: To deal with the nonlinear form, the quadratic polynomial DLM, previously described, was used to update the corresponding three parameters, so the form in equation (4.3) was converted back and the full nonlinear form was used for the forecast function. Figure 9 presents the forecast from 2016 to 2020 of the median of the monthly incidence of TB per 100 thousand inhabitants for both alternatives using the MCMC approach, for the four models previously described. The dashed line represents the posterior mean, the region in gray is the 95% credibility interval and the red line indicates the TB reduction targets calculated for Rio de Janeiro state using 2015 as the baseline.
According to all four models, the WHO's 2020 target, estimated to be 5.28 cases per month per 100 thousand inhabitants, can be achieved in Rio de Janeiro. The Gompertz model captures the steep decline and it estimates for January 2020 a TB incidence of 5.19 (95%CI = (3.15, 8.29)) TB cases per 100 thousand inhabitants. As in the Gompertz model fitting, we estimated B T < 0 (mean = -0.99, 95%CI =(-1.12, -0.96)) and C T > 0 (mean = 0.04, 95%CI =(-0.11, 0.18)), g(t) defined in equation (4.3) is a non-increasing model converging to A t as t → ∞. Thus, considering a typical stagnation of the disease incidence, the Gompertz model seems to be the most appropriate to capture such longterm forecasts. Figure 9: Temporal predictions of monthly incidence of TB in Rio de Janeiro state from January 2016 to December 2020. The solid line represents the observed time series and the dashed line represents the mean of the predictive distribution for the median (0.5quantile), the region in gray represents the 95% credibility interval and the red line the WHO's reduction target, in each model considered.

Conclusions
In this article we propose a new class of models, named dynamic quantile linear models. For the inference procedure, we develop two approaches: (i) a MCMC algorithm based on Gibbs sampling and FFBS for the model of the location-scale mixture representation of the AL distribution; and (ii) a faster sequential method, based on approximations using Kullback-Leibler divergence and the Bayes linear method. The second inference approach has the advantage of being computationally cheaper. A suggested alternative approach is the use of SMC methods for online state and parameter estimation in the state-space model proposed. In particular, the approach presented in Yang et al. (2018) could be applied to the DQLM in equations (2.2) and (3.2). The approach considers filtering and smoothing methods via SMC for general state space models in the context of state and fixed parameters, providing a correction to the smoothing methods proposed by Carvalho et al. (2010). However, as illustrated by Yang et al. (2018), the filtering and smoothing methods with SMC could have almost the same computational time of a short MCMC, which was stated to make benchmark comparisons with a long MCMC, which requires a lot more computational time than SMC.
We evaluated the DQLM in artificial and real datasets. In the simulation study, we applied our model to a Gaussian example with trend and seasonal components where the DQLM performed well, and the approximate DQLM was a computationally efficient alternative to MCMC. We also applied our model in a non-Gaussian example generated by a gamma model, encouraging the investigation of introducing the AL model in the link function or in the response variable. In the classic real Nile River dataset example, we illustrated our proposal by fitting a model for outliers and structural breaks where the detection of occasional abrupt changes differs depending on the quantile of interest.
The application to the tuberculosis data in Rio de Janeiro, Brazil, illustrates the practical importance of evaluating quantiles instead of the mean in the context of forecasting. It also encourages us to extend the proposal to joint modeling for the quantiles and a dynamic quantile hierarchical model. In particular, our method can be applied to any infectious disease, since it is important to assess whether public health policies are effective, not only in reducing the trend in the number of cases, but also the variability of the number of total cases. Moreover, the upper quantiles can be useful for early detection of an outbreak (epidemic). If the distribution of the number of cases (represented by the quantiles) is much higher than usual, it is a strong indication that attention is required.