Bayesian prediction of jumps in large panels of time series data

We take a new look at the problem of disentangling the volatility and jumps processes in a panel of stock daily returns. We first provide an efficient computational framework that deals with the stochastic volatility model with Poisson-driven jumps in a univariate scenario that offers a competitive inference alternative to the existing implementation tools. This methodology is then extended to a large set of stocks in which it is assumed that the unobserved jump intensities of each stock co-evolve in time through a dynamic factor model. A carefully designed sequential Monte Carlo algorithm provides out-of-sample empirical evidence that our suggested model outperforms, with respect to predictive Bayes factors, models that do not exploit the panel structure of stocks.


Introduction
It has been recognised in the financial literature that jumps in asset returns occur clustered in time and affect several stock markets within a few hours or days, see for example Aït-Sahalia et al. (2015). We work with a large panel of stocks from several European markets and we aim to identify and predict joint tail risk expressed as probabilities of jumps in their daily returns. Our modelling assumptions are based on the well-known paradigms of stochastic volatility (SV) models (Taylor, 1982) combined with Poisson-driven jumps (Andersen et al., 2002). The estimation of SV models is preferably conducted by using likelihood-based approaches (Harvey et al., 1994) while Bayesian methods, in particular, are notably desirable since they deal efficiently with likelihood intractability problems; see for example Jarquier et al. (1994), Chib et al. (2002) and Kastner and Frühwirth-Schnatter (2014) for detailed discussions. This motivates us to utilize the Bayesian approach both for modelling and inference purposes. Our aim is to provide a general modelling approach for the time and cross-sectional dependence of jumps in multiple time series. The resulting Bayesian hierarchical models require careful prior specifications and sophisticated, modern Markov chain Monte To illustrate our motivation, consider the daily returns from the stocks of the STOXX Europe 600 Index over a period 10/1/2007-11/6/2014. The top panel of Figure 1 has been created by empirically identifying a jump when the return exceeds three standard deviations, where the estimators of mean and variance of each series were robustly estimated as the median and the robust scale estimator of Rousseeuw and Croux (1993) respectively; see the supplementary material (Alexopoulos et al., 2021) for the details of the empirical method used for outlier detection. The middle panel depicts a summary statistic of a SV model with jumps estimated separately for each stock return series: each dot denotes a stock return in which that particular day the probability of a jump in the series has been estimated to be greater than .5; by choosing .5 as threshold we avoid to over-or underestimate the number of jumps for each stock at each day, see the supplementary material for more details. Compared with the top panel, it provides a more sparse jump process indicating a successful separation of jumps from persistent moves modelled by the volatility process, a topic discussed in detail by, for example, Aït-Sahalia (2004). The notable feature that inspired this work is that clustering and inter-dependence of jump intensities is evident in the middle panel and, interestingly, days in which jumps have been simultaneously identified in a large number of stocks coincide with days of events that affected the financial markets worldwide as pointed out at the x-axis of the plots. Thus, one may attempt a joint Markovian modelling of jump intensities across time which might reveal some predictability of jumps. Indeed, the results of our proposed joint modelling formulation provide strong evidence for better out-of-sample predictability with estimated jump probabilities higher than .5 depicted in the lower panel of Figure 1.
We propose a modelling approach for the jump processes of SV models in which the time and cross-sectional dependence of the jump intensities are driven by a latent, common across stocks, dynamic factor model. We carefully specify informative prior distributions to the parameters of the factor model so that the implied priors for the jump Figure 1: Each dot indicates a date between 10/1/2007 to 11/6/2014 (x-axis) in which at least one jump has been identified in observed daily log-returns of 571 stocks from the STOXX Europe 600 Index. The plot in the top panel is constructed by using an empirical method for detection of outliers in the data. The dots in the middle and in the bottom panels represent probabilities of jump greater than .5 for models with independent and jointly modelled across stocks intensities respectively. intensities in each stock are in concordance with the well-known (Eraker et al., 2003) a priori expectation for one jump every few months. To avoid identifiability problems of the factor models extra assumptions for the form of the loadings matrix are needed. Nevertheless, it is recognised (Ghosh and Dunson, 2009) that MCMC algorithms developed under identifiability constraints exhibit slow convergence to the target distribution of interest. To improve the mixing properties of the proposed MCMC algorithm we follow Bhattacharya and Dunson (2011) and we do not impose the usual (see, e.g., Aguilar and West, 2000) identifiability constraints on the factor loadings parameters. We also note that there is a considerable amount of literature dedicated to the modelling of time series count data; see for example Fokianos et al. (2009), Jung et al. (2011, Pillow and Scott (2012) and Buesing et al. (2014). Our modelling perspective is applied on unobserved count data and, thus, the aim of the proposed latent factor model is twofold. First, since a few jumps are expected to occur during the observation period we borrow information across all the stocks in order to learn the unobserved processes that drive the evolution of jumps across time and stocks. Second, we exploit the dimension reduction achieved by the latent factors to preserve the scalability of our method.
Our Bayesian inference implementation strategy is based on an MCMC algorithm that alternates sampling of the latent volatility process and its parameters with sampling of the jump process and its parameters. Both these steps require drawing of the high-dimensional paths of the latent volatilities and factors from their full conditional distributions. By noting that the target distributions are proportional to the product of intractable, non-linear likelihood functions with Gaussian priors, we utilize the sampler proposed by Titsias and Papaspiliopoulos (2018) to simultaneously draw the whole path of each latent process at each MCMC iteration. This is an important feature of our methods for the following reasons. First, updating each state of a volatility process separately results in Markov chain samplers with slow mixing (Shephard and Kim, 1994). Second, although Chib et al. (2002) and Nakajima and Omori (2009) use a mixture of Gaussian distributions approximation to also simultaneously update the whole volatility path, their methods rely on an importance sampling step which is quite problematic, see Johannes et al. (2009) for a detailed discussion and Section 4.3 of this paper for a simulation that illustrates this issue.
To assess the predictive performance of the different models we compare the corresponding posterior predictive distributions by utilizing proper scoring rules (Gneiting and Raftery, 2007). We provide a full-fledged quantitative evaluation of the obtained forecasts by calculating logarithmic scores, such as predictive log Bayes factors (Geweke and Amisano, 2010), interval and continuous ranked probability scores (Gneiting and Katzfuss, 2014) as well as root mean squared errors. We estimate the posterior predictive distributions by employing sequential Monte Carlo methods such as the particle filters (Chopin and Papaspiliopoulos, 2020a) and the annealed importance sampling (Neal, 2001) algorithms.
The structure of the remaining of the paper is as follows. In Section 2 we provide a review of the literature related to univariate and multivariate SV models in econometrics as well as the modelling of jumps in financial applications. Section 3 presents our proposed modelling framework and Section 4 describes the methods that we develop to conduct Bayesian inference for the proposed model. In Section 5 we present the computational techniques that we use to assess the predictive performance of the proposed model. Section 6 presents results and insights from the application of our methods on the real dataset and Section 7 concludes with a small discussion.

Related work
There are two main classes of models that have been considered in the literature for the modelling of financial returns. The first class is based on observation-driven models where the time evolution of the variance of the returns is modelled by using past observations and variances. These models are known as autoregressive conditional heteroscedasticity (ARCH) models and have been developed by Engle (1982). A very popular parametrization, the generalized ARCH model, introduced later by Bollerslev (1986) while Bollerslev (1987) and Harvey and Chakravarty (2008) presented modifications that account for the heavy tails of the distributions of the observed returns; see also Harvey (2013) and references therein. Creal et al. (2011) extended this modelling approach in the multivariate case; for the Bayesian analysis of these models see, for example, Vrontos et al. (2000) and Vrontos et al. (2003). The second class is known as parameter-driven models where the time evolution of the variance of the returns is modelled by employing latent stochastic processes. The models that belong to this class are the SV models which constitute the base of our proposed modelling approach.
The SV model proposed by Taylor (1982) in order to account for the time evolution of the volatilities in financial time series while Hull and White (1987) and Chesney and Scott (1989) have employed the SV model as discretization of continuous time SV diffusion processes. As mentioned earlier some early works on the Bayesian estimation of SV models include, but are not limited to, the papers presented by Jarquier et al. (1994), Kim et al. (1998) and Chib et al. (2002). Omori et al. (2007) dealt with efficient Bayesian estimation of the SV model by taking into account the correlation between the observation error and the error of the stochastic variance process; see also Silva et al. (2006) and Delatola et al. (2011) for other extensions of the basic SV model. A more recent contribution on the Bayesian analysis of univariate SV models has been offered by Kastner and Frühwirth-Schnatter (2014) while Zhang et al. (2020) developed a new class of SV models with autoregressive moving average errors. Considering a jump component to model the empirically observed fat tails of stock returns proposed, in his pioneering paper, by Merton (1976) while Bates (1996) and Andersen et al. (2002) combined jump diffusions with SV models. Bayesian estimation of SV models with jumps has been considered, among others, by Chib et al. (2002), Omori (2009) andEraker et al. (2003) and Johannes et al. (2009) where the latter two papers account for jumps in the volatility process as well; see also Bandi and Renò (2016) for more recent research on the occurrence of contemporaneous jumps in the returns and volatilities.
To conduct joint modelling of the observed financial time series, multivariate SV models have been considered more than twenty years ago; see for example Harvey et al. (1994) and Asai et al. (2006) for a review. The existing multivariate SV models assume either constant correlations over time or some form of dynamic correlation modelling through factor models with factors being independent univariate stochastic volatility models. The latter approach has been considered, for example, by Chib et al. (2006), Kastner et al. (2017) and Kastner (2019). In a similar framework Chan and Eisenstat (2018) and Kastner and Huber (2020) deal with Bayesian estimation of time varying parameter vector autoregressive models with stochastic volatility. An alternative multivariate SV modelling approach proposed recently by Dellaportas et al. (2015) where all the elements of the covariance matrix of the observed time series evolve over time.
More closely related to the present paper, Bollerslev et al. (2008), Jacod et al. (2009) and Clements and Liao (2017) study the presence of common or not common jumps in multiple time series. Aït-Sahalia et al. (2015) model the dependencies of jumps by combining the SV model with a multivariate Hawkes process (Hawkes, 1971) that models the propagation of jumps over time and across assets by introducing a feedback from the jumps to their intensities and back. They employ the generalised version method of moments to estimate the parameters of the model. We close this literature review section by noting that tools for the identification of non-observed events that affect the evolution of time series have been also developed by Hamilton (1989) and Billio et al. (2012) that deal with the problem of identifying structural changes in time series of economic indices.

Notation
We denote the univariate Gaussian distribution with mean m and variance s 2 by N (m, s 2 ), and its density evaluated at x by N (x|m, s 2 ); N d (X|M, S) denotes the density of the d-variate normal distribution with mean M and covariance matrix S evaluated at X; Gam(α, β) and IGam(α, β) denote the gamma, with mean α/β, and inverse gamma, defined as the reciprocal of gamma, distributions respectively; U (α, β) the uniform distribution on interval (α, β); Γ(·) denotes the gamma function. Index i is used for stocks and index t for times, e.g., r it is the ith stock return at day t; when we introduce factor models index k denotes factor. Upper case letters denote vectors and matrices, and lower case letters denote scalars. For a collection of variables indexed by time or stock and denoted by a lower case character, the corresponding upper case character denotes their vector or matrix representation, e.g. R denotes the matrix whose elements are r it ; R i = (r i1 , . . . , r iT ), and R t = (r 1t , . . . , r pt ) are then generic rows and columns of R; all vectors are understood as column vectors. A subscript, of the form (t+1) : (t+ ), where is a positive integer, indicates the collection of vectors with subscripts t + 1, . . . , t + ; e.g. R t+1:t+ is the collection of vectors R t+1 , . . . , R t+ . The transpose of a vector or matrix is denoted by the superscript ; e.g. R is the transpose of R; 1{·} denotes an indicator function that takes value 1 if the event in brackets is true and 0 otherwise; for a function f (x) we set ∇ =: (∂f /∂x 1 , . . . , ∂f/∂x d ).

Stochastic volatility with jumps
We model stock returns as ) and it and η it are independent. The number of jumps of the ith stock at time t follow a Poisson distribution, where λ it denotes the corresponding jump intensity, Δ it is a time-increment associated to each return and ξ κ it ∼ N (μ iξ , σ 2 iξ ) is the κth jump size. We explicitly take into account that stock returns are computed over varying time increments Δ it , such as in-between weekends and holidays, hence λ it is the daily jump intensity.
We model the parameters μ i , φ i and σ 2 iη of each log-volatility process and the parameters μ iξ and σ 2 iξ of the jump sizes as independent across stocks. For μ i , φ i and σ 2 iη we follow the related literature (Kim et al. (1998), Omori et al. (2007), Kastner and Frühwirth-Schnatter (2014)) and take μ i ∼ N (0, 10), (φ i + 1)/2 ∼ Beta(20, 1.5) and σ 2 iη ∼ Gam(1/2, 1/2). For μ iξ and σ 2 iξ we choose proper priors. In fact, improper priors will lead to improper posterior. This is due to the fact that according to the model there is positive probability that there are no jumps for the whole period of time, i.e., the event n it = 0 for all t, has positive prior probability, and in such an event, μ iξ and σ 2 iξ are unidentifiable in the likelihood. Instead, we take σ 2 In this specification, E[σ 2 iξ ] = range 2 i /36, hence we match the variance of jump sizes with a quantity that relates to the sample variance of returns. A same reasoning is followed for setting the prior variance of μ iξ and we have found that the choice of the multiplier, here taken 5, is not crucial in the analysis.

Modelling jumps independently across stocks and time
By modelling Λ i as independent random vectors, we obtain p independent univariate SV with jumps models. Additionally, each vector can be taken as one of independent elements, λ it ∼ Gam(δ, c), in which case one obtains the models described, among others, by Bates (1996), Chib et al. (2002) and Eraker et al. (2003). It follows that n it has, marginally with respect to λ it , a negative binomial distribution with density where β = c/(c + Δ it ). The mean of the distribution is δ(1 − β)/β and the variance is δ(1 − β)/β 2 . Taking δ ≤ 1 ensures that the density is monotonically decreasing. This is a feature we are interested in: jumps are meant to capture infrequently large price movements, and a priori we expect that with highest probability there is no jump and it is more likely to have less than more jumps, if any. We can choose c such that the probability of no jump at a given time increment Δ it , is at least γ ∈ (0, 1), a userspecified threshold. Taking for example δ = 1 and c = 50 results in γ = 0.98 with E(λ it ) = 0.02. These choices are consistent with the prior expectation, in financial applications, for one jump every few months; see for example Eraker et al. (2003) for a more detailed discussion.

Joint modelling of jumps
To capture the dependence of the jumps over time and across stocks we model λ it by using a dynamic factor model which is specified as follows. We first transform λ it to y it via which implies that λ it < λ * . The parameter λ * is, thus, the upper bound for the jump intensities of all the stocks at any time point. This is an important feature of our modelling approach since it allows to incorporate the necessary (Chib et al., 2002;Eraker et al., 2003) prior information about the expected number of jumps which are typically described as large but infrequent moves. Furthermore, as discussed below, by carefully choosing this upper bound we are able to construct a prior for the jump intensities which is similar to the Gam(1, 50) distributions that we used in the case of univariate SV models with jumps. We also note that this type of information is not necessary only in Bayesian applications; see for example Honore (1998) for a similar discussion related to maximum likelihood estimation of models with jumps. Then, we model the time-dependence of jumps via K latent factors F t modelled as independent autoregressive processes. The cross-sectional dependence of jumps is captured by a p × K matrix of factor loadings W and a p × 1 vector B. The joint model specification is . It is known that parameters of latent factor models are only identifiable up to certain transformations, such as orthogonal rotations and sign changes (Aguilar and West, 2000). However, latent factor models are also used as low-rank predictive models in which case the out-of-sample prediction is of prior importance; see Bhattacharya and Dunson (2011). This is the perspective we adopt here, where we try to borrow strength from the information in large panels of stocks in order to predict with higher accuracy individual jumps by using a parsimonious factor model. Therefore, our prior specification imposes no identifiability constraints on the parameters of the factor model and are taken as w ik The specification of the hyperparameters of the priors assigned on the parameters A, B and W require extra care because they affect, through (3.5), the imposed prior on λ it . The fact that our modelling formulation of Section 3.2 was based on informative priors Gam(1, 50) provides a way to elicit informative prior specification of the hyperparameters σ 2 w , σ 2 b and μ b . We first set λ * = 0.15 which is the 0.0005 upper quantile of the Gam(1, 50) density. As discussed earlier, this an important characteristic of our modelling approach since it penalizes the estimation of a large number of jumps which is not consistent with the prior information for one or two jumps every 100 days. A small sensitivity analysis, not reported here, confirms that choosing a value considerably greater than 0.15 does not allow the efficient separation of the jumps from the underlying volatility process of the stock daily returns. Moreover, by using values notably smaller than 0.15 we estimate quite low jump intensities and this can result in less accurate prediction of future stock returns; in the supplementary material we compare predictions obtained under different prior assumptions for the expected number of jumps. We have concluded that a value for λ * between 0.1 and 0.2 does not have Prior for λ it Mean Variance Mode Gam(1, 50) 0.020 0.00040 0 Factor model 0.003 0.00004 0.0001 Table 1: Mean, variance and mode of the Gam(1, 50) prior distribution assumed for the jump intensities λ it in the independent model and for the prior induced by the dynamic factor model. a substantial effect on the results presented in the rest of the paper. Then, we choose the hyperparameters σ 2 w , μ b and σ 2 b to be such that the mean, the variance and the mode for the resulting prior on each λ it in (3.5) are comparable with the corresponding quantities of the Gam(1, 50) distribution. Based on calculations presented in the supplementary material we set σ 2 w = 0.5, σ 2 b = 1 and μ b = −5. In Table 1 we display the quantities of interest in the two prior distributions. To evaluate the differences between the two priors we also examine the impact of different choices for their hyperparameters in the forecasts of future observations. See the supplementary material for the related results.

Sampling from the posterior of interest
Our interest lies on the joint posterior of parameters and latent states p(H, N, Ξ, W, Θ, F, A|R). To draw samples from the posterior of interest we design an MCMC algorithm which proceeds as follows. At each MCMC iteration we perform stock-specific updates of H i , N i , Ξ i , W i , Θ i and then we update the latent factors F and their parameters A.
An important characteristic of the designed MCMC algorithm is that before updating the log-volatilities H i and the number of jumps N i we integrate out the jump sizes Ξ i . This results in an efficient simultaneous update of the log-volatilities vector and direct sampling of the full conditional of the number of jumps N i . Algorithm 1 summarizes the steps of the MCMC sampler. The algorithm presents how the full conditional densities of vectors of parameters are sampled by exploiting the conditional independence structure of the hierarchical model defined by equations (3.1)-(3.3) and (3.5)-(3.8). In Sections 4.1 and 4.2 we will describe in detail the more elaborate steps 4, 6, 7 and 12 of Algorithm 1 whereas the details of the other sampling steps are given in the supplementary material.
We also emphasize that by switching off certain steps of Algorithm 1 we obtain a novel MCMC sampler for existing models. More precisely, Bayesian inference for univariate SV with jumps models can be conducted if the step in the 10th line is skipped and the step in the 12th line replaced with the step of drawing the independent jump intensities of the model. Switching off the steps in lines 6-10 and 12 results in an MCMC algorithm for univariate SV without jumps models. By removing the steps in lines 4-9 of Algorithm 1 we obtain a sampler for a Cox model (Cox, 1955) with intensities driven by latent factors.
which is an inverse Gamma density. 10:

Sampling the number of jumps and jump sizes
To sample each n it we first integrate out the jump sizes ξ it and then we construct a rejection sampling algorithm to draw from p( . This algorithm is based on Proposition 1. Notice also that a discrete distribution with pmf p(n) is called log-concave if p(n) 2 ≥ p(n − 1)p(n + 1) is satisfied for all n; see for example Devroye (1986) for more details. For the remaining of this subsection the unnormalized Proof. See the supplementary material.
Due to Proposition 1, the target distribution is a unimodal distribution (Devroye, 1986). Let m be an integer at the right of the mode of the distribution. We define the probability mass function (pmf) . The pmf q m (n) is proportional tõ p(n) for n < m and proportional to the geometric pmf with parameter, (p(m) − p(m + 1))/p(m), for n ≥ m. By noting that if a random variable X follows the exponential distribution with parameter, log(p(m)) − log(p(m + 1)), then X follows the geometric distribution with parameter (p(m) −p(m + 1))/p(m), we observe that q m (n), for n ≥ m, corresponds to an exponential density that goes through the points (m,p(m)) and (m + 1,p(m + 1)). Log-concavity of the distribution implies that cq m (n) ≥p(n), for n ≥ m, since any straight line that goes through the points Figure 2:p(n) and its bounding line in the log-scale. Circles: log(p(n)), stars: log(q m (n)).

Algorithm 2 Rejection sampler for the number of jumps.
Generate then generate n from q m (n) truncated from the right at the point m − 1, by using the inversion method. else repeat: generate V ∼ U (0, 1) generate n from the geometric pmf with parameter (p(m) −p(m + 1))/p(m), truncated from the left at the point m.
until V cq m (n) ≤p(n) end if return n (m, log(p(m))) and (m + 1, log(p(m + 1))) bounds from above the logarithm ofp(n). See Figure 2 for an illustration. A rejection algorithm to sample from the distribution r it ) proceeds as summarized by Algorithm 2. Note that Algorithm 2 implies the evaluation ofp(0), . . . ,p(m + 1) at each iteration of the proposed MCMC algorithm.

Jump sizes
Drawing the jump sizes ξ it given the number of jumps n it is based on the following proposition. Let I n and 1 1 1 n be the n × n identity matrix and an n-dimensional vector with ones respectively.
Proposition 2. According to the model defined by equations (3.1) and (3.2) we have Proof. See the supplementary material.

Log-volatilities and factors: MCMC for latent Gaussian models
In the 4th and 12th lines of Algorithm 1 we sample the paths of the log-volatilities H i and factors F respectively. Moreover, each path is updated jointly with the parameters of the corresponding autoregressive models defined by equations (3.2) and (3.8). These steps correspond to the joint sampling of latent paths and parameters of latent Gaussian models. A latent Gaussian model is defined in terms of a non-Gaussian likelihood and a latent Gaussian field controlled by some parameters. By denoting with X a d-dimensional random vector that corresponds either to a latent log-volatility or to a factor path, with x its realization and with θ any parameters we target the density where g(x) is the log-likelihood function, C θ denotes the covariance matrix of the latent Gaussian field, parametrized in terms of θ and π(θ) denotes the density of the prior for θ. We have assumed, without loss of the generality, that the latent Gaussian field has zero mean. We also note that only throughout the present Section we violate our notation and we use lower-case letters to denote the realizations of a random vector; the dependence on any data has been suppressed.
To draw samples from the target in (4.1) two approaches have dominated in the literature. One of them is to alternate sampling from the conditionals p(x|θ) and p(θ|x). Sampling from p(x|θ) is quite challenging since usually corresponds to a high dimensional distribution. To draw the desired samples several well-established methods, based on the Metropolis-Hastings algorithm, have been used, see e.g. Roberts and Stramer (2002), , Girolami and Calderhead (2011) and Cotter et al. (2013); in the latter approach the proposal distribution utilizes information both from the likelihood and the Gaussian prior of the latent states and this is an important feature of the proposed sampler (Titsias and Papaspiliopoulos, 2018). Sampling from p(θ|x) is also challenging since the parameters θ are highly correlated with the Gaussian latent states of the model. See, for example, Papaspiliopoulos et al. (2007),  and Yu and Meng (2011) for efficient methods that have been developed for the update of θ, conditional on the latent Gaussian field, in a more general class of Bayesian hierarchical models. In the second approach, joint sampling of the latent states x and the parameters θ is conducted. This perspective is considered, among others, by Knorr-Held and Rue (2002) where the latent states and the parameters are updated in, carefully selected, blocks and by Titsias and Papaspiliopoulos (2018) where a sampler which utilizes auxiliary variables to perform updates that are both prior and likelihood informed has been constructed.
In the present paper we follow the latter approach and we update jointly the latent states and the parameters of a given model by combining the methods proposed by Yu and Meng (2011) and Titsias and Papaspiliopoulos (2018). Following Yu and Meng (2011) we interweave the so-called (Papaspiliopoulos et al., 2007) centred and noncentred parametrization of the latent states and by utilizing the algorithm developed by Titsias and Papaspiliopoulos (2018) we sample jointly x and θ under each one of the two parametrizations. We show that by utilizing the proposed Metropolis-Hastings step we find efficiency gains relative to alternative samplers that can be used to update the latent log-volatilities, the factor paths and their parameters. In the rest of the Section we present the proposed methodology in the case of a general latent Gaussian model and in the supplementary material we provide the details for the application of the method in order to sample the latent factors F as well as each log-volatility path H i and their parameters.

Metropolis-Hastings for the joint update of latent paths and parameters
To construct a Metropolis-Hastings step in order to sample jointly x and θ we work as in Titsias and Papaspiliopoulos (2018) and we introduce auxiliary variables u ∼ N d (x, (δ/2)I d ). We consider, thus, the expanded target (4.2) Samples from the expanded target can be drawn by first drawing u from p(u|x) = N d (u|x, (δ/2)I d ) and then employing the Metropolis-Hastings algorithm to draw from the intractable p(x, θ|u). To perform the latter update we utilize a proposal distribution of the form q(x , θ |x, θ, u) = q(x |x, θ , u)q(θ |θ). (4.3) To construct q(x |x, θ , u) we approximate the intractable log-likelihood g(·) by employing a first order Taylor expansion of g(·) around the current value x. We obtain, then, the proposal density To reduce the costs of generating from the proposal in (4.4) as well as evaluating the resulting acceptance ratio we follow Titsias and Papaspiliopoulos (2018) and we define new auxiliary variables z by using the reparametrization z ≡ u + (δ/2)∇g(x) ∼ N d (x + (δ/2)∇g(x), (δ/2)I d ).
The expanded target in (4.2) becomes now Algorithm 3 Auxiliary gradient-based sampler to draw samples from (4.1).
while the proposal in (4.4) can be written as where Z(z, θ ) = N d (z|0, C θ + (δ/2)I d ). Notice that, importantly, the proposed x becomes conditionally independent from the current x given the auxiliary variable z. In particular, the proposal q(x , θ |x, θ, u) in (4.3) becomes q(x , θ |x, θ, z) = q(x |θ , z)q(θ |θ), and the acceptance ratio of the move is . In our application we choose q(θ |θ) to be a simple random walk proposal distribution with step-size κ. Algorithm 3 summarizes the proposed Metropolis-Hastings step for the joint update of x and θ. Finally, we note that following Titsias and Papaspiliopoulos (2018) we learn κ and δ during the burn-in period by decomposing step 2 in Algorithm 3 in two sub-steps. We first update x alone and tune δ in order to achieve an acceptance ratio between 50% and 60%. Then, we update (x, θ) jointly and we tune κ to obtain acceptance ratio in the range of 20% to 30%.

Interweaving strategy for the parameters
Although drawing the latent path x jointly with the parameters θ aims to overcome convergence problems that occurring due to their high dependence, it is well-known that the parametrization of latent Gaussian models as well as of more general Bayesian hierarchical models plays a key role in the efficiency of MCMC algorithms (Papaspiliopoulos et al., 2007). To address convergence issues that arise from the parametrizations of the latent log-volatility models and from the latent factor model for the jump intensities we employ the, so-called, ancillarity-sufficiency interweaving strategy (ASIS) developed by Yu and Meng (2011); see also Kastner and Frühwirth-Schnatter (2014) for the benefits of the ASIS in the case of univariate SV models and Simpson et al. (2017) where the ASIS has been applied in more general dynamic linear models. In the case of a latent Gaussian model with latent states x and parameters θ the application of ASIS is briefly described as follows. After the joint update of x and θ with the Metropolis-Hastings step described by Algorithm 3 we transform the latent path x to its non-centred parametrizationx by using an one-to-one transformation. We re-draw then the parameters θ from p(θ|x) before transformingx back to x by using the inverse transformation.
To employ ASIS in our application we first note that equations (3.1)-(3.2) and (3.5)-(3.8) define the centred parametrizations (Papaspiliopoulos et al., 2007) of the corresponding models. The non-centred parametrization for the latent states of a latent Gaussian model can be found by following the reasoning developed by Yu and Meng (2011): under a non-centred parametrization x has to be an ancillary statistic for θ, i.e., the distribution of x needs to be free of θ. We note that there are cases where a many-toone transformation is needed (Papaspiliopoulos et al., 2003) but this is not required in our application. It is easy to see that in the case of the latent log-volatilities h it defined in (3.2) the variablesh it = (h it − μ i )/σ iη are in a non-centred parametrization while in the case of the latent factors defined by (3.7) the transformation Γ t = F t − AF t−1 results in a non-centred factor model for the jump intensities. ASIS is completed by re-drawing the log-volatility parameters μ i , φ i and σ 2 iη and the factor parameters A under the non-centred parametrizations and by using the inverse transformation to return back to h it and F t .
In the supplementary material we provide all the details for our implementation of ASIS along with simulation studies which emphsize the gains obtained by using the ASIS. By describing briefly the results we note that combining ASIS with the auxiliary gradient-based sampler (Algorithm 3) improves substantially the efficiency of the proposed MCMC algorithm (Algorithm 1) in drawing samples from the parameters

Approximation by using a mixture of Gaussian distributions
For a given stock, sampling the whole path of the latent log-volatilities H i at each MCMC iteration could be performed by utilizing the methodology developed by Chib et al. (2002) and Nakajima and Omori (2009). Their methods are based on the idea of Kim et al. (1998) to approximate a SV without jumps model by using a mixture of Gaussian distributions.
The advantage of their approach is that the update of H i can be conducted by drawing samples directly from its full conditional distribution which is a (T + 1)-variate Gaussian with tridiagonal inverse covariance matrix. The jumps sizes and their parameters should be updated after integrating out the mixture component indicators from the likelihood of the approximated model, otherwise convergence of the corresponding MCMC algorithm will be slow (Nakajima and Omori, 2009). Integrating out the mixture component indicators results in a partially collapsed Metropolis-Hastings within Gibbs sampler which has to be implemented with care since permuting the order of the updates can change the stationary distribution of the chain (Van Dyk and Jiao, 2015). If, for example, the update of the mixture indicators is conducted in the step which is intermediate between the update of the latent log-volatilities and the update of their parameters, then the Markov chain will not converge to the desired stationary distribution.
More importantly, by using the methods developed in Chib et al. (2002) and Nakajima and Omori (2009), samples from the posterior of the parameters of the exact model are obtained with an importance sampling step. In this step, weights that are equal to the ratio between the likelihood of the exact and the approximated model, are assigned to the MCMC samples obtained from the approximated model. Although in the case of SV models without jumps the described importance sampling step works without problems, it is known (Johannes et al., 2009) that in the case of models with jumps such weights may have large variance since a few or only one of them could be much larger than the others. Thus, the output of the importance sampling step will not be suitable for any Monte Carlo calculation. For an illustration of the problem we conducted a simulation experiment in which we calculated the required importance sampling weights for SV models with and without jumps. In particular, we simulated p = 1 time series with T = 1,500 log-returns from each model by utilizing equations (3.1)-(3.2) and omitting the jump component in (3.1) to simulate from the model without jumps. In the case of the SV model with jumps we assumed independent jump intensities over time following the modelling approach presented in Section 3.2. We chose values for the parameters of the log-volatility process which are common in the financial literature (Kim et al., 1998); μ 1 = −0.85, φ 1 = 0.98 and σ 1η = 0.15 and we simulated the sizes of possible jumps from the Gaussian distribution with μ 1ξ = 0 and σ 1ξ = 3.5 (Eraker et al., 2003). Then, we drew 3,000 (thinned) posterior samples of the parameters and latent states of the approximated models by using the corresponding MCMC algorithms. Figure 3 displays the log-normalised importance sampling weights plus the logarithm of their number. The variance of the normalized importance sampling weights should become smaller as the importance sampling approximation improves. This implies that the histograms in Figure 3 should be centered at zero in the case of an accurate enough approximation; see the corresponding plots in Kim et al. (1998) and Omori et al. (2007). The left panel of Figure 3 indicates that there is little effect of the mixture approximation in the case of the model without jumps, but the right panel points out that this is not true in the case of the SV with jumps model. The bad performance of the approximation strategy in models with jumps is due to the fact that the mixture model has been proposed by Kim et al. (1998) and improved by Omori et al. (2007) for SV models without jumps. By adding a jump component in the model the mixture approximation has to take into account the uncertainty in the estimation of the jump process and this is not the case in the approximation used by Chib et al. (2002) and Nakajima and Omori (2009).

Out-of-sample forecasting
To compare the predictive performance of different SV models we develop a computational framework for out-of-sample forecasting. Based on T in-sample and outof-sample observations we assess the forecasts obtained from the various models by calculating proper scoring rules (Gneiting and Raftery, 2007). A scoring rule assigns a numerical score to a probabilistic forecast by taking into account the posterior predictive distribution and the realized observation and it is proper if the expected score is maximized when the estimated predictive distribution coincides with the true distribution. Proper scoring rules allow for the joint evaluation of the calibration (compatibility of forecasts and observations) and the sharpness (concentration of the predictive distributions); see, for example, Geweke and Amisano (2010) and Krüger et al. (2020) for more detailed discussions. We focus on three proper scoring rules: (i) logarithmic scores, and particularly, the predictive log Bayes factors which compare the models with respect to their marginal likelihoods, (ii) continuous ranked probability scores which are employed for the assessment of the predictive cumulative distribution functions (CDFs) and (iii) interval scores which are utilized for the evaluation of the prediction credible intervals. We complete a full-fledged assessment of the predictive performance of the different models by calculating root mean squared errors of the obtained forecasts.
To calculate these summary measures we draw samples from the posterior predictive distributions of interest and we estimate their densities by utilizing sequential Monte Carlo methods (Chopin and Papaspiliopoulos, 2020b). In the rest of the Section we present the computational framework that we built in order to conduct out-of-sample forecasts with univariate SV models with and without jumps as well as by using the proposed joint modelling approach for the jump intensities. To evaluate the accuracy of the obtained estimators we calculate the effective sample size of the samples drawn from the posterior predictive distributions. The effective sample size of a given sample is commonly employed (Neal, 2001;Chopin et al., 2013) to assess the variance of estimators obtained from weighted rather than independent samples; see in the supplementary material for more details. For the remaining of this section we assume that the parameters Θ, A and W are fixed to their in-sample estimated posterior means and we omit them from our notation.

Prediction with independent SV models
Let M svj and M sv be the model that consists of p independent, univariate SV models with and without jumps respectively. Let also R 1:T and R T +1:T + denote in-sample and out-of-sample observations. To calculate predictive Bayes factors and the continuous ranked probability and interval scores we consider the posterior predictive distributions with densities p(r it |r i,1:t−1 ), t = T + 1, . . . , T + . The predictive log Bayes factors are defined as log(p(R T +1:T +j |R 1:T , M svj )) − log(p(R T +1:T +j |R 1:T , M sv )), j = 1, . . . , , where for both M svj and M sv we have that Moreover, if l it and u it denote the α/2 and 1 − α/2 quantiles of a (1 − α) × 100% prediction interval for the value of r it then its interval score is given by the formula We note, finally, that the calculation of the continuous ranked probability scores that we employ for the evaluation of predictive CDFs is also based on samples from p(r it |r i,1:t−1 ); see the supplementary material for details.
We draw the desired samples and we estimate the densities p(r it |r i,1:t−1 ) based on the formula and by utilizing sequential Monte Carlo methods also known as particle filters (Chopin and Papaspiliopoulos, 2020a). From the output of a particle filter algorithm that targets the posterior in (5.3) we are able to draw samples from the posterior predictive distribution of r it and to estimate the normalizing constants p(r it |r i,1:t−1 ). See in the supplementary material for more details and for the pseudocode of the particle filter algorithm.

Prediction by using joint modelled jump intensities
Let M mvj be our proposed SV with jumps model, defined by equations (3.1)-(3.3) and (3.5)-(3.8). To compare the predictive performance of M mvj with the performance of competing models we need to estimate the predictive densities p(R T +j |R 1:T +j−1 , M mvj ), for each j = 1, . . . , . This could be achieved with a particle filter algorithm, constructed similarly to the case of independent SV models, as described in Section 5.1. However, in the case of M mvj , particle filter algorithms deliver poor estimations of the marginal likelihoods since the importance weights, calculated at each of their steps, have large variance.
We develop an alternative sequential importance sampling algorithm to obtain the desired estimates which is based on the annealed importance sampling (AIS) proposed by Neal (2001). AIS utilizes the method of simulated annealing (Kirkpatrick et al., 1983) to construct an importance sampling algorithm for drawing weighted samples from an unnormalized target distribution. The proposal distribution of the algorithm is based on a sequence of auxiliary distributions and Markov transition kernels that leave them invariant. As in any importance sampling algorithm, the sample mean of the importance weights is an estimation of the ratio between the normalizing constants of the target and proposal distributions. Moreover, the output of AIS can be used for the estimation of expectations with respect to any of the auxiliary distributions used for the construction of the algorithm. Here we exploit these features of AIS to draw samples from the posterior predictive distributions of interest and to estimate their densities. In the rest of the section we provide a detailed description of how we apply AIS, M mvj is suppressed throughout except when it is necessary.
Let g 0 , . . . , g be the required sequence of auxiliary distributions. By noting that each g j must be proportional to a computable function and satisfies g j = 0 wherever g j−1 = 0 we set g (H 0:T + , F 0:T + ) = p(H 0:T + , F 0:T + |R 1:T + ) and g j (H 0:T + , F 0:T + ) = p(H 0:T + , F 0:T + |R 1:T +j ) = p(H 0:T +j , F 0:T +j |R 1:T +j ) for j = 0, . . . , − 1. Samples from g j , j ≥ 1, are obtained by drawing samples from g j−1 using a few iterations of Algorithm 1 and the transition densities p(H T +j |H T +j−1 ) and p(F T +j |F T +j−1 ). To sample from g 0 we utilize samples from p(H 0:T , F 0:T |R 1:T ). We compute the importance weights of the obtained samples by first noting that g j are proportional to the computable functions g * j (H 0:T + , F 0:T + ) = p(H 0:T +j , F 0:T +j , R 1:T +j ) and g * (H 0:T + , F 0:T + ) = p(H 0:T + , F 0:T + , R 1:T + ). Therefore, the importance weight for every sampled point is and Ω s T +1 = p(R T +1 |H s T +1 , F s T +1 ), s = 1, . . . , S, where S is the number of samples obtained by AIS and We also note that the infinite sum in (5.4) can be truncated without affecting the results of the algorithm (Johannes et al., 2009). We refer to the supplementary material for the pseudocode that we used to construct the described AIS algorithm.
The sample means of the weights calculated at each step of AIS provide estimators of the marginal likelihoods p(R T +1:T +j |R 1:T ). However, the resulting weights have large variance and the corresponding estimates will be inaccurate. Nevertheless, and in contrast with a particle filter that targets p(H 0:T + , F 0:T + |R 1:T + ), we can use the samples {H s 0:T +j , F s 0:T +j } S s=1 to estimate the quantities p(r i,T +1:T +j |R 1:T ) for each stock separately. We propose the following sampling procedure. First, note that This implies that by using only the ith element of the product in (5.4) we obtain an estimator of the ith marginal of p(R T +1:T +j |R 1:T ) as follows. We setp(r i,T +1:T +j |R 1: and ω s i,T +1 = 1. Thus, p(r i,T +1:T +j |R 1:T , M mvj ) is estimated byp(r i,T +1:T +j |R 1:T ) accurately since the latter is based on importance sampling weights with reduced variance and can be used for the estimation of log Bayes factors of the form log(p(r i,T +1:T +j |R 1:T , M mvj )) − log(p(r i,T +1:T +j |R 1:T , M svj )), (5.5) where p(r i,T +1:T +j |R 1:T , M svj ) can be estimated with the methods presented in Section 5.1. We also note that by assuming that the joint predictive density of all returns is estimated by the product of their marginal predictive densities, as in the independent across stocks SV with jumps models, we estimate the approximate log Bayes factors

Real data application
Here we present results from the application of the developed methodology on real data. In the supplementary material we present results from simulation studies used to test our methodology. Further specifics of the data analysis, such as parameter estimates for the log-volatility processes, can be also found in the supplementary material as well as in Alexopoulos (2017).

Data and MCMC details
We applied the developed methods on time series consisted of daily log-returns from stocks of the STOXX Europe 600 Index downloaded from Bloomberg between 10/1/2007 Figure 4: First row: posterior mean of the paths of the three autoregressive factors F 1 , F 2 and F 3 that we used to model the evolution of the jump intensities across stocks and across time. Second row: posterior (solid lines) and prior (dotted lines) distributions of the persistent parameters α 1 , α 2 and α 3 .
to 11/6/2014. We removed 29 stocks by requiring each stock to have at least 1000 traded days and no more than 10 consecutive days with unchanged price. The final dataset consisted of p = 571 stocks and 1947 traded days. We used daily log-returns observed in the first T = 1917 days as in-sample observations to estimate all models described through our article and the last = 30 observations to test their predictive ability. In our dynamic factor model we used K = 3 factors. We ran all the MCMC algorithms for 250,000 iterations using a burn-in period of 100,000 sampled points and we collected 1,000 (thinned) posterior samples. In the supplementary material we give the computing times for the real and simulated data analysis that we have implemented and we evaluate the accuracy of the sequential Monte Carlo methods that we used to conduct the assessment of the predictive performance of the different models. Figure 4 depicts the posterior means of the factor paths along with the posterior distributions of their persistent parameters α 1 , α 2 and α 3 . The posterior densities of α 1 and α 2 indicate strong evidence of autocorrelation in the first two factor processes while that of α 3 suggests that a model with only K = 2 factors adequately captures the dependence of the jumps over times and across stocks. Therefore, all results in the following are based on K = 2 factors.

Bayesian inference for parameters and latent states
By noting that we have applied our methods on a very high-dimensional dataset we present posterior distributions for summary statistics related to the latent volatility and jump processes. Figure 5 shows, for each stock, the 95% credible interval of the temporal average of the volatility process. From the visual inspection of the Figure we note that the stocks can be separated in those with low, medium and high volatilities. More precisely, the posterior temporal average of the volatilities for the majority of the stocks is estimated lower than 5, for a few number of stocks the temporal average ranges between 5 and 15 while for less than 10 stocks the posterior temporal average has 95% credible interval with lower limit greater than 15. See also the supplementary material for additional results regarding Bayesian inference about the log-volatility processes and their parameters. Figure 6 presents the 95% credible intervals for the posterior distribution of the temporal sum of jumps for each stock with respect to the economic sector that it belongs to. The Figure indicates that for each stock have been identified 100 jumps on average between 10/1/2007 and 11/6/2014. We also note that the number of stocks for which almost no jumps have been estimated is very small and such stocks mainly belong to the financial, the industrial and the consumer-cyclical and non-cyclical, the energy and the communications sectors. In the supplementary material we present more results related to the Bayesian analysis of the jump processes of the SV models while Figure 1 offers a visual illustration of the estimated jumps underlying the time evolution of the stock returns.
As already noted in the introduction of the paper the top panel of Figure 1, which has been created by using an empirical method for jump identification, visualizes a dense estimation of the underlying jump process. The middle panel which corresponds to Figure 6: 95% credible intervals of the posterior distributions of the temporal sum of jumps. Each dotted blue line corresponds to a different stock and each plot is consisted of credible intervals for the stocks that belong to the economic sector indicated by its title. independent modelling of jumps depicts a very sparse jump process whereas the bottom panel indicates less number of jumps than the empirical method but more than the independent SV models. To evaluate the estimation of the jump processes in the three panels we first emphasize that the true jumps in the stock prices are unobserved events. Therefore, drawing conclusions for the accuracy of the jumps estimation by only visually inspecting the Figure is difficult. We note, however, that the dense jump structure obtained from the empirical method can be explained by the fact that the time evolution of the volatility of the stock returns is not taken into account; resulting in a possible overestimation of the number of jumps. The latter combined with the more narrow Figure 7: Left: Log Bayes factors (BF) in favour of univariate SV models with jump intensities modelled independently across time and stocks against univariate SV without jumps models; see (5.1) for the corresponding formula. Right: Logarithm of approximate BF for SV models with jointly (numerator) and independently (denominator) modelled jumps intensities; see (5.6) for the corresponding formula. The x-axis in both plots refers to the = 30 out-of-sample time points.
credible intervals for the log-volatilities delivered by SV models with jumps (see Figure  S.12 in the supplementary material) indicates that an efficient jump identification results in accurate estimation of the log-volatility process without increasing unnecessarily its level due to the presence of large price movements and, thus, more precise forecasts of future returns are expected. See also the following Section where we compare the predictive performance of SV models with and without jumps. To assess the jump structures estimated by employing independent and joint models for the jump intensities of SV models we rely on their forecasting performance. In particular, as also described below, the proposed joint modelling approach for the jump intensities results in a more accurate prediction of future returns. We conclude, therefore, that the estimation of the underlying jumps process depicted by the third panel of Figure 1 is closer to the real jump activity of the stock prices allowing further improvement in the estimation of the log-volatilities and more accurate prediction of future stock returns.

Predictive performance
Here we evaluate the predictive performance of the proposed SV model with jointly modelled jump intensities as well as the corresponding performance of univariate SV models with and without jumps which are also specified by equations (3.1)-(3.3) by making suitable assumptions. In the supplementary material we compare the forecasting performance of the proposed model with alternative SV models which account for the heavy tails of the distribution of the observed returns.
The left panel of Figure 7 compares the predictive performance of independent, univariate SV models with and without jumps by presenting predictive log Bayes factors. The Figure is constructed by considering the log Bayes factors defined in (5.1) for the = 30 out-of-sample observations of our dataset. Thus, positive log Bayes factor indicates that the predictions obtained with univariate SV models with jumps are more Figure 8: First column: proportion of stocks for which the interval scores for the 95% prediction intervals of the SV model with jointly modelled jumps is lower than the corresponding score for the SV models without (top) and with independent (bottom) jumps. Second column: medians of interval scores across the 571 stocks. Third column: proportion of stocks for which the continuous ranked probability score (CRPS) for the SV model with jointly modelled jumps is lower than the corresponding score for the SV models without (top) and with independent (bottom) jumps. The x-axis in all plots refers to the = 30 out-of-sample time points. accurate than those of SV models without jumps. According to Kass and Raftery (1995) a value of the log Bayes factor greater than 5 provides strong evidence in favour of the model with jumps. The increasing nature of the log Bayes factor reassures that as more data are collected and as more jumps are identified in stock returns, the evidence that the model with jumps outperforms the model without jumps increases. In the right panel of Figure 7 we compare the predictive performance of our proposed SV with jumps model in which the jump intensities are modelled by using a dynamic factor model with SV models in which the jump intensities are independent over time and across stocks. The Figure compares the two models by presenting the logarithms of the approximate log Bayes factors in (5.6). Clearly, there is strong evidence that our proposed model outperforms the independent modelling of stock returns with a SV with jumps formulation. Figure 8 compares the forecasts of the proposed model with those obtained from SV models without jumps and with independent jumps respectively by employing their probabilistic evaluation (Gneiting and Raftery, 2007). Joint modelling of the intensities leads to 95% prediction credible intervals with the lowest interval scores at each one of the out-of-sample points for the vast majority of the stocks. The definition of the interval score in (5.2) rewards the forecaster that delivers narrow prediction intervals and when the observation is outside the predicted interval a penalty proportional to the significance level of the interval is incurred. It is thus indicated that the proposed SV model delivers the most accurate prediction intervals compared to those obtained from univariate SV models with and without jumps. The fact that Figure 8 displays the median instead of the mean of the 571 interval scores emphasizes an important characteristic of our modelling approach. The efficient identification of the in-sample jumps has resulted in estimating lower volatilities for the stock returns; see Figures S.11 and S.12 in the supplementary material. This is depicted in the conducted out-ofsample exercise by more narrow prediction intervals for a given coverage probability. Moreover, since the prediction of future jumps is based on common across the stocks, autoregressive factors, we expect that jumps that occur suddenly in a small number of stocks at a given day are not easily predictable. On the other hand, the large volatility intervals delivered from SV models without jumps or models with less accurate jump identification can accidentally predict more efficiently these types of movements. The median is chosen to avoid taking into account such predictions. The superiority of the proposed approach is also emphasized by Figure 8 by the lower continuous ranked probability scores that correspond to the proposed SV model than those of the univariate SV models. This implies that the posterior predictive distributions of the proposed model are more concentrated than those obtained from univariate SV models with and without jumps while deliver predictions that are not easily distinguishable from the materialized observations. This is a desired property of probabilistic forecasts described as the main goal of probabilistic forecasting; see for example Gneiting and Katzfuss (2014). Finally, Figure S.14 in the supplementary material which presents root mean squared errors for the predictions obtained from the different models indicates that joint modelling of the jump intensities improves slightly the point forecasts. We conclude that the proposed modelling approach offers a more accurate identification of jumps and consequently a more precise estimation of the returns volatility compared to the one conducted by using SV models without or with independent jumps. As a result, the predictive performance of the proposed model outperforms the other models. This makes the suggested model a competitive alternative for practitioners who are interested in obtaining accurate predictions for the stocks of large portfolios.

Discussion
We have developed a general modelling framework together with carefully designed MCMC algorithms to perform Bayesian inference for SV models with Poisson-driven jumps. We have shown that for the data we applied our models there is evidence that, with respect to predictive Bayes factors, (i) univariate SV models with jumps outperform univariate SV models without jumps and (ii) models that jointly model jump intensities with a dynamic factor model outperform SV models with jumps that are applied independently across stocks. We feel that (ii) is an interesting result that adds considerable insight to the growing literature of modelling financial returns with jumps, adding to the observation by Aït-Sahalia et al. (2015) that there is indeed predictability in jump intensities.
There are various issues that have not been addressed in this paper. The proposed MCMC algorithms can be easily extended in order to conduct inference for the more realistic (Harvey and Shephard, 1996) class of SV models which allow for correlation between the error terms it and η it in (3.1) and (3.2), a phenomenon which is commonly known as leverage effect. In this case samples from their full conditional distributions can be obtained by using the auxiliary gradient-based sampler (Titsias and Papaspiliopoulos, 2018) to draw each correlation parameter jointly with φ i , σ 2 iη and H i . Other modelling aspects include the relaxing of independence of E t in (3.8) or the assumption that A is lower diagonal as in the models proposed in Dellaportas and Pourahmadi (2012).
Finally, a series of interesting financial questions can be addressed with our models by exploiting the fact that panel of stock returns carries additional, possibly important, information. For example, in our dataset one can explore the effects of country and sector effects or could investigate whether the joint tail risk dependence does or does not change before, during and after the financial crisis in Europe. We feel that our proposed models together with our algorithmic guidelines will serve as useful tools for such future research pursuits in financial literature.

Supplementary Material
Web-based supporting material for "Bayesian prediction of jumps in large panels of time series data" by A. Alexopoulos, P. Dellaportas and O. Papaspiliopoulos (DOI: 10.1214/21-BA1268SUPP; .pdf). The supplementary material contains proof of results provided in the main text, simulation experiments and additional results from the real data analysis.