Bayesian emulation for optimization in multi-step portfolio decisions

We discuss the Bayesian emulation approach to computational solution of multi-step portfolio studies in financial time series. "Bayesian emulation for decisions" involves mapping the technical structure of a decision analysis problem to that of Bayesian inference in a purely synthetic "emulating" statistical model. This provides access to standard posterior analytic, simulation and optimization methods that yield indirect solutions of the decision problem. We develop this in time series portfolio analysis using classes of economically and psychologically relevant multi-step ahead portfolio utility functions. Studies with multivariate currency, commodity and stock index time series illustrate the approach and show some of the practical utility and benefits of the Bayesian emulation methodology.


Introduction
This work stems from an interest in Bayesian portfolio decision problems with long-term, multistep investment objectives that lead to the need for computational methods for portfolio optimization. Methodological advances reflect the fact that some such optimization problems can be recast-purely technically-as problems of computing modes of marginal posterior distributions in "synthetic" statistical models. We then have access to analytic and computational machinery for exploring posterior distributions whose marginal modes represent target optima in originating optimization/decision problems. We refer to this as Bayesian emulation for decisions, with the synthetic statistical model regarded as an emulating framework for computational reasons.
The use of decision analysis for portfolios coupled with dynamic models for forecasting financial time series continues to be a very active area of Bayesian analysis-in research and in broad application in personal and corporate gambling on markets of all kinds. Forecasting with multivariate dynamic linear/volatility models coupled with extensions of traditional Markowitz mean-variance optimization (Markowitz 1991) define benchmark approaches (e.g. Aguilar and West 2000; Polson and Tew 2000; Quintana 1992; Quintana et al. 2003, 2010; chapter 10 of Prado and West 2010; Jacquier and Polson 2011, among others). Much recent work has emphasised advances in forecasting ability based on increasingly structured multivariate models (e.g. Gruber and West 2016a,b; Nakajima and West 2013a,b, 2015, 2016; Zhou et al. 2014) with benefits in portfolio outcomes based, in part, on improved characterizations of dynamics in multivariate stochastic volatility. However, relatively little Bayesian work addresses interests in more relevant utility/loss functions, especially in longer-term, multi-step portfolio contexts; much of the cited work here employs standard myopic/one-step ahead decision rules. Our emphasis is to complement these time series forecasting advances with Bayesian decision analysis that explicitly reflects personal or commercial utilities for stable portfolios in a multi-step context.
In stylized forecasting and decision problems, analysis involves computing portfolio weight vectors to minimize expected portfolio loss functions, and repeating this sequentially over time. The solutions can be approximated numerically in a number of ways, depending on the form of the loss function, but typically need customization of the numerical techniques. The approach here-emerging naturally in the specific context of multi-step portfolios-is a general approach applicable to a variety of loss functions. At any one time point with decision variable w and expected loss function L(w), the Bayesian emulation strategy is useful if/when there exists a purely synthetic statistical model involving hypothetical random vectors (parameters, latent variables or states) u, z and generating a posterior density p(u, z) under which the marginal mode of u is theoretically equal to the optimal w in the portfolio decision. Minimizing L(w) can then be approached by exploring p(u, z) with standard analytic and numerical methods for posterior analysis. While novel in terms of our context and development, the basic idea here goes back (at least) to Müller (1999). There, with discrete decision variables in non-sequential design contexts, optimization is solved using a similar synthetic posterior idea, combining optimization with estimation using MCMC. This approach has, surprisingly, seen limited development, although recent work by Ekin et al. (2014) represents extension and new application. Our work takes a broader emulating perspective with complete separation of models/forecasting and decisions/optimization. We develop emulation of portfolio decisions using forecast information from state-of-the-art multivariate dynamic dependence network models, treated as given. We then define the new multi-step decision strategy for computing and revising Bayesian portfolios over time based on these forecasts. Section 2 summarizes the multi-step portfolio set-up in sequential forecasting. To define and exemplify the emulation approach, we give summary details of its use in multi-step portfolios with extensions of standard (myopic, constrained) quadratic loss functions. Here the emulating synthetic statistical models are conditionally linear and normal state-space models, i.e., dynamic linear models (DLMs), amenable to evaluation using analytic forward filtering and backward smoothing (FFBS) methods. This is extended in Section 3 to a class of portfolios with sparsity-inducing penalties on portfolio weights and turnover. The emulating models here also have state-space forms, but now with non-normal structure. With augmented state-spaces, we can convert these to conditional DLMs in which posterior evaluation and mode search are efficiently performed by combining FFBS with a customized EM method. Section 4 discusses a fundamental question of definition of portfolio loss functions and objectives in multi-step contexts, and a strategy for marginal mode evaluation. A range of portfolio loss functions are then evaluated in sequential forecasting and portfolio construction with a 13-dimensional series of daily FX, commodity and market index prices. Section 5 discusses this, highlighting choices of portfolio loss functions and objectives, and practical benefits arising with sparsity-inducing, multi-step portfolio strategies. The latter shows the potential to improve portfolio outcomes, particularly in the presence of realistic transaction costs. Comments in Section 6 conclude the main paper.
Appendices provide technical details on optimization and on dynamic dependency network models used for forecasting.
Notational Remarks: We use p(x|y) for a generic density of x given y. Normal, exponential and gamma distributions are written as x ∼ N (µ, Σ), x ∼ Ex(m) with mean 1/m, and x ∼ G(a, b) with shape a and mean a/b; the values of their density functions at a particular x are denoted by N (x|µ, Σ), Ex(x|m) and G(x|a, b), respectively. Indices s, s + 1, . . . , t for s < t are shortened as s:t. The k-dimensional all-ones and all-zeros vectors are 1 k = (1, . . . , 1) and 0 k = (0, . . . , 0) , respectively, and 0 represents a zero vector or matrix when dimensions are obvious.

Setting and Notation
Over times t=1, 2, . . . , we observe a k−vector asset price time series p t ; the returns vector r t has elements r jt = p jt /p j(t−1) − 1, (j = 1:k). At time t with current information set D t = {r t , D t−1 }, a model defines a forecast distribution for returns at the next h time points. With no loss of generality and to simplify notation, take current time t=0 with initial information set D 0 . Predicting ahead, the predictive mean vectors and precision (inverse variance) matrices are denoted by f t = E[r t |D 0 ] and K t = V [r t |D 0 ] −1 over the h−steps ahead t=1:h. The time t portfolio weight vector w t has elements w jt , some of which may be negative reflecting short-selling. Standing at t=0 with a current, known portfolio w 0 , stylized myopic (one-step) Markowitz analyses are Bayesian decision problems focused on choosing w 1 subject to constraints. Standard mean-variance portfolios minimize w 1 K −1 1 w 1 subject to a chosen expected return target m 1 = w 1 f 1 , and usually a sum-to-one constraint w 1 1 k = 1, i.e., allowing only portfolios closed to draw-down or additional investment.
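As a point of reference for what follows, the following is a minimal numerical sketch (ours, not from the paper) of this standard constrained mean-variance solution, obtained from the usual Lagrangian argument; all function names and the illustrative inputs are hypothetical.

```python
import numpy as np

def myopic_markowitz(f1, K1, m1):
    """Myopic mean-variance portfolio: minimize w' K1^{-1} w subject to
    f1'w = m1 and 1'w = 1. K1 is the forecast *precision* matrix of returns,
    so the return covariance is Sigma = K1^{-1} and the Lagrangian solution
    is w = K1 A' (A K1 A')^{-1} b with A = [f1'; 1'] and b = (m1, 1)'."""
    k = len(f1)
    A = np.vstack([f1, np.ones(k)])      # stacked equality constraints A w = b
    b = np.array([m1, 1.0])
    KAt = K1 @ A.T
    return KAt @ np.linalg.solve(A @ KAt, b)

# Illustrative use with hypothetical one-step forecast moments (k = 5 assets):
rng = np.random.default_rng(0)
k = 5
S = rng.normal(size=(k, k))
Sigma = S @ S.T / k + 0.01 * np.eye(k)   # synthetic return covariance
f1 = rng.normal(0.0005, 0.002, size=k)   # synthetic forecast means
K1 = np.linalg.inv(Sigma)                # forecast precision matrix
w1 = myopic_markowitz(f1, K1, m1=0.0005)
print(w1.sum(), f1 @ w1)                 # 1.0 and 0.0005, up to rounding
```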
For multi-step portfolios, extend to consider the sequence of potential portfolio vectors w 1:h over the next h periods. The decision is to choose w 1 , but we are interested in target returns and portfolio turnover control over multiple steps, and so must consider how the decision analysis might play out up to time t=h. Consider multi-step (expected) loss functions of the form of eqn. (1), where α t , β t and λ t are specified positive weights defining relative contributions of the terms in the sum, while W − t is the (least-norm) generalized inverse of a specified k × k positive-semi-definite matrix W t , and will be the usual inverse in cases of positive-definiteness. Also, D 0 now includes the current portfolio vector w 0 .
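For concreteness, the following is a plausible reconstruction (ours, not verbatim from the original) of a loss of this form, consistent with the descriptions of the target, risk and turnover terms that follow; each weight acts as a synthetic variance (so that, e.g., α_t → 0 enforces a hard return target), and the factors of 1/2 are immaterial to the optimization.

```latex
\[
L(w_{1:h}) \;=\; \sum_{t=1}^{h} \Big\{
   \frac{(m_t - f_t' w_t)^2}{2\alpha_t}
 + \frac{w_t' K_t^{-1} w_t}{2\beta_t}
 + \frac{(w_t - w_{t-1})' W_t^{-} (w_t - w_{t-1})}{2\lambda_t}
\Big\}.
\]
```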
The first set of terms in the sum involve specified multi-step target returns m 1:h . Individual investors typically prefer realized portfolios to progress relatively smoothly towards an end-point target m h , rather than bouncing from high to low interim returns. The weights α t can be used to increasingly emphasize the importance of later-stage returns as t approaches h. Note that allowing α t → 0 theoretically implies the hard constraint on expected return, f t w t = m t as in the standard myopic case. Hence we refer to eqn. (1) as including "soft target constraints," while having the ability to enforce the hard constraint at the terminal point via sending α h to zero.
The second set of terms in eqn. (1) penalize portfolio uncertainty using the standard risk measures V (w t r t |D 0 ) = w t K −1 t w t , again allowing differential weighting as a function of steps-ahead t. The final set of terms relates to portfolio turnover. If W t = I k these terms penalize changes in allocations across all assets. If trades are at a fixed rate, this is a direct transaction cost penalty; otherwise, it still relates directly to transaction costs and so that terminology will be used. With a heavy emphasis on these terms-as defined by the λ t weights-optimal portfolios will be more stable over time, placing less stress on investors (including emotional as well as workload stress for individual investors). The W t can play several constraint-related roles, as we discuss below.

Portfolio Optimization and Emulating Models
There are, of course, no new computational challenges to simple quadratic optimization implied by eqn. (1). Key points are that it is easy to: (i) compute the joint optimizing values w 1:h , and (ii) deduce the one-step optimizing w 1 for the Bayesian decision. Optimization with respect to w 1 alone can be immediately performed using a forward-backward dynamic programming algorithm. Importantly, the optimizing value for w 1 (or for any subset of the w t ) is-as a result of the quadratic nature of eqn. (1)-precisely that sub-vector (or subset of vectors) arising at the global/joint optimizer w 1:h .
The emulation idea translates the above concepts into a synthetic Bayesian model immediately interpretable by statisticians. Rewrite eqn. (1) in the form of eqn. (2), where each p(·|·) term is a specific normal p.d.f., the m 1:h , z 1:h , w 1:h are interpreted as random quantities in a multivariate normal distribution underlying this density form, and where each z t is set at z t = 0. Specifically, consider a dynamic linear model (DLM) generating pairs of observations (m t , z t )-with m t scalar and z t a k−vector-based on latent k−vector states w t via eqns. (3)-(5), with a known initial state (the current portfolio) w 0 and where the ν t , ε t , ω t are independent and mutually independent innovations sequences. In this model, observing the sequence of synthetic observations m 1:h , z 1:h with z 1:h = 0 immediately implies the resulting posterior p(w 1:h |m 1:h , z 1:h = 0, w 0 ) as given in eqn. (2).
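For concreteness, a plausible reconstruction (ours) of eqns. (2)-(5): eqn. (2) expresses exp{−L(w 1:h )} as proportional to the product over t=1:h of p(m t |w t ) p(z t |w t ) p(w t |w t−1 ) evaluated at z t = 0, with these densities generated by a synthetic DLM of the form

```latex
\begin{align*}
  m_t &= f_t' w_t + \nu_t,   & \nu_t      &\sim N(0,\ \alpha_t),       \tag{3}\\
  z_t &= w_t + \epsilon_t,   & \epsilon_t &\sim N(0,\ \beta_t K_t),    \tag{4}\\
  w_t &= w_{t-1} + \omega_t, & \omega_t   &\sim N(0,\ \lambda_t W_t).  \tag{5}
\end{align*}
```

Under this reconstruction, observing z t = 0 contributes the risk term w t K −1 t w t /(2β t ) and the state evolution contributes the turnover term, matching eqn. (1).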
Observe that computing the minimizer of L(w 1:h ) is equivalent to calculating the posterior mode for w 1:h in the synthetic DLM. It is immediate that the required (marginal) optimizing value for w 1 is the marginal mode in this joint posterior. Since the joint posterior is normal, marginal modes coincide with values at the joint mode, so we can regard the Bayesian optimization as solved either way. We easily compute the mode of w 1 using the forward filtering, backward smoothing (FFBS) algorithm-akin to a Viterbi-style optimization algorithm (e.g. Godsill et al. 2001, 2004; Viterbi 1967)-widely used in applications of DLMs (e.g. Prado and West 2010; West and Harrison 1997).
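The following is a minimal numerical sketch of this computation (ours, not code from the paper), for the case W t = I k without the sum-to-one constraint: a Kalman filter over the stacked synthetic observations (m t , z t = 0), followed by Rauch-Tung-Striebel backward smoothing. Since the joint posterior is normal, the smoothed means are the modes, and the smoothed mean at t=1 is the optimal portfolio w 1 . All function and variable names are ours.

```python
import numpy as np

def ffbs_mode(w0, f, K, m, alpha, beta, Wev):
    """Joint posterior mode of w_{1:h} in the synthetic DLM emulating the
    multi-step quadratic loss. Inputs: w0 (k,) current portfolio;
    f (h,k) forecast means; K (h,k,k) forecast precisions of returns;
    m (h,) return targets; alpha, beta (h,) weights; Wev (h,k,k) evolution
    covariances (lambda_t * W_t). Synthetic observations at each t are
    m[t] = f[t]'w_t + N(0, alpha[t]) and z_t = w_t + N(0, beta[t] K[t]),
    with z_t observed at 0 so -2*log-likelihood matches the loss terms."""
    h, k = f.shape
    a = np.zeros((h, k)); R = np.zeros((h, k, k))   # one-step prior moments
    mu = np.zeros((h, k)); C = np.zeros((h, k, k))  # filtered moments
    prev_mean, prev_cov = np.asarray(w0, float), np.zeros((k, k))
    for t in range(h):
        a[t], R[t] = prev_mean, prev_cov + Wev[t]
        F = np.vstack([f[t], np.eye(k)])            # stacked design for (m_t, z_t)
        V = np.zeros((k + 1, k + 1))
        V[0, 0], V[1:, 1:] = alpha[t], beta[t] * K[t]
        y = np.concatenate([[m[t]], np.zeros(k)])   # z_t observed at 0
        Q = F @ R[t] @ F.T + V
        gain = np.linalg.solve(Q, F @ R[t]).T       # Kalman gain R F' Q^{-1}
        mu[t] = a[t] + gain @ (y - F @ a[t])
        C[t] = R[t] - gain @ Q @ gain.T
        prev_mean, prev_cov = mu[t], C[t]
    w = mu.copy()                                   # backward (RTS) smoothing
    for t in range(h - 2, -1, -1):
        J = np.linalg.solve(R[t + 1], C[t]).T       # smoother gain C_t R_{t+1}^{-1}
        w[t] = mu[t] + J @ (w[t + 1] - a[t + 1])
    return w                                        # smoothed means = modes; w[0] is w_1

# Illustrative use with hypothetical forecast inputs (h=5 steps, k=4 assets):
rng = np.random.default_rng(1)
h, k = 5, 4
Sigma = np.cov(rng.normal(size=(200, k)), rowvar=False)
f = rng.normal(0.0005, 0.002, size=(h, k)); K = np.stack([np.linalg.inv(Sigma)] * h)
Wev = np.stack([100.0 * np.eye(k)] * h)             # lambda_t = 100, W_t = I_k
w = ffbs_mode(np.full(k, 1 / k), f, K, np.full(h, 0.0005),
              np.ones(h), np.full(h, 100.0), Wev)
print(w[0])
```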

Imposing Linear Constraints
As noted above, some applications may desire a hard target m h at the terminal point, and this is formally achieved by setting the synthetic variance α h = 0 in eqn. (3). The multivariate normal posterior is singular due to the resulting constraint m h = f h w h , but this raises no new issues as the FFBS computations apply directly.
The general framework also applies with singular matrices W t , now playing the roles of the variance matrices of state innovations in eqn. (5). These arise to enforce linear portfolio constraints Aw t = a where a is a given n−vector and A is a full-rank n × k matrix with n < k. Choose w 0 to satisfy these constraints and ensure that each W t is such that AW t = 0. Then the priors and posteriors for the synthetic states w t are singular and constrained such that Aw t = a (almost surely). Again the FFBS analysis applies directly to generate the optimal portfolio vector w 1 and the sequence of interim optimizing values w 1:h , even though only w 1 is used at t=0. This now involves propagating singular normal posteriors for states, as is standard in, for example, constrained seasonal DLMs (e.g. West and Harrison 1997, sect. 8.4). A key portfolio case is the sum-to-one constraint 1 k w t = 1 for all t. Here we redefine W t beginning with the identity I k -representing equal and independent penalization of turnover across assets-and then conditioning on the constraint to give the rank k − 1 matrices W t ≡ W = I k − 1 k 1 k /k.
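A small sketch (ours) of constructing such a constrained evolution variance: condition a base variance V on Aω = 0, i.e. W = V − V A'(A V A')^{-1} A V; for V = I k and A = 1 k ' this is exactly I k − 1 k 1 k '/k.

```python
import numpy as np

def constrained_evolution_var(V, A):
    """Variance of a N(0, V) innovation conditioned on the event A @ omega = 0,
    so that states evolving with this variance preserve the constraint A w_t = a."""
    VA = V @ A.T
    return V - VA @ np.linalg.solve(A @ VA, VA.T)

k = 6
A = np.ones((1, k))                       # sum-to-one constraint
W = constrained_evolution_var(np.eye(k), A)
print(np.allclose(W, np.eye(k) - np.ones((k, k)) / k))   # True
print(np.allclose(A @ W, 0))                             # True: A W_t = 0
```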

Basic Setting
Now consider modifications to (i) more aggressively limit switching in/out of specific assets between time points-for both transaction and psychological cost considerations, and to (ii) limit the numbers of assets invested at any time point. Several authors have considered absolute loss/penalties to encourage shrinkage-to-zero of optimizing portfolio vectors (e.g. Brodie et al. 2009;Jagannathan and Ma 2003) and we build on this prior work. Key points, however, are that such approaches have not been consistent with a Bayesian decision analysis framework, while goals with respect to marginal versus joint optimization in the multi-step context have been poorly understood and explored, and require clarification. Our fully Bayesian emulation strategy adds to this literature while also clarifying this critical latter point and defining relevant methodology.
The "Laplace loss" terminology relates to novel synthetic statistical models that emulate portfolio optimization with absolute norm terms to penalize portfolio weight changes. Modify eqn. (1) to the form of eqn. (6), in which the final term now replaces the quadratic turnover penalty with the sum of absolute changes in asset allocations. Relative to eqn. (1), this aims to more aggressively limit transaction costs, both monetary and psychological. Optimizing globally over w 1:h may/will encounter boundary values in which some portfolio weights are unchanged between times t − 1 and t. This theoretical lasso-style fact is one reason for the interest in such loss functions, due to the implied expectation of reduced portfolio turnover-or "churn"-and hence reduced costs.
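A plausible reconstruction (ours) of eqn. (6), replacing the quadratic turnover penalty of eqn. (1) by absolute changes in allocations:

```latex
\[
L(w_{1:h}) \;=\; \sum_{t=1}^{h} \Big\{
   \frac{(m_t - f_t' w_t)^2}{2\alpha_t}
 + \frac{w_t' K_t^{-1} w_t}{2\beta_t}
 + \frac{1}{\lambda_t} \sum_{j=1}^{k} \lvert w_{jt} - w_{j,t-1} \rvert
\Big\}.
\]
```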

Emulating Dynamic Laplace Models
In parallel to Section 2.2, we identify a synthetic statistical model-again a state-space model, but now with non-normal evolution/transition components for the synthetic latent states w t -of the form of eqns. (7)-(9), where L(λ −1 t ) denotes the Laplace (double exponential) distribution whose p.d.f. for each element is p(w jt |w j,t−1 ) = exp{−|w jt − w j,t−1 |/λ t }/(2λ t ). Also, the ν t , ε t , ω jt are independent and mutually independent across the ranges of all suffices.
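A plausible reconstruction (ours) of the emulating dynamic Laplace model (7)-(9), retaining the synthetic observation equations of Section 2 and replacing the normal state evolutions by Laplace evolutions:

```latex
\begin{align*}
  m_t    &= f_t' w_t + \nu_t,        & \nu_t       &\sim N(0,\ \alpha_t),    \tag{7}\\
  z_t    &= w_t + \epsilon_t,        & \epsilon_t  &\sim N(0,\ \beta_t K_t), \tag{8}\\
  w_{jt} &= w_{j,t-1} + \omega_{jt}, & \omega_{jt} &\sim L(\lambda_t^{-1}), \quad j=1{:}k. \tag{9}
\end{align*}
```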
One of the immediate benefits of the Bayesian emulating model approach is that we can exploit latent variable constructs. In particular here, the Laplace distributions are known to be scale mixtures of normals (Andrews and Mallows 1974; West 1984, 1987). Thus, there exist latent random quantities τ jt > 0, (j=1:k, t=1:h), such that τ jt ∼ Ex(1/(2λ t 2 )) independently over j, t, and based on which each synthetic state evolution in eqn. (9) has the conditionally normal form w jt = w j,t−1 + ω jt with ω jt |τ jt ∼ N (0, τ jt ) (eqn. 10). Augmenting by the vectors of latent scales τ t = τ 1:k,t , the evolutions in eqn. (9) become w t = w t−1 + ω t with ω t |τ t ∼ N (0, W t ) and W t = diag(τ t ) (eqn. 11). This defines a conditionally normal DLM, and the above/standard FFBS algorithm can be used to evaluate the posterior mode of p(w 1:h |W 1:h , m 1:h , z 1:h ) for any z 1:h including that at zero. To maximize over portfolios w 1:h in the implied marginal with respect to W 1:h , Bayesian EM (e.g. Dempster et al. 1977) is the obvious and easily implemented approach. Here the E-step applies to the latent W 1:h , while FFBS gives the exact M-step for w 1:h at each iterate. In summary:
1. Initialization: Set each w (0) t arbitrarily. Candidates for initial values are the current w 0 , or the trivially computed values that optimize the multi-step portfolios under the quadratic loss of Section 2.
2. For EM iterates s = 1:S under a chosen stopping rule, repeat the following:
• E-step: For j=1:k and t=1:h, update τ jt via the conditional expectation implied by the scale-mixture representation, setting τ jt (s) = λ t |w jt (s−1) − w j,t−1 (s−1) | (the reciprocal of E[1/τ jt | w (s−1) 1:h ]).
• M-step: Implement FFBS for the emulating model of eqns. (7) and (8), with the conditionally normal evolutions of eqn. (11) defined by the current scales τ (s) 1:h , computing the exact conditional mode w (s) 1:h .
On stopping at iterate S, use w (S) 1 as the approximate optimizing portfolio vector. The addition of linear constraints modifies the W t matrices, with details extending those of the normal model in Section 2.3. Write V t = diag(τ t ). Then for the full-rank set of n < k constraints Aw t = a, take W t = V t − V t A'(A V t A') −1 A V t . In the key special case of sum-to-one constraints 1 k 'w t = 1 for all t, this reduces to W t = diag(τ t ) − τ t τ t '/(1 k 'τ t ).
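A compact sketch of this FFBS/EM iteration (ours, not the paper's code), for the unconstrained case W t = diag(τ t ) and assuming the hypothetical ffbs_mode function from the sketch in Section 2; the E-step sets the working evolution variances to τ jt = λ t |w jt − w j,t−1 |, floored at a small constant for numerical stability.

```python
import numpy as np
# Assumes ffbs_mode(w0, f, K, m, alpha, beta, Wev) from the sketch in Section 2.

def laplace_portfolio_em(w0, f, K, m, alpha, beta, lam, n_iter=100, eps=1e-8):
    """FFBS/EM for the multi-step Laplace (turnover) loss, unconstrained case
    W_t = diag(tau_t). E-step: working evolution variances
    tau_jt = lam_t * |w_jt - w_{j,t-1}| (floored at eps to avoid exact zeros);
    M-step: exact conditional mode of w_{1:h} via FFBS in the conditional DLM."""
    h, k = f.shape
    # Initialize at the quadratic-loss optimum of Section 2 (W_t = I_k):
    w = ffbs_mode(w0, f, K, m, alpha, beta,
                  np.stack([lam[t] * np.eye(k) for t in range(h)]))
    for _ in range(n_iter):
        w_prev = np.vstack([w0, w[:-1]])                            # w_{t-1} aligned with w_t
        tau = lam[:, None] * np.maximum(np.abs(w - w_prev), eps)    # E-step
        Wev = np.stack([np.diag(tau[t]) for t in range(h)])
        w_new = ffbs_mode(w0, f, K, m, alpha, beta, Wev)            # M-step
        if np.max(np.abs(w_new - w)) < 1e-10:
            break
        w = w_new
    return w[0]                                                     # actionable portfolio w_1
```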

Extended Laplace Loss for Sparser Portfolios
In the one-step, myopic context, penalizing portfolio variance w 1 K −1 1 w 1 with a term proportional to 1 k |w 1 | = Σ j=1:k |w j1 | is an obvious strategy towards the goal of inducing shrinkage to zero of optimized portfolio weights. As noted earlier, a number of recent works have introduced such a lasso-style penalty directly on portfolio weights, rather than on changes in weights, with standard convex optimization algorithms for solution (e.g. Brodie et al. 2009), and have demonstrated improved portfolio performance in some cases (e.g. Jagannathan and Ma 2003). We now integrate such penalties as components of a more general class of loss functions embedded in the multi-step framework, and develop the Bayesian emulation methodology for this novel context.
The shrinkage-inducing penalty 1 k |w 1 | aims to drive some subset of weights to zero-exactly in the one-step, myopic context when balanced only by portfolio risk. A key point to note is that, when the portfolio vector is also subject to the sum-to-one constraint, then the combined loss function also more aggressively penalizes negative weights, i.e., short positions, and so is particularly of interest to personal investors and institutional funds that generally adopt long positions. That is, the absolute weight penalty operates as a soft constraint towards non-negative weights. In our broader context below, this does not theoretically imply non-negative optimal weights, but does often yield such solutions. Modify eqn. (6) to the form of eqn. (12), with weights γ t on the new absolute loss terms Σ j=1:k |w jt | at each horizon t = 1:h. Extending the latent variable construction of double exponential distributions to these terms in addition to the turnover terms, we now see that optimizing eqn. (12) is equivalent to computing the mode over states w 1:h in a correspondingly extended synthetic DLM. This emulating model extends that of Section 3.2 with synthetic observations m t (scalar) and z t = u t = 0 (k−vectors), and with the latent scales τ t augmented by additional terms φ t = φ 1:k,t for each t. Conditioning on φ jt converts the Laplace term exp(−|w jt |/γ t ) to a conditional normal. To incorporate exact linear constraints on each w t , the above is modified only through the implied changes to the W t ; this is precisely as detailed at the end of Section 3.2 above. Extension of the FFBS/EM algorithm of Section 3.2 provides for computation of the optimizing w 1:h . Each E-step now applies to the latent U 1:h as well as W 1:h , while the M-step applies as before to w 1:h at each iterate. Following initialization at w (0) 1:h , the earlier details of iterates s = 1:S are modified as follows:
• E-step: as in Section 3.2 for the τ jt , and additionally update each φ jt via the corresponding conditional expectation, setting φ jt (s) = γ t |w jt (s−1) |.
• M-step: implement FFBS for the conditionally normal DLM defined by the current values of both sets of latent scales, computing w (s) 1:h .
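A plausible reconstruction (ours) of the extended loss (12):

```latex
\[
L(w_{1:h}) \;=\; \sum_{t=1}^{h} \Big\{
   \frac{(m_t - f_t' w_t)^2}{2\alpha_t}
 + \frac{w_t' K_t^{-1} w_t}{2\beta_t}
 + \frac{1}{\lambda_t}\sum_{j=1}^{k} \lvert w_{jt} - w_{j,t-1}\rvert
 + \frac{1}{\gamma_t}\sum_{j=1}^{k} \lvert w_{jt}\rvert
\Big\}.
\]
```

In the corresponding emulating model (again our reconstruction), eqns. (7)-(9) are augmented by a further synthetic k−vector observation u t = w t + e t , observed at u t = 0, whose elements e jt have independent Laplace densities exp{−|e jt |/γ t }/(2γ t ); conditioning on the latent scales φ jt renders these terms conditionally normal, exactly as for the τ jt .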

Profiled Loss and Marginal Loss
In multi-step portfolio analysis, the decision faced at time t=0 is to choose w 1 only. The future weights w 2:h are involved in the initial specification of the joint loss function L(w 1:h ) in order to weigh expected fluctuations in risk and costs up to the target horizon t=h. From the viewpoint of Bayesian decision theory, this is perfectly correct in the context of the actual decision faced if the approach is understood to be minimizing the implied loss L(w 1 ) = min w 2:h L(w 1:h ). Joint optimization over w 1:h to deliver the actionable vector w 1 is then Bayesian decision analysis with this implied loss as a function of w 1 alone.
The emulation framework provides an approach to computation, but also now suggests an alternative loss specification. With emulating synthetic joint density p(w 1:h ), minimizing the loss L(w 1 ) above is equivalent to profiling out the future hypothetical vectors w 2:h by conditioning on their (joint) modal values. It is then natural to consider the alternative of marginalization over w 2:h ; that is, define the implied marginal loss function L * (w 1 ) as the negative logarithm of the marginal synthetic density, L * (w 1 ) = − log ∫ p(w 1:h ) dw 2:h = − log p(w 1 ), up to an irrelevant additive constant. Call L(w 1 ) the profiled loss function and L * (w 1 ) the marginal loss function.
In general, the resulting optimal vectors ŵ 1 (profiled) and w * 1 (marginal) will differ. A key exception is the case of the quadratic loss function and normal synthetic models of Section 2, where the joint posterior p(w 1:h ) is multivariate normal. In that case, joint modes are joint means, whose elements are marginal means, i.e., ŵ 1 = w * 1 . The situation is different in cases of non-normal emulating models, such as those based on the Laplace forms. These are now considered further for comparisons of the marginal and profiled approaches.

Computing Marginal Portfolios under Laplace Loss
Return to the Laplace loss framework of Sections 3.1 and 3.2 (i.e., the extended Laplace context with γ t → ∞) with sum-to-one constraints. Here the key issues of profiled versus marginal losses are nicely illustrated. Similar features arise in the extended Laplace loss context of Section 3.3, but with no new conceptual or practical issues so details of that extension are left to the reader. The FFBS/EM algorithm easily computes the optimal profile portfolioŵ 1 , but it does not immediately extend to evaluating the optimal marginal portfolio w * 1 . Of several approaches explored, the most useful is based on Markov Chain Monte Carlo (MCMC) analysis of the synthetic DLM, coupled with iterative, gradient-based numerical evaluation of the mode of the resulting Monte Carlo approximation to the required marginal density function. Summary details are given here and further explored in application in Section 5.
The density p(w 1 ) is the w 1 margin under the full joint posterior of (w 1:h , τ 1:h ) where τ t = τ 1:k,t is the vector of t−step ahead latent scales. The FFBS/EM approach is enabled by the nice analytic forms of implied conditional posteriors; these also enable MCMC analysis in this conditionally normal DLM with uncertain scale factors. This approach is nowadays standard and easily implemented (e.g. West and Harrison 1997, chapt. 15; Prado and West 2010, sect. 4.5). Now the FFBS is exploited to generate backward sampling, rather than the backward smoothing that evaluates posterior modes. At each MCMC iterate, FFBS applies to simulate one draw of the full trajectory of states w 1:h from the retrospective posterior p(w 1:h |τ 1:h ) conditional on current values of the latent scales. Then, conditional on this state trajectory, the conditional posterior p(τ 1:h |w 1:h ) is simulated to draw a new sample of the latent scales. In the emulating model of eqn. (11) this second step involves a set of conditionally independent univariate draws, each from a specific GIG (generalized inverse Gaussian) distribution. Applying the sum-to-one constraint on each w t vector changes this structure for the τ jt , however, and direct sampling of the τ jt is then no longer straightforward. To address this, we define a Metropolis-Hastings extension for these elements to allow use of the constraint. Summary details of this, and of MCMC convergence diagnostics related to the real-data application in Section 5, are given in Appendix A.
The MCMC generates samples indexed by superscript (i), i = 1:I, for some chosen Monte Carlo sample size I. The Rao-Blackwellized Monte Carlo approximation to the required margin for w 1 is then p̂(w 1 ) = I −1 Σ i=1:I p(w 1 |τ (i) 1:h , m 1:h , z 1:h = 0, w 0 ). Importantly, this is the density of a mixture of I normals: each conditional p(w 1 |•) in the sum is the implied normal margin in the DLM defined by the conditioning values of the latent scales, with moments trivially computable via FFBS (using backward smoothing), and the density values are easily evaluated at any w 1 . Thus the portfolio optimization problem reduces to mode-finding in a mixture of multivariate normals, and there are a number of numerical approaches to exploit for this. The most effective is really one of the simplest-a Newton-type updating rule based on the first-order derivative of the density, with repeat searches based on multiple initial values for numerical iterates. Relevant candidate initial values can be generated by evaluating the mixture at each of the normal component means, and selecting some of those with highest mixture density. Further details are noted in Appendix A.
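A minimal sketch of this mode search (ours, with hypothetical inputs): given component means m i and covariances C i implied by the Monte Carlo draws, iterate the stationarity (fixed-point) condition of the mixture density from several of the highest-density component means. The sketch assumes non-singular components; under the sum-to-one constraint the C i are rank-deficient and pseudo-inverses are needed, as discussed in Appendix A.

```python
import numpy as np

def mixture_mode(means, covs, starts=5, n_iter=200, tol=1e-10):
    """Approximate mode of the equally weighted normal mixture
    p(w) = (1/I) sum_i N(w | means[i], covs[i]), by iterating the fixed point
    w <- (sum_i q_i P_i)^{-1} sum_i q_i P_i means[i], with q_i = N(w|means[i],covs[i])
    and P_i = covs[i]^{-1}, started from the highest-density component means."""
    I, k = means.shape
    P = np.linalg.inv(covs)                       # component precisions
    logdet = np.linalg.slogdet(covs)[1]

    def log_comps(w):                             # log N(w | means[i], covs[i]) for all i
        d = w - means
        quad = np.einsum('ij,ijk,ik->i', d, P, d)
        return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

    dens_at_means = [np.logaddexp.reduce(log_comps(mu)) for mu in means]
    best_w, best_val = None, -np.inf
    for idx in np.argsort(dens_at_means)[-starts:]:      # spanning set of starts
        w = means[idx].copy()
        for _ in range(n_iter):
            lc = log_comps(w)
            q = np.exp(lc - lc.max())                    # stabilized mixture weights
            A = np.einsum('i,ijk->jk', q, P)
            b = np.einsum('i,ijk,ik->j', q, P, means)
            w_new = np.linalg.solve(A, b)
            if np.max(np.abs(w_new - w)) < tol:
                break
            w = w_new
        val = np.logaddexp.reduce(log_comps(w))
        if val > best_val:
            best_w, best_val = w, val
    return best_w

# Hypothetical use: I=200 draws of conditional moments for a k=4 margin.
rng = np.random.default_rng(2)
means = 0.25 + 0.05 * rng.normal(size=(200, 4))
covs = np.stack([np.diag(0.01 + 0.005 * rng.random(4)) for _ in range(200)])
print(mixture_mode(means, covs))
```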

Data
Evaluation of multi-step portfolios uses data on daily returns of k=13 financial series: exchange rates of 10 international currencies (FX) relative to the US dollar, two commodities and two asset market indices; see

Forecasting Model
Forecasts are generated from a time-varying, vector autoregressive model of order 2 (TV-VAR(2); e.g. Nakajima and West 2013b; Primiceri 2005), with dynamic dependence network structure (DDN). Exploratory analysis of the first 500 observations is used to define the sparsity structure of the dynamic precision matrix for the TV-VAR innovations, i.e., a sparse representation of multivariate volatility, following examples in the above references. From day 501, the analysis is run sequentially in time, updating and forecasting each day. The DDN structure enables analytic filtering and one-step forecasting; forecasting multiple steps ahead in a TV-VAR with DDN structure is performed by direct simulation into the future. For each day t during the investment period, the model generates multi-step ahead forecast mean vectors and variance matrices, {f t+i , K −1 t+i } i=1:h , given as Monte Carlo averages over 50,000 forecast trajectories of the return vectors r t+(1:h) simulated at time t. We take h=5 days as the portfolio horizon, and reset the time index so that t=0 represents the start of the investment period, January 1, 2009. Appendix B provides detailed discussion of the DDN model, use of exploratory training data, filtering and simulation-based forecasting.
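A trivial sketch (ours) of the Monte Carlo forecast summaries: given simulated h-step-ahead return trajectories, the per-horizon forecast means and precisions are just sample moments.

```python
import numpy as np

def forecast_moments(sims):
    """Monte Carlo forecast summaries from simulated return trajectories.
    sims has shape (N, h, k): N simulated h-step-ahead paths of k returns.
    Returns per-horizon mean vectors f_{t+i} and precision matrices K_{t+i}."""
    N, h, k = sims.shape
    f = sims.mean(axis=0)                                         # (h, k) forecast means
    K = np.stack([np.linalg.inv(np.cov(sims[:, i, :], rowvar=False))
                  for i in range(h)])                             # (h, k, k) precisions
    return f, K

# Hypothetical use with N=50,000 simulated 5-step paths for k=13 series:
sims = np.random.default_rng(3).normal(0.0005, 0.01, size=(50_000, 5, 13))
f, K = forecast_moments(sims)
```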

Parameters and Metrics
Comparisons use various values of portfolio parameters in the quadratic/normal and Laplace loss frameworks. In all cases, we take the target return schedule m 1:h to be constant, with m t = 0.0005 representing daily return targets of 0.05%, annualized (261 trading days) to about 13.9%. We take α t > 0 for t < h, rather than strictly enforcing the return constraints by setting α t = 0, so that the interim targets are "soft" rather than hard. The initial portfolio w 0 is taken as the myopic Markowitz portfolio, for comparison. Parameters α t , β t , λ t , γ t define relative weights of the four components of loss. In a long-term context (e.g., when t indexes months or more) some use of discounting into the future becomes relevant. For example, β t , λ t , γ t may be chosen to increase with t, and α t to decrease with t so as to more aggressively target the soft targets as t approaches h, assuming long-term predictions are sufficiently accurate and reliable. In short-term contexts, such as our 5−day context, this is not relevant, so we take constant weights α t = α = 1, β t = β, λ t = λ, γ t = γ. Setting α = 1 involves no loss of generality, as the remaining three weights are relative to α. Examples use various values of β, λ, γ to highlight their impact on portfolio outcomes. Larger values of β reduce the penalty for risk in terms of overall portfolio variance; larger values of λ lead to more volatile portfolio dynamics due to reduced penalties on transaction costs; and larger values of γ reduce shrinkage of portfolio weights, also relaxing the penalties on shorting.
Portfolios are compared in several ways, including realized returns. With a fixed transaction cost rate δ ≥ 0, time t optimal portfolio vector w t and realized return vector r t , the cumulative return R t over the period 0:t accumulates the realized portfolio returns net of transaction costs charged at rate δ on portfolio turnover. In our examples, we compare cases with δ = 0 and δ = 0.001. We also compare our multi-step portfolios with the standard one-step/myopic Markowitz approach-naturally expected to yield higher cumulative returns with no transaction costs, as it then generates much more volatile changes in portfolio weights day-to-day. Our portfolios constraining turnover are naturally expected to improve on this in terms of both stability of portfolios and cumulative return in the presence of practically relevant, non-zero δ. Additional metrics of interest are portfolio "risk" as traditionally measured by the realized portfolio standard deviations (w t K −1 t w t ) 1/2 , and patterns of volatility in trajectories of optimized portfolio weights over time.

Normal Loss Portfolios
First examples use the normal loss framework of Section 2 with β = 100. Figure 1 shows trajectories over time of optimized portfolio weight vectors using λ = 100 and λ = 10,000, as well as those from the standard, myopic Markowitz analysis that corresponds to λ → ∞. We see increased smoothness of changes as λ decreases; at λ = 1 the trajectories (not shown) are almost constant. Figure 2 plots trajectories of cumulative returns for three normal loss portfolios (λ = 1, 100 and 10,000) and for the Markowitz analysis. Markowitz and larger-λ normal loss portfolios perform best-in this metric-with no transaction costs; but the Markowitz approach is very substantially out-performed by the smoother, multi-step portfolios under even very modest transaction costs (δ = 0.001). Smaller λ induces portfolios more robust to transaction costs. Of note here is that, during 2009 following the financial crisis, portfolios with larger λ benefit as they are less constrained in adapting; but, later into 2010 and 2011, portfolios with lower λ are more profitable as they define ideal allocations with less switching and therefore save on transaction costs. Figure 3 shows trajectories of realized SDs of optimized portfolios, i.e. (w t K −1 t w t ) 1/2 over time, for each of the portfolios in Figure 2; also plotted is the theoretical lower bound trajectory (1 k K t 1 k ) −1/2 from the myopic, one-step, minimum variance portfolio. Less constrained portfolios with larger λ have lower standard deviations, approaching those of the Markowitz portfolio while also generating smoother changes in portfolio weights and higher cumulative returns. Thus, these portfolios are improved in these latter metrics at the cost of only modest increases in traditional portfolio "risk." Interestingly, the relationship between λ and realized standard deviations is not monotone; we see larger standard deviations in the case of λ = 100 than with λ = 1, the latter, very low value yielding an almost constant portfolio over time that, in this study, turns out to control risk at a level not matched by modestly more adaptive portfolios.

Extended Laplace Loss Portfolios
We explore similar graphical summaries from analyses using the extended Laplace loss framework of Section 3.3, and with sum-to-one constraints. Figure 4 shows optimal weight trajectories with α = 100 and β = 1 (this change of α is only in this example), and with the same level of penalization of turnover and absolute weights, i.e., λ = γ = 100. We see expected effects of the two types of shrinkage-of changes in weights, and of weights themselves. First, the hard shrinkage of changes induces much less switching in portfolio allocation over time, with longish periods of constant weights on subsets of equities. This occurs even with larger λ, where the portfolio becomes more volatile and similar to the Markowitz case. Second, the penalty on absolute weights themselves, and implicitly on short positions as a result in this context of sum-to-one weights, yields trajectories that are basically non-negative on all equities over time. The joint optimization drives some of the weights exactly to zero at some periods of time, indicating a less than full portfolio over these periods. Furthermore, it is evident that there are periods where some of the weights-while not zero-are shrunk to very small values, so that a practicable strategy of imposing a small threshold would yield sparser portfolios-i.e., a "soft" sparsity feature. Values λ > γ favor more stability/persistence in the portfolio allocations, and we see more "stepwise" allocation switches rather than more volatile turnover. Conversely, λ ≤ γ more aggressively favors no-shorting and encourages "soft" sparsity of allocations, resulting in dynamically switching portfolio weights over, generally, fewer assets.
Figure 5 plots trajectories of cumulative returns for three extended Laplace loss portfolios to show variation with the value of γ, together with one highly adaptive normal loss portfolio and the Markowitz analysis, for α = 1 and β = 100 fixed. Again we compare cases with transaction cost δ = 0 and 0.001. As with the normal loss comparisons, all multi-step cases dominate the traditional Markowitz analysis under even modest transaction costs. In addition, we now see the ability of the increasingly constrained multi-step Laplace portfolios to outperform the unconstrained-and hence more volatile-Markowitz as well as multi-step normal loss portfolios, even when transaction costs are zero or ignored. Then, cumulative returns with (λ, γ) = (100, 1,000) are essentially uniformly dominated by those with (λ, γ) = (100, 10) and (100, 100), regardless of the existence of transaction costs in this example. This suggests values of γ smaller than or comparable to λ to appropriately balance the two degrees of shrinkage while maintaining relevant returns. One reason for this is the encouragement towards less volatile swings in weights to larger negative/positive values and towards no-shorting as part of that, features that can lead to increased risk and transaction costs.
Figure 5 caption: cumulative returns for extended Laplace loss portfolios with γ (green), 100 (yellow) and 10,000 (purple), together with those from the normal loss portfolio with λ = 10,000 (blue) and the Markowitz outcomes (pink); transaction cost δ = 0 (upper) and 0.001 (lower).

Comparison of Profiled and Marginal Loss Approaches
We now discuss some analysis summaries related to the discussion of profiled and marginal losses of Section 4. As discussed in Section 4.2, we do this in the Laplace loss framework of Sections 3.1 and 3.2 (i.e., with γ t → ∞ in the extended context). First, Figure 6 shows optimal weight trajectories with β = 100 and λ = 100, comparing the profiled Laplace loss weights ŵ • of Section 3.2 with the marginal Laplace loss weights w * • of Section 4. Both strategies generate positive weights on the JPY, GBP and CAD FX rates, with a number of the other assets having quite small weights for longer periods of time, while the weights under profiled loss vary more widely to higher absolute values. We see constant weights for long periods on adaptively updated subsets of assets using the profiled weights, as expected; these trajectories are effectively smoothed-out and shrunk towards zero under the marginal weights. The latter do not exhibit the exact zero values that the former can, as we now understand is theoretically implied by our representation via the emulating statistical model: marginal modes will not be exactly at zero even when joint modes have subsets at zero. Figure 7 plots trajectories of cumulative returns for both profiled and marginal portfolios in each of the cases λ = 100 and λ = 10,000. Without transaction costs, the profiled and marginal portfolios are similarly lucrative whatever the value of λ, though the profiled portfolios show greater differences across λ values. In contrast, both approaches are more substantially impacted by transaction costs, and in a similar way; the cumulative return performance of portfolios decreases drastically in the presence of transaction costs. Not shown here, portfolios with smaller λ values define far more stable weights while resulting in very similar cumulative returns under both profiled and marginal strategies, as the resulting portfolio weights are very stable over time; this extends the observation already noted in the normal loss context in Section 5.4.
The marginal strategy tends to be less sensitive to λ than the profiled strategy, suggesting relevance in a "conservative" investment context with respect to loss function misspecification. Even with quite widely varying λ, resulting marginal loss portfolios will be more stable, and far less susceptible to substantial changes and potential deterioration in terms of cumulative returns, than profiled loss portfolios.

Additional Comments
The selected illustrations in our application to financial time series highlight key features and benefits of our Bayesian emulation approach to computing optimal decisions, as well as our development of new multi-step portfolio decision analysis. Versions of the Laplace loss functions generate multi-step portfolios that consistently outperform traditional myopic approaches, both with and without transaction costs; they define a psychologically and practically attractive framework for investors concerned about portfolio stability over multiple periods with defined targets. Examples show the opportunity-through appropriate selection of loss function parameters-for resulting portfolios to cushion the impacts of economically challenging times for the market, and enhance recovery afterwards, as highlighted in the examples using FX, commodity and market index data over 2009-2012. In addition to showcasing the application of the concept of "Bayesian emulation for decisions", our interest in multi-step portfolios also highlights the central question of optimization for one-step decisions in the multi-step view; while specific numerical methods using dynamic programming might be tuned and customized to a specific loss function in this context, the Bayesian emulation approach opens up new approaches and suggests new avenues for development. As we began to do in our discussion of marginal versus profiled loss functions, there is now an opportunity to drive part of the research agenda from synthetic statistical models as a starting point, exploring and evaluating the implied loss functions. One specific, current direction linked to this is to define classes of non-normal, state-space models with skewed innovation/error distributions that induce asymmetric loss functions; a key idea here is to use discrete/mean-scale mixtures of normals for the innovation/error distributions, so maintaining the ability to use MCMC coupled with FFBS/EM methods for mode-finding while generating a very rich class of resulting loss functions. One key and desirable feature of the latter, in particular, is to represent high penalties on portfolio short-fall relative to moderate or expected gains. This direction, and others opened up by the "Bayesian emulation for decisions" approach, offers potential for impact on research frontiers in statistics and decision theory as well as application in financial portfolio development and other areas.

A Appendix: Optimization Details

A.1 Mode-Finding in the Normal Mixture
Given the Monte Carlo mixture approximation p̂(w 1 ) of Section 4.2, the next step is mode-finding in this mixture of normals. Modes ŵ 1 satisfy the fixed-point equation (17): ŵ 1 = {Σ i=1:I q i C − i } − Σ i=1:I q i C − i m i with weights q i ∝ N (ŵ 1 |m i , C i ), where m i = m (i) 1 and C i = C (i) 1 denote the mean vector and variance matrix of the i-th normal component. We iterate this fixed-point equation to compute approximate modes, with the strategy for multiple "global" starting values as noted in the main paper. Normal mixtures can exhibit multiple modes, and our starting values-using means m i prioritized by the resulting values of p̂(m i )-explicitly address this by defining a set of "spanning" mode searches.

A.2 Sum-to-One Constraint
Note that, when the sum-to-one constraint is imposed on the original model by setting W t = diag(τ t ) − τ t τ t '/(1 k 'τ t ), the full conditional of the τ jt is no longer a product of univariate GIG distributions. To sample each τ t , we therefore use a novel, independence-chain Metropolis-Hastings algorithm. Here the product of the initial GIG distributions in the unconstrained model is used as the obvious proposal distribution. Acceptance probabilities involve singular normal densities based on the generalized inverses W − t = diag(τ t ) − τ t τ t '/(1 k 'τ t + c) where c = 10 −9 . The acceptance probability of each proposed sample τ jt new , conditional on its previous value τ jt old and all other parameters, is (c + 1 k 'τ t new ) 1/2 /(c + 1 k 'τ t old ) 1/2 , where τ t new = (τ 1t , . . . , τ jt new , . . . , τ kt )' and τ t old = (τ 1t , . . . , τ jt old , . . . , τ kt )'. We observe in our empirical studies, and the application of the paper in particular, that the acceptance probability is generally very high-typically around 98%.
We note also that, due to the sum-to-one constraint, the conditional k × k variance matrices C i are rank-deficient, being of rank k − 1. The generalized inverses C − i in eqn. (17) are therefore based on singular value decompositions in this iterative numerical solver for the modes of the normal mixture.

B Appendix: Dynamic Dependence Network Models
For time series analysis and forecasting, we adapt the framework of dynamic dependence network models (DDNMs). This model framework builds on prior work in multivariate dynamic modelling, and innovates in bringing formal and adaptive Bayesian model uncertainty analysis to parsimonious, dynamic graphical structures of real practical relevance to financial (and other) forecasting contexts. Specific classes of DDNMs represent both lagged and cross-sectional dependencies in multivariate, time-varying autoregressive structures, with an ability to adapt over time to dynamics in cross-series relationships. This advances the ability to characterize changing patterns of feed-forward relationships and of multivariate volatility, and to potentially improve forecasts as a result.
Denote the k × 1 vector of assets by y t ; in our application, y t is the vector of log prices of the financial assets. The DDNM extension of a TV-VAR(2) model represents y t via y t = Φ t x t + Γ t y t + ν t with ν t ∼ N (0, D t ), where x t = (1, y t−1 , y t−2 ) , Φ t is a k×(1+2k) matrix of time-varying intercept and autoregressive coefficients, Γ t is a time-varying, lower triangular matrix with zero diagonal, and D t = diag(v 1t , . . . , v kt ) has time-varying univariate volatilities on the diagonal. The model can be written element-wise as y jt = x t φ jt + y pa(j),t γ jt + ν jt with ν jt ∼ N (0, v jt ), where φ jt is the j-th row of Φ t , pa(j) ⊆ {1:(j−1)} is the parental set of series j, defined as the indices of the non-zero elements of the j-th row of Γ t , and y pa(j),t and γ jt are the corresponding subvectors, with |pa(j)| elements, of y t and of the j-th row of Γ t . The state parameters (φ jt , γ jt ) are assumed to follow normal random walks, with a discount factor method applied to define the state evolution variance matrices as is standard in univariate DLMs (West and Harrison 1997, chap. 6). The observational variance v jt is modeled as a gamma-beta stochastic volatility process over time, again based on standard DLM methodology (West and Harrison 1997, sect. 10.8). Sparsity of the parental sets pa(j) defines patterns of zeros below the diagonal in Γ t . This in turn defines the sparsity structure of the dynamic precision matrix of the TV-VAR innovations, i.e., the sparse representation of multivariate volatility used in the sequential analysis.
Parent j pa(j) GBP EUR CAD AUD CHF ∅ GLD GBP ZAR CAD CHF S&P GBP EUR NOK CAD AUD NZD NSD AUD JPY CHF S&P