Bayesian Emulation for Multi-Step Optimization in Decision Problems

. We develop a Bayesian approach to computational solution of multi-step optimization problems, highlighted in the example of ﬁnancial portfolio decisions. The approach involves mapping the technical structure of a decision analysis problem to that of Bayesian inference in a purely synthetic “emulating” statistical model. This provides access to standard posterior analytic, simulation and optimization methods that yield indirect solutions of the decision problem. We develop this in time series portfolio analysis using classes of economically and psychologically relevant multi-step ahead portfolio utility functions. Studies with multivariate currency time series illustrate the approach and show some of the practical utility and beneﬁts of the Bayesian emulation methodology.


Introduction
This work stems from an interest in Bayesian analysis of sequential, multi-period decision problems. A key example is financial portfolio decisions where long-term, multi-step investment objectives generate new classes of utility functions for which the resulting optimization problems raise computational challenges. We define new optimization strategies that stem from the recognition that some such optimization problems can be recast -purely technically -as problems of computing modes of marginal posterior distributions in "synthetic" statistical models. We then have access to analytic and computational machinery for exploring posterior distributions whose marginal modes represent target optima in originating optimization/decision problems. We refer to this as Bayesian emulation for decisions as the synthetic statistical model regarded as an emulating framework for computational purposes. The concept of the emulation strategy is general and will apply in many decision analysis contexts. That said, this paper uses the motivating example of personal financial portfolio decisions to convey and exemplify the methodological development in a concrete setting.
The use of decision analysis for portfolios coupled with dynamic models for forecasting financial time series continues to be a very active area of Bayesian analysisin research and in broad application in personal and corporate gambling on markets of all kinds. Forecasting with multivariate dynamic linear/volatility models coupled with extensions of traditional Markowitz mean-variance optimization (Markowitz, 1991) define benchmark approaches (e.g. Quintana, 1992;Aguilar and West, 2000;Quintana et al., 2003Quintana et al., , 2010Polson and Tew, 2000, chapter 10 of Prado andWest, 2010, Jacquier andPolson, 2011, among others). Much recent work has emphasised advances in forecasting ability based on increasingly structured multivariate models (e.g. Zhou et al., 2014;Nakajima and West, 2013a,b, 2015Zhao et al., 2016;West, 2016, 2017) with benefits in portfolio outcomes based, in part, on improved characterizations of dynamics in multivariate stochastic volatility. However, relatively little Bayesian work addresses interests in more relevant utility/loss functions, especially in longer-term, multi-step portfolio contexts; much of the cited work here employs standard myopic/one-step ahead decision rules. Our emphasis is to complement these time series forecasting advances with Bayesian decision analysis that explicitly reflects personal or commercial utilities for stable portfolios in a multi-step context.
In stylized forecasting and decision problems, analysis involves computing portfolio weight vectors to minimize expected portfolio loss functions, and to repeatedly apply this sequentially over time. The solutions can be approximated numerically in a number of ways, depending on the form of the loss function, but typically need customization of the numerical techniques. The approach here -emerging naturally in the specific context of multi-step portfolios -is a general approach applicable to a variety of loss functions. At any one time point with decision variable w and expected loss function L(w), the Bayesian emulation strategy is useful if/when there exists a purely synthetic statistical model involving hypothetical random vectors (parameters, latent variables or states) u, z and generating a posterior density p (u, z) under which the marginal mode of u is theoretically equal to the optimal w in the portfolio decision. Minimizing L(w) can then be approached by exploring p(u, z) with standard analytic and numerical methods for posterior analysis. While novel in terms of our context and development, the basic idea here goes back (at least) to Müller (1999). There, with discrete decision variables in non-sequential design contexts, optimization is solved using a similar synthetic posterior idea and combining optimization with estimation using Markov chain Monte Carlo (MCMC). This approach has, surprisingly, seen limited development, although recent work by Ekin et al. (2014) represents extension and new application. Our work here is related while defining a broader emulating perspective and in creating a complete separation of models/forecasting and decisions/optimization. We develop emulation of portfolio decisions using forecast information from state-of-the-art multivariate dependency network models (Zhao et al., 2016), treated as given. We then define the new multi-step decision strategy for computing and revising Bayesian portfolios over time based on these forecasts. Section 2 summarizes the multi-step portfolio set-up in sequential forecasting. To define and exemplify the emulation approach, we give summary details of its use in multi-step portfolios with extensions of standard (myopic, constrained) quadratic loss functions. Here the emulating synthetic statistical models are conditionally linear and normal state-space models, i.e., dynamic linear models (DLMs), amenable to evaluation using analytic forward filtering and backward smoothing (FFBS) methods. This is extended in Section 3 to a class of portfolios with sparsity-inducing penalties on portfolio weights and turnover. The emulating models here also have state-space forms, but now with non-normal structure. With augmented state-spaces, we can convert these to conditional DLMs in which posterior evaluation and mode search are efficiently performed by combining FFBS with a customized EM method. Of directly related work, we note that Kolm and Ritter (2015) recognized the relationship of constrained quadratic optimization for multi-step portfolios to posterior mode search in a synthetic DLM, and considered a direct simulation approach to problems with sparsity-inducing loss functions that are a special case of those we explore. Our main methodological contribution based on Bayesian emulation for broad classes of decision problems is quite distinct and novel, however.
Section 4 discusses new and fundamental questions about the definition of portfolio loss functions and objectives in multi-step contexts, and a strategy for marginal mode evaluation in cases where differences arise. A range of portfolio loss functions are then evaluated in sequential forecasting and portfolio construction with a 9-dimensional series of daily foreign exhange (FX) prices. Section 5 discusses this, highlighting choices of portfolio loss functions and objectives, and practical benefits arising with sparsityinducing, multi-step portfolio strategies. The latter shows the potential to improve portfolio outcomes generally, and particularly in the presence of realistic assumptions about transaction costs. Summary comments in Section 6 conclude the main paper. Appendices provide technical details on optimization and on dynamic dependency network models used for forecasting. The Appendices are available as on-line Supplementary Material (Irie and West, 2018).
Notational Remarks: We use p(x|y) for a generic density of x given y. Normal, exponential and gamma distributions are written as x ∼ N (μ, Σ), x ∼ Ex(m) with mean 1/m, and x ∼ G(a, b) with shape a and mean a/b; the values of their density functions at a particular x are denoted by N (x|μ, Σ), Ex(x|m) and G(x|a, b), respectively. Indices s, s + 1, . . . , t for s < t are shortened as s:t. The k-dimensional all-ones and all-zeros vectors are 1 k = (1, . . . , 1) and 0 k = (0, . . . , 0) , respectively, and 0 represents a zero vector or matrix when dimensions are obvious.

Setting and Notation
Sequentially over time t=1, 2, . . . , we observe a k−vector asset price time series p t ; the returns vector r t has elements r jt = p jt /p j(t−1) − 1, (j = 1:k). At time t with current information set D t = {r t , D t−1 }, a model defines a forecast distribution for returns at the next h time points. With no loss of generality and to simplify notation, take current time t=0 with initial information set D 0 . Predicting ahead, the predictive mean vectors and precision (inverse variance) matrices are denoted by f t = E[r t |D 0 ] and K t = V [r t |D 0 ] −1 over the h−steps ahead t=1:h. The time t portfolio weight vector w t has elements w jt , some of which may be negative reflecting short-selling. Standing at t=0 with a current, known portfolio w 0 , stylized myopic (one-step) Markowitz analyses are Bayesian decision problems focused on choosing w 1 subject to constraints. Standard mean-variance portfolios minimize w 1 K −1 1 w 1 subject to a chosen expected return target m 1 = w 1 f 1 , and usually a sum-to-one constraint w 1 1 k = 1, i.e., allowing only portfolios closed to draw-down or additional investment.
For multi-step portfolios, extend to consider the potential sequence of portfolio vectors w 1:h that might be realized over the next h periods. While the actual decision is to choose w 1 , we are now interested in target returns and portfolio turnover control over multiple steps, and so must consider how the decision analysis might play-out up to time t=h. Consider multi-step (expected) loss functions of the form where α t , β t and λ t are specified positive weights defining relative contributions of the terms in this sum, while W − t is the (least-norm) generalized inverse of a specified k × k positive-semi-definite matrix W t , and will be the usual inverse in cases of positivedefiniteness. Also, D 0 now includes the current portfolio vector w 0 .
The first set of terms in the sum involve specified multi-step target returns m 1:h . Individual investors typically prefer realized portfolios to progress relatively smoothly towards an end-point target m h , rather than bouncing from high to low interim returns. The weights α t can be used to increasingly emphasize the importance of later-stage returns as t approaches h. Note that allowing α t → 0 theoretically implies the hard constraint on expected return, f t w t = m t as in the standard myopic case. Hence we refer to (1) as including "soft target constraints," while having the ability to enforce the hard constraint at the terminal point via sending α h to zero.
The second set of terms in (1) penalize portfolio uncertainty using the standard risk measures V (w t r t |D 0 ) = w t K −1 t w t , again allowing differential weighting as a function of steps-ahead t. The final set of terms relates to portfolio turnover. If W t = I k these terms penalize changes in allocations across all assets. If trades are at a fixed rate, this is a direct transaction cost penalty; otherwise, it still relates directly to transactions costs and so that terminology will be used. With a heavy emphasis on these terms -as defined by the λ t weights -optimal portfolios will be more stable over time, providing less stress on investors (including emotional as well as workload stress for individual investors). The W t can play several constraint-related roles, as we discuss below.

Portfolio Optimization and Emulating Models
There are, of course, no new computational challenges to simple quadratic optimization implied by (1). Key points are that it is easy to: (i) compute the joint optimizing values w 1:h , and (ii) deduce the one-step optimizing w 1 for the Bayesian decision. Optimization with respect to w 1 alone can be immediately performed using a forward-backward dynamic programming algorithm. Importantly, the optimizing value for w 1 (or for any subset of the w t ) is -as a result of the quadratic nature of (1) -precisely that sub-vector (or subset of vectors) arising at the global/joint maximizer w 1:h .
The emulation idea translates the above concepts into a synthetic Bayesian model immediately interpretable by statisticians. Rewrite (1) as where each p(·|·) term is a specific normal p.d.f., the m 1:h , z 1:h , w 1:h are interpreted as random quantities in a multivariate normal distribution underlying this density form, and where each z t is set at z t = 0. Specifically, consider a dynamic linear model (DLM) generating pairs of observations (m t , z t ) -with m t scalar and z t a k−vector -based on latent k−vector states w t via with a known initial state (the current portfolio) w 0 and where the ν t , s , ω r are independent and mutually independent innovations sequences. In this model, observing the sequence of synthetic observations m 1:h , z 1:h with z 1:h = 0 immediately implies the resulting posterior p(w 1:h |m 1:h , z 1:h = 0, w 0 ) as given in (2).
Observe that computing the minimizer of L(w 1:h ) is equivalent to calculating the posterior mode for w 1:h in the synthetic DLM. It is immediate that the required (marginal) optimizing value for w 1 is the marginal mode in this joint posterior. Since the joint posterior is normal, marginal modes coincide with values at the joint mode, so we can regard the Bayesian optimization as solved either way. We easily compute the mode of w 1 using the forward filtering, backward smoothing (FFBS) algorithm -akin to a Viterbi-style optimization algorithm (e.g. Viterbi, 1967;Godsill et al., 2001Godsill et al., , 2004) -widely used in applications of DLMs (e.g. West and Harrison, 1997;Prado and West, 2010).

Imposing Linear Constraints
As noted above, some applications may desire a hard target m h at the terminal point, and this is formally achieved by setting the synthetic variance α h = 0 in (3). The multivariate normal posterior is singular due to the resulting constraint m h = f h w h , but this raises no new issues as the FFBS computations apply directly.
The general framework also applies with singular matrices W t , now playing the roles of the variance matrices of state innovations in (5). These arise to enforce linear portfolio constraints Aw t = a where a is a given n−vector and A is a full-rank n × k matrix with n < k. Choose w 0 to satisfy these constraints and ensure that each W t is such that AW t = 0. Then the priors and posteriors for the synthetic states w t are singular and constrained such that Aw t = a (almost surely). Again the FFBS analysis applies directly to generate the optimal portfolio vector w 1 -and the sequence of interim optimizing values w 1:h even though only w 1 is used at t=0. This now involves propagating singular normal posteriors for states, as is standard in, for example, constrained seasonal DLMs (e.g. West and Harrison, 1997, sect. 8.4). A key portfolio case is the sum-to-one constraint 1 k w t = 1 for all t. Here we redefine W t beginning with the identity I krepresenting equal and independent penalization of turnover across assets -and then condition on the constraints to give rank k − 1 matrices W t ≡ W = I k − 1 k 1 k /k.

Basic Setting
Now consider modifications to (i) more aggressively limit switching in/out of specific assets between time points -for both transaction and psychological cost considerations, and to (ii) limit the numbers of assets invested at any time point. Several authors have considered absolute loss/penalties to encourage shrinkage-to-zero of optimizing portfolio vectors (e.g. Jagannathan and Ma, 2003;Brodie et al., 2009) and we build on this prior work. Key points, however, are that such approaches have not been consistent with a Bayesian decision analysis framework, while goals with respect to marginal versus joint optimization in the multi-step context have been poorly understood and explored, and require clarification. Our fully Bayesian emulation strategy adds to this literature while also clarifying this critical latter point and defining relevant methodology.
The "Laplace loss" terminology relates to novel synthetic statistical models that emulate portfolio optimization with absolute norm terms to penalize portfolio weight changes. Modify (1) to the form where the final term now replaces the quadratic score with the sum of absolute changes of asset weights 1 k |w t − w t−1 | = j=1:k |w jt − w j,t−1 |. Relative to (1), this aims to more aggressively limit transaction costs, both monetary and psychological. Optimizing globally over w 1:h may/will encounter boundary values in which some portfolio weights are unchanged between times t − 1 and t. This theoretical lasso-style fact is one reason for the interest in such loss functions, due to the implied expectation of reduced portfolio turnover -or "churn" -and hence reduced costs.

Emulating Dynamic Laplace Models
In parallel to Section 2.2, we identify a synthetic statistical model -again a statespace model but now with non-normal evolution/transition components for the synthetic latent states w t -of the form where . Also, the ν t , s , ω jr are independent and mutually independent across the ranges of all suffices.
One of the immediate benefits of the Bayesian emulating model approach is that we can exploit latent variable constructs. In particular here, the Laplace distributions are known to be scale mixtures of normals (Andrews and Mallows, 1974;West, 1984West, , 1987. Thus, there exist latent random quantities τ jt > 0, (j=1:k, t=1:h), such that τ jt ∼ Ex(1/{2λ 2 t }) independently over j, t, and based on which each synthetic state evolution in (9) has the form Augmenting by the vectors of latent scales τ t = τ 1:k,t , the evolutions in (9) become This defines a conditionally normal DLM and the above/standard FFBS algorithm can be used to evaluate the posterior mode of p(w 1:h |W 1:h , m 1:h , z 1:h ) for any z 1:h including that at zero. To maximize over portfolios w 1:h in the implied marginal with respect to W 1:h , Bayesian EM (e.g. Dempster et al., 1977) is the obvious and easily implemented approach.
Here the E-step applies to the latent W 1:h , while FFBS gives the exact M-step for w 1:h at each iterate. In summary: 1. Initialization: Set each w (0) t arbitrarily. Candidates for initial values are the current w 0 , or the trivially computed values that optimize the multi-step portfolios under the quadratic loss of Section 2.
2. For EM iterates s = 1:S under a chosen stopping rule, repeat the following:.
• E-step: For j=1:k and t=1:h, update τ jt via τ • M-step: Implement FFBS for the emulating model of (7) and (8)  The addition of linear constraints modifies the W t matrices with details extending those of the normal model in Section 2.3. Write V t = diag(τ t ). Then for the full-rank set of n < k constraints In the key special case of sum-to-one constraints 1 k w t = 1 for all t, this reduces to

Extended Laplace Loss for Sparser Portfolios
In the one-step, myopic context, penalizing portfolio variance w 1 K −1 1 w 1 with a term proportional to 1 k |w 1 | = j=1:k |w jt | is an obvious strategy towards the goal of inducing shrinkage to zero of optimized portfolio weights. As noted earlier, a number of recent works have introduced such a lasso-style penalty directly on portfolio weights, rather than on changes in weights, and with standard convex optimization algorithms for solution (e.g. Brodie et al., 2009) and demonstrating improved portfolio performance in some cases (e.g. Jagannathan and Ma, 2003). We now integrate such penalties as components of a more general class loss function embedded in the multi-step framework, and develop the Bayesian emulation methodology for this novel context.
The shrinkage-inducing penalty 1 k |w 1 | aims to drive some subset of weights to zeroexactly in the one-step, myopic context when balanced only by portfolio risk. A key point to note is that, when the portfolio vector is also subject to the sum-to-one constraint, then the combined loss function also more aggressively penalizes negative weights, i.e., short positions, and so is particularly of interest to personal investors and institutional funds that generally adopt long positions. That is, the absolute weight penalty operates as a soft constraint towards non-negative weights. In our broader context below, this does not theoretically imply non-negative optimal weights, but does often yield such solutions. Modify (6) to the form with weights γ t on the new absolute loss terms at each horizon t = 1:h. Extending the latent variable construction of double exponential distributions to these terms in addition to the turnover terms, we now see that optimizing (12) is equivalent to computing the mode over states w 1:h in a correspondingly extended synthetic DLM. This emulating model is: with synthetic observations m t (scalar) and z t =u t =0 (k−vectors), and where latent scales τ t are augmented with additional terms φ t = φ 1:k,t for each t. Conditioning on φ jt converts the Laplace term exp(−|w jt |/γ t ) to a conditional normal. To incorporate exact linear constraints on each w t , the above is modified only through the implied changes to the W t ; this is precisely as detailed at the end of Section 3.2 above.
Extension of the FFBS/EM algorithm of Section 3.2 provides for computation of the optimizing w 1:h . Each E-step now applies to the latent U 1:h as well as W 1:h , while the M-step applies as before to w 1:h at each iterate. Following initialization at w (0) 1:h , the earlier details of iterates s = 1:S are modified as follows: • E-step: • M-step: FFBS applied to the extended emulating model (13)  defines the optimizing portfolio vector.

Profiled Loss and Marginal Loss
In multi-step portfolio analysis, the decision faced at time t=0 is to choose w 1 only. The future weights w 2:h are involved in the initial specification of the joint loss function L(w 1:h ) in order to weigh expected fluctuations in risk and costs up to the target horizon t=h. From the viewpoint of Bayesian decision theory, this is perfectly correct in the context of the actual decision faced if the approach is understood to be minimizing Joint optimization over w 1:h to deliver the actionable vector w 1 is Bayesian decision analysis with this implied loss as a function of w 1 alone.
The emulation framework provides an approach to computation, but also now suggests an alternative loss specification. With emulating synthetic joint density p(w 1:h ), minimizing the loss L(w 1 ) above is equivalent to profiling out the future hypothetical vectors w 2:h by conditioning on their (joint) modal values. It is then natural to consider the alternative of marginalization over w 2:h ; that is, define the implied marginal loss function L * (w 1 ) as Call L(w 1 ) the profiled loss function and L * (w 1 ) the marginal loss function.
In general, the resulting optimal vectorsŵ 1 (profiled) and w * 1 (marginal) will differ. A key exception is the case of the quadratic loss function and normal synthetic models of Section 2 where the joint posterior p(w 1:h ) is multivariate normal. In that case, joint modes are joint means, whose elements are marginal means, i.e.,ŵ 1 = w * 1 . The situation is different in cases of non-normal emulating models, such based on the Laplace forms. These are now considered further for comparisons of marginal and profile approaches.

Computing Marginal Portfolios under Laplace Loss
Return to the Laplace loss framework of Sections 3.1 and 3.2 (i.e., the extended Laplace context with γ t → ∞) with sum-to-one constraints. Here the key issues of profiled versus marginal losses are nicely illustrated. Similar features arise in the extended Laplace loss context of Section 3.3, but with no new conceptual or practical issues so details of that extension are left to the reader. The FFBS/EM algorithm easily computes the optimal profile portfolioŵ 1 , but it does not immediately extend to evaluating the optimal marginal portfolio w * 1 . Of several approaches explored, the most useful is based on MCMC analysis of the synthetic DLM, coupled with iterative, gradient-based numerical evaluation of the mode of the resulting Monte Carlo approximation to the required marginal density function. Summary details are given here and further explored in application in Section 5.
The density p(w 1 ) is the w 1 margin under the full joint posterior of (w 1:h , τ 1:h ) where τ t = τ 1:k,t is the vector of t−step ahead latent scales. The FFBS/EM approach is enabled by the nice analytic forms of implied conditional posteriors; these also enable MCMC analysis in this conditionally normal DLM with uncertain scale factors. This approach is nowadays standard and easily implemented (e.g. West and Harrison, 1997, chapt. 15;Prado and West, 2010, sect. 4.5). Now the FFBS is exploited to generate backward sampling, rather than the backward smoothing that evaluates posterior modes. At each MCMC iterate, FFBS applies to simulate one draw of the full trajectory of states w 1:h from the retrospective posterior p(w 1:h |τ 1:h ) conditional on current values of the latent scales. Then, conditional on this state trajectory, the conditional posterior p(τ 1:h |w 1:h ) is simulated to draw a new sample of the latent scales. In the emulating model of (11) this second step involves a set of conditionally independent univariate draws, each from a specific GIG (generalized inverse Gaussian) distribution. Applying the sum-to-one constraint on each w t vector changes this structure for the τ jt , however, and direct sampling of the τ jt is then not facile. To address this, we define a Metropolis-Hastings extension for these elements to allow use of the constraint. Summary details of this, and of MCMC convergence diagnostics related to the real-data application in Section 5, are given in Appendix 1.
The MCMC generate samples indexed by superscript (i), i = 1:I, for some chosen sample size I. The Rao-Blackwellized Monte Carlo approximation to the required margin for w 1 is thenp Importantly, this is the density of a mixture of I normals: each conditional p(w 1 |•) in the sum is the implied normal margin in the DLM defined by conditioning values of latent scales, with moments trivially computable via FFBS (using backward smoothing), and the density values are easily evaluated at any w 1 . Thus the portfolio optimization problem reduces to mode-finding in a mixture of multivariate normals, and there are a number of numerical approaches to exploit for this. The most effective is really one of the simplest -a Newton-type updating rule based on the first order derivative of the density, with repeat searches based on multiple initial values for numerical iterates. Relevant candidate initial values can be generated by evaluating the mixture at each of the normal component means, and selecting some of those with highest mixture density.
Further details are noted in Appendix 1.

Data
Evaluation of multi-step portfolios uses data on daily returns of k=9 financial series: exchange rates of 9 international currencies (FX) relative to the US dollar; see

Forecasting Model
Forecasts are generated from a time-varying, vector auto-regressive model of order 2 (TV-VAR(2), e.g. Primiceri, 2005;Nakajima and West, 2013a), with dynamic dependence network structure (DDNM, Zhao et al., 2016). The core form of the model implies a Cholesky-style representation of multivariate stochastic volatility in which dependence parameters as well as variances evolve over time according to simple, discount-factorbased random walks. The DDNM analysis is largely analytic and computationally simple for forward-filtering, and multi-step forecasting is via direct simulation from coupled sets of univariate conditional models. Exploratory analysis of the first 500 observations is used to define the sparsity structure of the dynamic precision matrix for the TV-VAR innovations, i.e., a sparse representation of multivariate volatility, following examples in the above references. From day 501, the analysis is run sequentially in time, updating and forecasting each day. The DDN structure enables analytic filtering and one-step forecasting; forecasting multiple steps ahead in a TV-VAR with DDN structure is performed by direct simulation into the future. For each day t during the investment period, the model generates multiple-step ahead forecast mean vectors and variance matrices, {f t+i , K −1 t+i } i=1:h , given as Monte Carlo averages of 50000 forecast trajectories of the return vectors r t+(1:h) simulated at time t. We take h=5 days as the portfolio horizon, and reset the time index so that t= 0 represents the start of the investment period, January 1, 2009. Appendix 2 provides detailed discussion of the DDN model, use of exploratory training data, filtering and simulation-based forecasting.

Parameters and Metrics
Comparisons summarized below used various values of the portfolio parameters in both the quadratic/normal loss framework of Section 2 and the Laplace loss frameworks of Section 3. In all cases, we take the target return schedule m 1:h to be constant, with m t = 0.0005 representing daily return targets of 0.05%, annualized (261 trading days) to about 13.9%. Then, we have α t > 0 for t < h to define "soft" interim targets rather than strictly enforcing the hard constraint α t = 0. The initial portfolio w 0 is the myopic Markowitz portfolio for comparison. Parameters α t , β t , λ t , γ t define relative weights of the four components of loss. In a long-term context (e.g., when t indexes months or more) some use of discounting into the future becomes relevant. For example, we may take β t , λ t , γ t may be chosen to increase with t, but α t to decrease with t to more aggressively target the soft targets as t approaches h, given the accurate and reliable long-term predictions. In short-term contexts, such as with our 5−day context, this is not relevant, so we take constant weights α t = α = 1, β t = β, λ t = λ, γ t = γ. Setting α = 1 loses no loss of generality, as the remaining three weights are relative to α. Examples use various values of β, λ, γ to highlight their impact on portfolio outcomes. Larger values of β reduce the penalty for risk in terms of overall portfolio variance; larger values of λ leads to more volatile portfolio dynamics due to reduced penalties on transaction costs; and larger values of γ reduce shrinkage of portfolio weights, also relaxing the penalties on shorting.
Portfolios are compared in several ways, key among which is cumulative return over the investment period. With a fixed transaction cost of δ ≥ 0, time t optimal portfolio vector w t and realized return vector r t , cumulative return R t from the period 0:t is R t = −1 + s=1:t {(r t + 1 k ) w t − δ1 k |w t − w t−1 |}. In our examples, we compare cases with δ = 0 and δ = 0.0001. This constraint on transaction cost can further be generalized by replacing δ1 k by k-vector δ t , implying the transaction costs are different across currencies and also dynamic. Our example, however, still approximates the real transaction rates as computed by bid-ask spreads; we observed that all 9 transaction costs move around 0.0001. Also, it is important to note that actual transaction costs observed as a result of the portfolio decision and outcome are distinct from penalties on turnover in quadratic and/or Laplace-style loss functions. The penalizations on turnover w t − w t−1 in such loss functions represent pre-decision preferences of the investor for stable portfolios rather than a direct specification related to trading costs. Of course, as noted in Section 2.1, a major weighting on such components of loss functions will tend to drive down realized transaction costs as a result. We also compare our multistep portfolios with the standard one-step (myopic) Markowitz approach -naturally expected to yield higher cumulative returns with no transaction costs as it then generates much more volatile changes in portfolio weights day-to-day. Our portfolios constraining turnover are naturally expected to improve this in terms of both stability of portfolios and cumulative return in the presence of practically relevant, non-zero δ. Additional metrics of interest are portfolio "risk" as traditionally measured by the realized portfolio standard deviations (w t K −1 t w t ) 1/2 , and patterns of volatility in trajectories of optimized portfolio weights over time.

Normal Loss Portfolios
First examples use the normal loss framework of Section 2 with β = 100. Figure 1 shows trajectories over time of optimized portfolio weight vectors using λ = 100 and λ = 10000, as well as those from the standard, myopic Markowitz analysis that corresponds to λ → ∞. We see the increased smoothness of changes as λ decreases; at λ = 1 the trajectories (not shown) are almost constant. Figure 2 plots trajectories of cumulative returns for three normal loss portfolios (λ = 1, 100 and 10000) and for the Markowitz analysis. Markowitz and larger λ normal loss portfolios performs best -in this metric -with no transaction costs; but the Markowitz approach is very substantially out-performed by the smoother, multi-step portfolios under even very modest transaction costs (δ = 0.0001). Smaller λ induces portfolios more robust to transaction costs. Of note here is that, as seen in the differences between returns with λ = 100 and 10000, during 2009 following the financial crisis, portfolios with larger λ benefit as they are less constrained in adapting. However, later into 2010 and 2011, portfolios with lower λ are more profitable as they define ideal allocations with less switching and therefore save on transaction costs. Figure 3 shows trajectories of realized standard deviations of optimized portfolios, i.e. (w t K −1 t w t ) 1/2 over time, for each of the portfolios in Figure 2, relative to the theoretical lower bound trajectory (1 k K −1 t 1 k ) −1/2 from the myopic, one-step, minimum variance portfolio. Less constrained portfolios with larger λ have lower standard deviations, approaching those of the Markowitz portfolio while also generating smoother changes in portfolio weights and higher cumulative returns. Thus, these portfolios are improved in these latter metrics at the cost of only modest increases in traditional portfolio "risk." The relationship between λ and realized standard deviations is monotone; the larger λ is, the more adaptive the portfolio becomes and the more aggresive in minimizing its expected risk. Although this interpretation seems natural and general, other applications show that this is not always true; a very low value of λ yielding an almost constant portfolio over time might be able to control risk at a level not matched by modestly more adaptive portfolios.
In addition, the comparison with the return trajectory in Figure 2 implies that, in times of higher currency volatility, the portfolios are less profitable. One point to make is that this relates partly to emphasis on targeting trajectories of expected portfolio returns, as the resulting risk of portfolios is then naturally increased as a function of increasing targets. In institutional settings it is more traditional to target risk directly, i.e., to place a heavier weight on controlling the variance of portfolios at the cost of typically lower returns. Modified loss functions, or current loss functions with much higher values of α and lower values of β, would address this while not impacting on the emulation methodology.

Extended Laplace Loss Portfolios
We explore similar graphical summaries from analyses using the extended Laplace loss framework of Section 3.3, and with sum-to-one constraints. Figure 4 shows optimal  weight trajectories with α = 1 and β = 100, and with the different levels of penalization of turnover and absolute weights, i.e., λ = 200, γ = 100. We see expected effects of the two types of shrinkage -of changes in weights and in weights themselves. First, the hard shrinkage of changes induces much less switching in portfolio allocation over time, with longish periods of constants weights on subsets of equities. This occurs even with larger λ where the portfolio becomes more volatile and similar to the Markowitz case. Second, the penalty on absolute weights themselves, and implicitly on short positions as a result in this context of sum-to-one weights, yields trajectories that are basically non-negative on all equities over time. The joint optimization drives some of the weights exactly to zero at some periods of time, indicating a less than full portfolio over these periods. Furthermore, it is evident that there are periods where some of the weights -while not zero -are shrunk to very small values, so that a practicable strategy of imposing a small threshold would yield sparser portfolios -i.e., a "soft" sparsity feature. Values λ > γ favor more stability/persistence in the portfolio allocations, and we see more "stepwise" allocation switches rather than more volatile turnover. Conversely, λ ≤ γ more aggressively favors no-shorting and encourages "soft" sparsity of allocations, resulting in dynamically switching portfolio weights over, generally, fewer assets. Figure 5 plots trajectories of cumulative returns for three extended Laplace loss portfolios to show variation with the value of γ, together with one highly adaptive normal loss portfolio and the Markowitz analysis, for α = 1 and β = 100 fixed. Again we compare cases with transaction cost δ = 0 and 0.0001. As with normal loss comparisons, all multi-step cases dominate the traditional Markowitz analysis under even modest transaction costs. In addition, we now see the ability of the increasingly constrained multi-step Laplace portfolios to outperform unconstrained -and hence more volatile -Markowitz as well as multi-step normal loss portfolios even when transactions costs are zero or ignored. Then, cumulative returns with (λ, γ) = (100, 1000) are essentially uniformly dominated after 2010 by those with (λ, γ) = (100, 10) and (100, 1000), regardless of the existence of transaction costs in this example. This suggests values of γ smaller than or comparable to λ to appropriately balance the two degrees of shrinkage while maintaining relevant returns. One underlying reason for this is the encouragement towards less volatile swings in weights to larger negative/positive values and towards no-shorting as part of that, features that can lead to increased risk and transaction costs.

Comparison of Profiled and Marginal Loss Approaches
We now discuss some analysis summaries related to the discussion of profiled and marginal losses of Section 4. As discussed in Section 4.2 we do this in the Laplace loss framework of Sections 3.1 and 3.2 (i.e., with γ → ∞ in the extended context). First, Figure 6 shows optimal weight trajectories with β = 100 and λ = 100, comparing the Figure 4: Trajectories of optimal portfolio weights (upper) and number of non-zero portfolio weights (lower) using extended Laplace loss with β = 100, λ = 100 and γ = 100.
profiled Laplace loss weightsŵ • of Section 3.2 with the marginal Laplace loss weights w * • of Section 4. Both strategies generate positive weights on GBP, EUR, CAD and JPY FX rates, with a number of the other assets having volatile or quite small weights for longer periods of time. We see constant weights for long periods on adaptively updated subsets of assets using the profiled weights, as expected; these trajectories are effectively smoothed-out and shrunk towards zero under the marginal weights. The latter do not exhibit the exact zero values that the former can, as we now understand is theoretically implied by our representation via the emulating statistical model: marginal modes will not be exactly at zero even when joint modes have subsets at zero. Figure 7 plots trajectories of cumulative returns for both profiled and marginal portfolios in each of the cases with λ = 100 and λ = 10000. With and without transaction cost, the profiled and marginal portfolios are similarly lucrative whatever the value of λ, whereas the profiled portfolios show greater differences. The performance of the two Figure 5: Cumulative returns from extended Laplace loss portfolios with β = 100, λ = 100 and γ =10 (green), 100 (yellow) and 10000 (purple), together with those from the normal loss portfolio with λ = 10000 (blue) and the Markowitz outcomes (pink). The transaction cost is δ = 0 (upper) and 0.0001 (lower). portfolios coincide uniformly and almost completely when λ = 10000, as expected; without the transaction term in the synthetic model, the portfolio choice at each time point is independent to one another, leading to the equivalence between profile and marginal posteriors. In addition, not shown here, portfolios with smaller λ values define far more stable weights while resulting in very similar cumulative returns under both profiled and marginal strategies, as the resulting portfolio weights are very stable over time; this extends this observation as already noted in the normal loss context in Section 5.4.
The marginal strategy tends to be less sensitive to λ than the profiled strategy, suggesting relevance in a "conservative" investment context with respect to loss function misspecification. Even with quite widely varying λ, resulting marginal loss portfolios will be more stable, and far less susceptible to substantial changes and potential deterioration in terms of cumulative returns, than profiled loss portfolios.

Additional Comments
As noted in the introduction, our development of Bayesian emulation for decisions will apply to various decision problems in which the resulting objective function to optimize can be theoretically recast as a synthetic posterior distribution. We have anchored presentation of the concept and resulting new methodology on the example of sequential, multi-step portfolio decisions. Based on a novel class of personalistic utility functions relevant for sequential portfolio decisions, the context illuminates the potential of the emulation approach as well as providing useful practical examples and insights. By the way, the paper has introduced these new classes of portfolio utility functions which, coupled with the emulation methodology, now open the path to broader and more detailed application in investment decision contexts.
In addition to showcasing the application of the concept of "Bayesian emulation for decisions", our interest in multi-step portfolios also highlights the central question of op- timization for one-step decisions in the multi-step view; while specific numerical methods using dynamic programming might be tuned and customized to a specific loss function in this context, the Bayesian emulation approach opens up new approaches and suggests new avenues for development. As we begin in our discussion of marginal versus profiled loss functions, there is now opportunity to drive some part of the research agenda from synthetic statistical models as a starting point, exploring and evaluating the implied loss functions. There are interesting and open theoretical questions about the relationships between marginal and profiled loss functions. In general, this relates to understanding relationships between marginal and joint posterior modes in complicated multivariate settings. In the portfolio context specifically, it will be of interest to explore connections with other functional forms and shapes of investor utility functions, both personal and institutional. Further, as discussed in the portfolio examples, a central applied focus will require evaluation of investors' preferences at a deeper level in order to more contextually assess relevant values of the sets of weights (α t , β t , λ t , γ t ) on components of loss functions. Our empirical examples have illustrated the relative roles of these parameters at a general level, but deployment in practice will involve context-specific elicitation as well as robustness evaluations. In such studies, we would expect to see real practical benefits in the use of the new utility functions, suggested by the initial examples here. We have shown that versions of the Laplace loss functions generate multi-step portfolios that consistently outperform traditional myopic approaches, both with and without transaction costs; they define psychologically and practically attractive framework for investors concerned about portfolio stability over multiple periods with defined targets. These examples show the opportunity -through appropriate selection of loss function parameters -for resulting portfolios to cushion the impacts of economically challenging times for the market, and enhance recovery afterwards, as highlighted in the examples using FX data over 2009-2012. Analysis and evaluation over longer time periods, and in periods of more volatile market behavior such as experienced during 2007-2009, will need to address questions of whether relative weights on components of utility functions might be modified as a function of market conditions, among other things.
In addition to developing a richer case study to explore and better understand the new classes of portfolio utilities introduced here, one specific, current direction is to define classes of non-normal, state-space models with skewed innovation/error distributions that induce asymmetric loss functions. A key idea here is to use discrete/mean-scale mixtures of normals for the innovation/error distributions, so maintaining the ability to use MCMC coupled with FFBS/EM methods for mode-finding while generating a very rich class of resulting loss functions. One key and desirable feature of the latter, in particular, is to represent high penalties on portfolio short-fall relative to moderate or expected gains. This direction, and others opened-up by the "Bayesian emulation for decisions" approach, offers potential for impact on research frontiers in statistics and decision theory as well as productive application in financial portfolio development and other areas.