Continuity of Cost in Borkar Control Topology and Implications on Discrete Space and Time Approximations for Controlled Diffusions under Several Criteria

We first show that the discounted cost, the cost up to an exit time, and the ergodic cost involving controlled non-degenerate diffusions are continuous on the space of stationary control policies when the policies are given the topology introduced by Borkar [V. S. Borkar, A topology for Markov controls, Applied Mathematics and Optimization 20 (1989), 55-62]. The same applies to finite horizon problems when the control policies are Markov and the topology is revised to include time as an additional parameter. We then establish that finite action/piecewise constant stationary policies are dense in the space of stationary Markov policies under this topology. Using the above continuity and denseness results, we establish that finite action/piecewise constant policies approximate optimal stationary policies with arbitrary precision. This makes applicable many numerical methods, such as policy iteration and stochastic learning methods, for discounted cost, cost up to an exit time, and ergodic cost optimal control problems in continuous time. For the finite-horizon setup, we additionally establish near optimality of time-discretized policies by an analogous argument. We thus present a unified and concise approach for approximations directly applicable under several commonly adopted cost criteria.


Introduction
In this paper, we study regularity properties of the induced cost (under several criteria) of a controlled diffusion process with respect to a control topology defined by Borkar [1], and the implications of these properties on existence and, in particular, on approximations for optimal controlled diffusions. We will arrive at very general approximation results for optimal control policies by quantized (finite action/piecewise constant) stationary control policies for a general class of controlled diffusions in the whole space R^d, as well as time-discretizations for the criteria with finite horizons.
Such a problem is of significant practical consequence and accordingly has been studied extensively in a variety of setups. Due to its wide range of applications in domains spanning mathematical finance, large deviations and robust control, vehicle and mobile robot control, and several other fields, stochastic optimal control problems for controlled diffusions have been studied extensively in the literature; see, e.g., [2], [3] (finite horizon cost), [4], [5] (discounted cost), [6], [7], [8], [9], [10], [11] (ergodic cost) and the references therein. Typically, there are two main approaches to these problems. The first is Bellman's Dynamic Programming Principle (DPP). The DPP approach allows one to characterize the value function of the optimal control problem as the unique solution of the associated Hamilton-Jacobi-Bellman (HJB) equation [2], [3], [11], [12], [13]. The second is the Pontryagin maximum principle (in the stochastic framework) [14].
For numerical methods as well as learning theoretic methods, it is imperative to arrive at rigorous approximation results.
For finite horizon criteria, a commonly adopted approach of approximating controlled diffusions by a sequence of discrete time Markov chains via weak convergence methods was studied by Kushner, and by Kushner and Dupuis; see [15], [16], [17]. These works develop numerical procedures to construct near optimal control policies for controlled diffusion models by approximating the space of (open-loop adapted) relaxed control policies with those that are piecewise constant, and by considering the weak convergence of the approximating probability measures on the path space to the measure of the continuous-time limit. It is shown in [15], [16], [17] that if the constructed controlled Markov chain satisfies a certain "consistency" condition at the discrete-time sampling instants, then the state process and the corresponding value function asymptotically approximate the continuous time state process and the associated value function. This approach has been referred to as the weak convergence approach.
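To make the "consistency" condition concrete, the following is a minimal sketch of the standard one-dimensional upwind construction of a locally consistent approximating chain; the function names are illustrative, and this textbook scheme is only one instance of the constructions in [15], [16], [17], not a verbatim reproduction of them.

```python
def mca_transition(x, b, sigma, h):
    """Upwind finite-difference Markov chain approximation for
    dX = b(X) dt + sigma(X) dW on a grid of mesh h (scalar case).

    Returns (p_up, p_down, dt): the probabilities of moving to x + h and
    x - h, and the interpolation interval, chosen so that the chain is
    locally consistent with the diffusion:
        E[dX]   = b(x) dt   (exactly), and
        E[dX^2] = (sigma(x)^2 + h |b(x)|) dt = sigma(x)^2 dt + O(h^3).
    """
    bx, s2 = b(x), sigma(x) ** 2
    denom = s2 + h * abs(bx)               # normalizing factor
    p_up = (s2 / 2 + h * max(bx, 0.0)) / denom
    p_down = (s2 / 2 + h * max(-bx, 0.0)) / denom
    dt = h ** 2 / denom                    # interpolation interval
    return p_up, p_down, dt
```

Note that p_up + p_down = 1 by construction, and the drift consistency h (p_up - p_down) = b(x) dt holds exactly.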
In an alternative program, building on finite difference approximations for Bellman's equations and utilizing their regularity properties, Krylov [18], [19] established convergence rates for such approximation techniques, where finite difference approximations are studied to arrive at stability results. In particular, estimates for the error bounds of finite-difference approximation schemes in the problem of finding viscosity or probabilistic solutions to degenerate Bellman's equations are established. The proof technique is based on mean value theorems for stochastic integrals (as in [23]), obtained from elementary properties of the associated Bellman's equations. Also, for controlled non-degenerate diffusion processes, it is shown in [24] that using policies which are constant on intervals of length h^2, one can approximate the value function with errors of order h^{1/3}. In [20], [21] Barles et al. improved the error bounds obtained in [18], [19], [24]. Borkar [1], [2], for the finite-horizon cost case, pursued an alternative approach to show continuity (when only stationary state feedback policies are considered for finite horizon problems) in his newly introduced topology; he studied the dependence of the strategic measures (on the path space) on the control policy via regularity properties of generator functions. However, Borkar [1] did not study the implications for approximations.
Instead of the approaches adopted in the aforementioned studies, in this paper, utilizing regularity results for the associated Poisson equations via PDE theory, we arrive at continuity results under a relatively weaker set of assumptions on the diffusion coefficients (with the exception of Krylov's method, which is tailored for finite horizon problems). Our approach allows one to arrive at a unification of approximation methods for the finite horizon criterion, the infinite horizon discounted criterion, control up to an exit time, and the ergodic cost criterion. Accordingly, our primary approach is to utilize the regularity properties of the associated partial differential equations directly, first via uniqueness of solutions, and then via regularity properties of the solutions to establish consistency of the optimality equations satisfied by the limits of solutions (as policies converge). We will see that one can obtain rather concise, direct, and general results.
Additionally, our results can be used to present weaker conditions under which the weak convergence methods are applicable or under which discretized approximations can be shown to be near optimal: for example, it will be a consequence of our analysis that for many of the criteria one can utilize piecewise continuous or continuous control policies for near optimality, which implies [15, Assumption A2.3, pp. 322] used for approximations under the ergodic cost criterion (where invariant measures under sampled chains can be shown to converge to the invariant measure of a continuous-time limit as the discretization gets finer). Furthermore, we do not impose uniform boundedness conditions on the drift term or (uniform) Lipschitz continuity conditions, a common assumption in [15], [16], [17], [18], and [19].
As noted above, the study of the finite action/piecewise constant approximation problem plays an important role in computing near optimal policies and learning algorithms for controlled diffusions in R^d. As pointed out in [25], [26], piecewise constant policies are also useful in numerical methods for solving HJB equations. The computational advantage comes from the fact that over the intervals in which the policy is constant, one only has to solve linear PDEs. In the continuous time setup, learning problems become much more involved due to the complex structure of the dynamics and the optimality equation. One common approach to overcome these difficulties is to construct simpler models, by discretizing the time, state and action spaces, which approximate the original continuous time model. In a recent work [27], the authors studied an approximate Q-learning algorithm for controlled diffusion models by discretizing the time, state and action spaces. Under mild assumptions, they produced a learning algorithm which converges to an approximately optimal control policy for a discounted cost problem. They assumed that the discretization is uniform in time, but the discretization in state and action can be non-uniform. A similar learning algorithm for controlled diffusions is proposed in [28]; this result is based on finite difference and finite element approximations (as in [15]). Thus, if one can establish that learning a control model with finitely many control actions is sufficient for approximate optimality, then it becomes easier to produce efficient learning algorithms for the original model.
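As an illustration of the time/state/action discretization idea (a hypothetical toy example, not the algorithm of [27] or [28]), the sketch below runs tabular Q-learning for a discounted cost on a uniformly time-sampled, grid-quantized scalar diffusion with a finite action set; the dynamics dX = U_t dt + dW_t on [-2, 2] and the cost c(x, a) = x^2 + a^2 are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

dt, alpha = 0.1, 1.0                   # sampling interval and discount rate
xs = np.linspace(-2, 2, 21)            # quantized state grid
acts = np.array([-1.0, 0.0, 1.0])      # finite action set
Q = np.zeros((len(xs), len(acts)))
gamma = np.exp(-alpha * dt)            # per-interval discount factor

def nearest(grid, val):
    """Index of the grid point closest to val (state quantizer)."""
    return int(np.argmin(np.abs(grid - val)))

x_idx = nearest(xs, 0.0)
for k in range(20000):
    # epsilon-greedy action selection over the finite action set
    a_idx = int(rng.integers(len(acts))) if rng.random() < 0.1 else int(Q[x_idx].argmin())
    x, a = xs[x_idx], acts[a_idx]
    # Euler-Maruyama step of the sampled diffusion, clipped to the grid range
    x_next = x + a * dt + np.sqrt(dt) * rng.normal()
    nxt = nearest(xs, np.clip(x_next, -2, 2))
    # Q-learning update for the (minimization) discounted criterion
    target = (x * x + a * a) * dt + gamma * Q[nxt].min()
    Q[x_idx, a_idx] += 0.05 * (target - Q[x_idx, a_idx])
    x_idx = nxt
```

The point of the sketch is structural: once time, state, and actions are discretized, the continuous-time problem reduces to a finite MDP on which standard learning updates apply.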
In the literature on discrete time Markov decision processes (MDPs), various approximation techniques are available to address such approximation problems, e.g., approximate dynamic programming, approximate value or policy iteration, approximate linear programming, simulation based techniques, neuro-dynamic programming (or reinforcement learning), state aggregation, etc. (see [29], [30], [31], [32] and the references therein). For discrete time controlled models, the near optimality of quantized policies has been studied extensively in the literature; see, e.g., [33], [34], [35], [36], [32]. In [34], [35], the authors studied the finite state and finite action approximation (respectively) of fully observed MDPs with Borel state and action spaces, for both discounted and average cost criteria. In the compact state space case, an explicit rate of convergence is also established in [34]. Later, these results were extended to the partially observed Markov decision process setup in [33], [36]; see also the references therein. Recently, [37, Section 4] established the denseness of the performance of deterministic policies with finite action spaces among the performance values attained by the set of all randomized stationary policies.
Contributions and main results. In this manuscript, our main goal is to study the following approximation problem: for a general class of controlled diffusions in R^d, under what conditions can one approximate optimal control policies, for both finite and infinite horizon cost criteria, by policies that are finite action, piecewise constant, or continuous? While time discretization approximation results for finite horizon problems have been studied extensively by Krylov [18], [19], [24] (for degenerate diffusions), we will discuss this (for the non-degenerate case) as an application of our results.
In order to address these questions, we first show that both finite horizon and infinite horizon (discounted/ergodic) costs are continuous as a function of the control policy under the Borkar topology [1]. We establish these results by exploiting the existence and uniqueness results for the associated Poisson equations (see Theorem 6.1 (finite horizon), Theorem 3.1 (discounted), Theorem 3.2 (control up to an exit time), Theorems 3.4, 3.6 (ergodic)). The analysis of the ergodic cost case is relatively more involved. One of the major issues in analyzing the ergodic cost criterion under the near-monotone hypothesis is the non-uniqueness/restricted uniqueness of the solution of the associated HJB/Poisson equation (see [11, Example 3.8.3], [7]). In [11, Example 3.8.3], [7] it is shown that under the near-monotone hypothesis the associated HJB/Poisson equation may admit uncountably many solutions. In this paper, we show that under the near-monotone hypothesis the associated Poisson equation admits a unique solution in the space of compatible solution pairs (see [7, Definition 1.1]). The continuity results obtained in this paper will also be useful in establishing the existence of optimal policies for the corresponding optimal control problems.
Next, utilizing Lusin's theorem and Tietze's extension theorem, we show that under the Borkar topology, quantized (finite action/piecewise constant) stationary policies are dense in the space of stationary Markov policies (see Section 4). Also, following an analogous proof technique, we establish the denseness of the space of continuous stationary policies in the space of stationary policies (see Theorem 4.2).
Following and briefly modifying the proof technique for the denseness of stationary policies, now including time as an additional parameter, we establish that piecewise constant Markov policies are dense in the space of Markov policies under the Borkar topology (see Theorem 6.2).
Then, using our continuity and denseness results, we deduce that for both finite and infinite horizon cost criteria, the optimal control policies can be approximated by quantized (finite action/piecewise constant) policies with arbitrary precision (see Theorem 6.3 (finite horizon), Theorem 5.2 (control up to an exit time), Theorems 5.3, 5.4 (infinite horizon)).
The remainder of the paper is organized as follows. In Section 2 we provide the problem formulation. The continuity of the discounted cost and the cost up to an exit time as functions of the control policy is proved in Section 3.1. A similar continuity result for the ergodic cost is presented in Section 3.2, where we establish these results under two types of conditions: stability or near-monotonicity. Section 4 is devoted to establishing the denseness of finite action/piecewise constant stationary policies under the Borkar topology. Then, using the denseness and continuity results, we show the near optimality of finite models for the cost up to an exit time and the discounted/ergodic cost criteria in Section 5. Finally, in Section 6, we analyze the denseness of piecewise constant Markov policies under the Borkar topology and then, exploiting this denseness result, we prove the near optimality of piecewise constant Markov policies for the finite horizon cost criterion.

Notation:
• For any set A ⊂ R^d, by τ(A) we denote the first exit time of the process {X_t} from the set A, defined by τ(A) := inf {t > 0 : X_t ∉ A}.
• B_r denotes the open ball of radius r in R^d, centered at the origin.
• τ_r, τ̆_r denote the first exit times from B_r, B_r^c respectively, i.e., τ_r := τ(B_r) and τ̆_r := τ(B_r^c).
• By Tr S we denote the trace of a square matrix S.
• For any domain D ⊂ R^d, the space C^k(D) (C^∞(D)), k ≥ 0, denotes the class of all real-valued functions on D whose partial derivatives up to and including order k (of any order) exist and are continuous.
• C_c^k(D) denotes the subspace of C^k(D) consisting of functions that have compact support; C_c^∞(D) denotes the space of test functions.
• C_0^k(D) denotes the subspace of C^k(D), 0 ≤ k < ∞, consisting of functions that vanish in D^c.
• C^{k,r}(D) denotes the class of functions whose partial derivatives up to order k are Hölder continuous of order r.
• L^p(D), p ∈ [1, ∞), denotes the Banach space of (equivalence classes of) measurable functions f satisfying ∫_D |f(x)|^p dx < ∞.
• W^{k,p}(D), k ≥ 0, p ≥ 1, denotes the standard Sobolev space of functions on D whose weak derivatives up to order k are in L^p(D), equipped with its natural norm (see [38]).
• If X(Q) is a space of real-valued functions on Q, X_loc(Q) consists of all functions f such that f φ ∈ X(Q) for every φ ∈ C_c^∞(Q). In a similar fashion, we define W_loc^{k,p}(D) (see [39]). Also, we use the convention ‖f‖_{W^{1,2,p,µ}} = ‖f‖_{1,2,p,µ}.

The Borkar Topology on Control Policies, Cost Criteria, and the Problem Statement
Let U be a compact metric space and V = P(U) be the space of probability measures on U with topology of weak convergence.
Let b : R^d × U → R^d and σ = [σ^{ij}] : R^d → R^{d×d}, 1 ≤ i, j ≤ d, be given functions. We consider a stochastic optimal control problem whose state evolves according to a controlled diffusion process given by the solution of the following stochastic differential equation:
dX_t = b(X_t, U_t) dt + σ(X_t) dW_t, X_0 = x ∈ R^d, (2.1)
where
• W is a d-dimensional standard Wiener process, defined on a complete probability space (Ω, F, P);
• b is extended to R^d × V by setting b(x, v) := ∫_U b(x, ζ) v(dζ) for v ∈ V;
• U is a V-valued adapted process satisfying the following non-anticipativity condition: for s < t, W_t − W_s is independent of F_s := the completion of σ(X_0, U_r, W_r : r ≤ s) relative to (F, P).
The process U is called an admissible control, and the set of all admissible controls is denoted by U (see [8]). By a Markov control we mean an admissible control of the form U_t = v(t, X_t) for some Borel measurable function v : [0, ∞) × R^d → V. The space of all Markov controls is denoted by U_m. If the function v is independent of t, i.e., U_t = v(X_t), then U, or by an abuse of notation v itself, is called a stationary Markov control. The set of all stationary Markov controls is denoted by U_sm. To ensure the existence and uniqueness of strong solutions of (2.1), we impose the following assumptions on the drift b and the diffusion matrix σ.
(A1) Local Lipschitz continuity: The functions σ = [σ^{ij}] and b are locally Lipschitz continuous in x (uniformly with respect to the other variable in the case of b). In other words, for some constant C_R > 0 depending on R > 0, we have
|b(x, ζ) − b(y, ζ)| + ‖σ(x) − σ(y)‖ ≤ C_R |x − y|
for all x, y ∈ B_R and ζ ∈ U, where ‖σ‖ := √(Tr(σσ^T)). Also, we assume that b is jointly continuous in (x, ζ).
(A2) Affine growth condition: b and σ satisfy a global growth condition of the form
|b(x, ζ)|^2 + ‖σ(x)‖^2 ≤ C_0 (1 + |x|^2) for all (x, ζ) ∈ R^d × U,
for some constant C_0 > 0.
(A3) Nondegeneracy: For each R > 0, there exists κ_R > 0 such that
z^T a(x) z ≥ κ_R |z|^2 for all x ∈ B_R and all z = (z_1, . . . , z_d)^T ∈ R^d,
where a := ½ σσ^T.

2.1. The Borkar Topology on Control Policies. We now introduce the Borkar topology on stationary and Markov controls [1].
• Topology of Stationary Policies: From [11, Section 2.4], we have that the set U_sm is metrizable with a compact metric; convergence in this topology can be characterized as follows.

Definition 2.1. A sequence {v_n} ⊂ U_sm converges to v ∈ U_sm in the Borkar topology if and only if
∫_{R^d} f(x) ∫_U g(x, ζ) v_n(dζ|x) dx → ∫_{R^d} f(x) ∫_U g(x, ζ) v(dζ|x) dx
for all f ∈ L^1(R^d) ∩ L^2(R^d) and g ∈ C_b(R^d × U).
It is well known that under hypotheses (A1)-(A3), for any admissible control, (2.1) has a unique weak solution [11, Theorem 2.2.11], and under any stationary Markov control, (2.1) has a unique strong solution which is a strong Feller (therefore strong Markov) process [11, Theorem 2.2.12].

Cost Criteria.
Let c : R^d × U → R_+ be the running cost function. We assume that c is bounded, jointly continuous in (x, ζ) and locally Lipschitz continuous in its first argument uniformly with respect to ζ ∈ U. We extend c to R^d × V by setting c(x, v) := ∫_U c(x, ζ) v(dζ) for v ∈ V. In this article, we consider the problems of minimizing the finite horizon cost, the α-discounted cost and the ergodic cost, respectively.

2.2.1. Finite Horizon Cost. For U ∈ U, the associated finite horizon cost is given by
J_T^U(x) := E_x^U [ ∫_0^T c(X_s, U_s) ds ], (2.4)
and the optimal value is defined as J_T^*(x) := inf_{U ∈ U} J_T^U(x). Then a policy U^* ∈ U is said to be optimal if J_T^{U^*}(x) = J_T^*(x).

2.2.2. Discounted Cost. For U ∈ U, the associated α-discounted cost is given by
J_α^U(x) := E_x^U [ ∫_0^∞ e^{−αs} c(X_s, U_s) ds ], (2.7)
where α > 0 is the discount factor, X(·) is the solution of (2.1) corresponding to U ∈ U, and E_x^U is the expectation with respect to the law of the process X(·) with initial condition x. The controller tries to minimize (2.7) over his/her admissible policies U. Thus, a policy U^* ∈ U is said to be optimal if J_α^{U^*}(x) = inf_{U ∈ U} J_α^U(x) =: V_α(x) for all x ∈ R^d, where V_α(x) is called the optimal value.

Ergodic Cost Criterion.
For U ∈ U, the associated ergodic cost is given by
E_x(c, U) := limsup_{T→∞} (1/T) E_x^U [ ∫_0^T c(X_s, U_s) ds ],
and the optimal value is defined as E^*(c) := inf_{U ∈ U} E_x(c, U). Then a policy U^* ∈ U is said to be optimal if E_x(c, U^*) = E^*(c).

Control up to an Exit Time. For each U ∈ U, the associated cost up to an exit time is given as
Ĵ_e^U(x) := E_x^U [ ∫_0^{τ(O)} c(X_s, U_s) ds ],
where O ⊂ R^d is a given bounded open set and τ(O) is the first exit time from O; the optimal value is Ĵ_e^*(x) := inf_{U ∈ U} Ĵ_e^U(x).
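The criteria above can all be estimated by sampling the controlled diffusion; as a minimal sketch (with an illustrative drift, cost, and policy, and a truncated horizon in place of the infinite one), the following Monte Carlo routine estimates the α-discounted cost (2.7) under a stationary Markov policy via Euler-Maruyama simulation.

```python
import numpy as np

def discounted_cost_mc(v, b, sigma, c, x0, alpha=1.0, dt=1e-2, T=10.0,
                       n_paths=100, seed=0):
    """Monte Carlo estimate of J_alpha^v(x0), the cost (2.7) under the
    stationary Markov policy v, with the integral truncated at horizon T
    (the neglected tail is at most e^{-alpha T} ||c||_inf / alpha)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    total = np.zeros(n_paths)
    n_steps = int(T / dt)
    for k in range(n_steps):
        u = v(x)                                    # policy evaluated pathwise
        total += np.exp(-alpha * k * dt) * c(x, u) * dt
        # Euler-Maruyama step of dX = b(X, U) dt + sigma(X) dW
        x += b(x, u) * dt + sigma(x) * np.sqrt(dt) * rng.normal(size=n_paths)
    return float(total.mean())
```

For instance, with a constant running cost c ≡ 1 the estimate should be close to (1 − e^{−αT})/α ≈ 1/α, independently of the dynamics, which gives a simple sanity check of the discounting.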

Problems Studied.
The main purpose of this manuscript is to address the following problems:
• Continuity of finite and infinite horizon costs. Suppose {v_n}_{n∈N} is a sequence of control policies which converges to another control policy v in some sense (in particular, under the Borkar topology; see Subsection 2.1). Does this imply that
• for the finite horizon cost: J_T^{v_n}(x) → J_T^v(x) ?
• for the discounted cost: J_α^{v_n}(x) → J_α^v(x) ?
• for the ergodic cost: E_x(c, v_n) → E_x(c, v) ?
• for the cost up to an exit time: Ĵ_e^{v_n}(x) → Ĵ_e^v(x) ?
• Near optimality of quantized policies. For any given ǫ > 0, is it possible to construct a quantized (finite action/piecewise constant) policy v_ǫ such that
• for the finite horizon cost: J_T^{v_ǫ}(x) ≤ J_T^*(x) + ǫ ?
• for the discounted cost: J_α^{v_ǫ}(x) ≤ V_α(x) + ǫ ?
• for the ergodic cost: E_x(c, v_ǫ) ≤ E^*(c) + ǫ ?
• for the cost up to an exit time: Ĵ_e^{v_ǫ}(x) ≤ Ĵ_e^*(x) + ǫ ?
In this manuscript, we show that under a mild set of assumptions the answers to the above questions are affirmative. For the finite horizon case, we also study time-discretization approximations as a further implication of our analysis.
Let us introduce a parametric family of elliptic operators, which will be useful in our analysis. With ζ ∈ U treated as a parameter, we define a family of operators L^ζ mapping C^2(R^d) into C(R^d) by
L^ζ f(x) := Tr(a(x) ∇^2 f(x)) + b(x, ζ) · ∇f(x),
and for v ∈ V we extend L^ζ as follows:
L^v f(x) := ∫_U L^ζ f(x) v(dζ).
Now, by standard elliptic p.d.e. estimates as in [40, Theorem 9.11], for any p ≥ d + 1 and R > 0, we deduce that the estimate (3.3) holds for some positive constant κ_1 which is independent of n. We know that for 1 < p < ∞ the space W^{2,p}(B_R) is reflexive and separable; hence, as a corollary of the Banach-Alaoglu theorem, every bounded sequence in W^{2,p}(B_R) has a weakly convergent subsequence (see [41, Theorem 3.18]). Also, for p ≥ d + 1 the space W^{2,p}(B_R) is compactly embedded in C^{1,β}(B̄_R) for β < 1 − d/p, which implies that every weakly convergent sequence in W^{2,p}(B_R) converges strongly in C^{1,β}(B̄_R). Thus, in view of the estimate (3.3), by a standard diagonalization argument and the Banach-Alaoglu theorem, we can extract a convergent subsequence. In the following we show that the limit satisfies the equation associated with the limiting policy v: using (3.4) and (3.5), and letting k → ∞ (in the sense of distributions), we obtain the limiting equation. Let X be the solution of the SDE (2.1) corresponding to v. Now, applying the Itô-Krylov formula, we obtain the corresponding stochastic representation; hence, by (3.7), we get the claimed convergence. This completes the proof. □

By [40, Theorem 9.15], it follows that there exists a unique function ψ_n ∈ W^{2,p}(O) satisfying the Poisson equation (3.10). Applying the Itô-Krylov formula, one can show that ψ_n(x) = Ĵ_e^{v_n}(x) (this stochastic representation also ensures the uniqueness of the solution of (3.10)). Now, following the argument of Theorem 3.1, by standard elliptic p.d.e. estimates [40, Theorem 9.11], we deduce that there exists ψ ∈ W^{2,p}(O) such that ψ_n → ψ weakly in W^{2,p}(O). Thus, closely following the proof of Theorem 3.1 and letting n → ∞ in (3.10), it follows that (3.11) holds. Again, by the Itô-Krylov formula, using (3.11) we deduce that ψ(x) = Ĵ_e^v(x). This completes the proof of the theorem. □
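For a quick sanity check of the extended generator, in one dimension L^ζ reduces to L f = a f'' + b f' with a = σ²/2, and a central finite-difference evaluation should match the analytic expression on smooth test functions; the routine below is an illustrative numerical check, with all functions supplied by the caller.

```python
def generator_1d(f, x, b, sigma, eps=1e-5):
    """Finite-difference evaluation of L f(x) = a(x) f''(x) + b(x) f'(x),
    the scalar case of L^zeta with a = (1/2) sigma sigma^T."""
    f1 = (f(x + eps) - f(x - eps)) / (2 * eps)            # central first derivative
    f2 = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2  # central second derivative
    return 0.5 * sigma(x) ** 2 * f2 + b(x) * f1
```

For example, with f(x) = x², b(x) = −x and σ ≡ 1, the analytic value is L f(x) = 1 − 2x².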

Continuity for Ergodic Cost.
In this section we study the continuity of the ergodic cost with respect to policies under the Borkar topology on the space of stationary Markov policies. We will study this problem under two sets of assumptions: the first is the so-called near-monotonicity assumption on the running cost function, and the other is a Lyapunov stability assumption on the system. Our proof strategies will be slightly different under these two setups: in the former we will build on regularity properties of invariant probability measures, while in the latter we will build more directly on regularity properties of solutions to HJB equations.

3.2.1.
Under a near-monotonicity assumption. We assume that the running cost function c is near-monotone with respect to E^*(c), i.e.,
(A4) It holds that lim inf_{|x|→∞} min_{ζ∈U} c(x, ζ) > E^*(c). (3.12)
This condition penalizes the escape of probability mass to infinity. Since our running cost c is bounded, it is easy to see that E^*(c) ≤ ‖c‖_∞. It is known that under (3.12), an optimal control exists in the space of stable stationary Markov controls (see [11, Theorem 3.4.5]). First, we prove that for each stable stationary Markov policy v ∈ U_sm the associated Poisson equation admits a unique solution in a certain function space. This uniqueness result will be useful in establishing the continuity and near optimality of quantized policies. For the following supporting result, we closely follow [11]: under (3.13), there exists a unique pair (V_v, ρ_v) satisfying (3.14), where J_α^v(x, c) is the α-discounted cost defined as in (2.7). As earlier, we have that J_α^v(x, c) is a solution to the Poisson equation (see [11, Lemma A.3.7]). Thus, following the arguments of [11, Lemma 3.6.3], we deduce that for each R > R_0 there exist constants C̃_2(R), C̃_2(R, p), depending only on d and R_0, for which the corresponding estimates hold. Hence, following the arguments of [11, Lemma 3.6.6], we conclude that there exists (V̂_v, ρ̂_v) ∈ W^{2,p}_loc(R^d) × R such that, along a subsequence (as α → 0), (3.20) holds. We will show that the subsequential limits are unique. From (3.16), we get ρ̂_v ≤ ρ_v. Now, in view of the estimates (3.16) and (3.19), it is easy to see that the required bound holds, where in the third inequality we have used the fact that inf_{R^d} V̂_v > −∞. Now, applying the Itô-Krylov formula and using (3.20), we obtain the associated stochastic representation. Since v is stable, letting R → ∞, then dividing both sides of the resulting inequality by T and letting T → ∞, it follows that ρ_v ≤ ρ̂_v. This indeed implies that ρ_v = ρ̂_v. The representation (3.15) of V_v follows by closely mimicking the argument of [11, Lemma 3.6.9]. Therefore, we have a solution pair (V_v, ρ_v) to (3.14) satisfying (i) and (ii).
Next we want to prove that the solution pair is unique. To this end, let (V̄_v, ρ̄_v) be another such pair satisfying (3.24). Since V̄_v is bounded from below, applying the Itô-Krylov formula and using (3.24), we get (3.25); hence, from (3.25), it follows that ρ̄_v = ρ_v. Now, applying the Itô-Krylov formula and using (3.24), we obtain the associated representation. Since v is stable and V̄_v is bounded from below, the corresponding inequality holds for all x ∈ R^d; in the above we have used [11, Theorem 2.6.10]. Hence, letting R → ∞, by Fatou's lemma it follows from (3.27) that V̄_v and V_v agree up to a constant. Since V̄_v(0) = 0, letting r → 0, we deduce that V̄_v = V_v.

Proof. From Theorem 3.3, we know that for each n ∈ N there exists (V_{v_n}, ρ_{v_n}) ∈ W^{2,p}_loc(R^d) × R, 1 < p < ∞, with V_{v_n}(0) = 0 and inf_{R^d} V_{v_n} > −∞, satisfying the associated Poisson equation. Since c is bounded, the first term on the right hand side converges to zero because φ_{v_n} → φ in L^1(R^d), and the second term converges to zero by the convergence v_n → v (see Definition 2.1). Hence, the claimed convergence follows. This completes the proof.
Remark 3.1. The tightness assumption is not superfluous: in view of [7], we know that without it the map v ↦ E_x(c, v) may fail to be continuous. Here h (> 0) is locally Lipschitz continuous in its first argument uniformly with respect to the second, and V > 1. With V̄_v(0) = 0, by the Itô-Krylov formula, for R > 0 we obtain the associated identity; letting R → ∞, by the monotone convergence theorem, we get (3.36). Since V̄_v ∈ o(V), in view of [11, Lemma 3.7.2 (ii)], letting R → ∞, we deduce the corresponding limit. Hence, dividing both sides of (3.36) by T and letting T → ∞, we obtain ρ̄_v = ρ_v. Again, applying the Itô-Krylov formula and using (3.33), we obtain the associated representation. Also, from (3.32), by the Itô-Krylov formula it follows that, since h(x, ζ) > 0, the corresponding estimate holds. Since V̄_v ∈ o(V), from the above estimate, letting R → ∞, by Fatou's lemma it follows from (3.37) that the comparison holds. Since V̄_v(0) = 0, letting r → 0, and since ρ_v = ρ̄_v, from (3.34) and (3.38) it follows that V̄_v = V_v.

Proof. From Theorem 3.5, we know that for each n ∈ N there exists a unique solution pair (V_{v_n}, ρ_{v_n}) of the associated equation. In view of (3.32), it is easy to see that each v ∈ U_sm is stable and inf_{v∈U_sm} η_v(B_R) > 0 for any R > 0. By standard elliptic estimates, we obtain a bound in which the positive constant C_2(R, p) depends only on R and p. Since the running cost is bounded, we have ‖c‖_∞ ≤ M for some positive constant M; thus ρ_{v_n} ≤ M. Hence, from (3.41), we deduce that

This implies that
where C_3(R, p) is a positive constant which depends only on R and p. Hence, by a standard diagonalization argument and the Banach-Alaoglu theorem (see (3.4)), one can extract a subsequence {V_{v_{n_k}}}. Also, since ρ_{v_n} ≤ M, along a further subsequence ρ_{v_{n_k}} → ρ^* (without loss of generality, we denote it by the same sequence). Now, by a similar argument as in Theorem 3.1, multiplying both sides of (3.40) by a test function and letting k → ∞, we deduce that (V^*, ρ^*) ∈ W^{2,p}_loc(R^d) × R satisfies
ρ^* = L^v V^*(x) + c(x, v(x)). (3.44)
Since V_{v_n}(0) = 0 for each n, we get V^*(0) = 0. Next we want to show that V^* ∈ o(V). Following the proof of [11, Lemma 3.7.8] (see eq. (3.7.47) or eq. (3.7.50)), it is easy to see that the required growth estimate holds. We know that, for d < p < ∞, the space W^{2,p}_loc(R^d) is continuously embedded in C^{1,β}_loc(R^d) for some β > 0.

Since U is compact, U is totally bounded. Thus, one can find a sequence of finite grids Λ_n ⊂ U such that every ζ ∈ U is within distance 1/n of Λ_n. Let Λ_n := {ζ_{n,1}, ζ_{n,2}, . . . , ζ_{n,k_n}} and define a function Q_n : U → Λ_n by Q_n(ζ) = arg min_{ζ_{n,i} ∈ Λ_n} d_U(ζ, ζ_{n,i}), where ties are broken so that Q_n is measurable. The function Q_n is often known as a nearest neighbor quantizer (see [34]). For each n, the function Q_n induces a partition {U_{n,i}}_{i=1}^{k_n} of the action space U given by U_{n,i} = {ζ ∈ U : Q_n(ζ) = ζ_{n,i}}.
By the triangle inequality, it follows that diam(U_{n,i}) := sup_{ζ_1, ζ_2 ∈ U_{n,i}} d_U(ζ_1, ζ_2) < 2/n. Now, for each v ∈ U_sm, define a sequence of policies with finitely many actions as follows:
v_n(ζ_{n,i}|x) = Q_n v(ζ_{n,i}|x) = v(U_{n,i}|x). (4.1)
In the next lemma we prove that the space of stationary policies with finitely many actions is dense in U_sm with respect to the Borkar topology (see Definition 2.1).
Lemma 4.1. For each v ∈ U_sm there exists a sequence of policies {v_n}_n (defined as in (4.1)) with finitely many actions such that v_n → v in the Borkar topology.

Proof. Let f ∈ L^1(R^d) ∩ L^2(R^d) and g ∈ C_b(R^d × U). Then, from the construction of the sequence {v_n}_n, it is easy to see that the difference of the corresponding integrals is controlled by the oscillation of g on the cells U_{n,i}. Since g ∈ C_b(R^d × U) and diam(U_{n,i}) < 2/n, this oscillation converges to zero. Since g is bounded, for some positive constant M_1 we have |g| ≤ M_1; thus the dominated convergence theorem applies, and the defining integrals converge. This completes the proof of the lemma.
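As a concrete instance of the construction above (with the illustrative choice U = [0, 1] and d_U the Euclidean metric), the following sketch builds the (1/n)-grid, the induced nearest neighbor quantizer Q_n, and exhibits the diam(U_{n,i}) < 2/n bound numerically.

```python
import numpy as np

def build_quantizer(n, lo=0.0, hi=1.0):
    """A (1/n)-grid Lambda_n on the action space U = [lo, hi] and the
    induced nearest neighbor quantizer Q_n (ties broken toward the
    smaller index, so Q_n is measurable)."""
    grid = np.arange(lo, hi + 1e-12, 1.0 / n)    # finite grid Lambda_n
    def Q_n(zeta):
        return grid[int(np.argmin(np.abs(grid - zeta)))]
    return grid, Q_n
```

Every point of U lies within 1/(2n) of its representative, so each cell U_{n,i} = Q_n^{-1}(ζ_{n,i}) has diameter at most 1/n, comfortably below the 2/n bound used in the text.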
Proof. Let B_0 = ∅ and define D_n = B_n \ B_{n−1} for n ∈ N. Thus, it is easy to see that R^d = ∪_{n=1}^∞ D_n. Since each v ∈ U_sm is a measurable map v : R^d → V, it follows that v̂_n := v|_{D_n} : D_n → V is a measurable map. Hence, by Lusin's theorem (see [43, Theorem 7.5.2]), for any ǫ_n > 0 there exist a compact set K_n^{ǫ_n} ⊂ D_n and a continuous function v̂_n^{ǫ_n} : K_n^{ǫ_n} → V such that |D_n \ K_n^{ǫ_n}| < ǫ_n (the Lebesgue measure of the set D_n \ K_n^{ǫ_n}) and v̂_n ≡ v̂_n^{ǫ_n} on K_n^{ǫ_n}. Again, by Tietze's extension theorem (see [44, Theorem 4.1]), there exists a continuous function ṽ_n^{ǫ_n} : D_n → V such that ṽ_n^{ǫ_n} ≡ v̂_n^{ǫ_n} on K_n^{ǫ_n}.

Step 1. Therefore, for any f̂ ∈ L^1(R^d) ∩ L^2(R^d) and ĝ ∈ C(U), we have the corresponding approximation. Now, since (V, d_P) is compact, for each m ∈ N there exists a finite set Λ_m = {µ_{m,1}, µ_{m,2}, . . . , µ_{m,k_m}} ⊂ V such that every µ ∈ V is within d_P-distance 1/m of Λ_m. Let Q_m : V → Λ_m be defined as the nearest neighbor map, with ties broken so that Q_m is measurable. Hence, it induces a partition {Ũ_{m,i}}_{i=1}^{k_m} of the space V given by Ũ_{m,i} = {µ ∈ V : Q_m(µ) = µ_{m,i}}. By the triangle inequality, it is easy to see that diam(Ũ_{m,i}) := sup_{µ_1, µ_2 ∈ Ũ_{m,i}} d_P(µ_1, µ_2) < 2/m for each m ∈ N.
It is well known that in C_b(B_{N_1} × U) the functions of the form {∑_{i=1}^m r_i(x) p_i(ζ)}_{m∈N} form an algebra which contains the constants, where r_i ∈ C(B̄_{N_1}) and p_i ∈ C(U). Thus, by the Stone-Weierstrass theorem there exists m̄ (large enough) such that (4.11) holds. Since p_i ∈ C(U), we can find h_{j(i)} ∈ C(U) such that (4.12) holds. Also, since f r_i ∈ L^1(R^d), there exists f̂_{k(i)} such that (4.13) holds. Now, using (4.11), (4.12), (4.13), we obtain the required estimate on B_{N_1}. Therefore, from (4.10) and (4.14), we conclude the result.

Proof. As earlier, we have that {f_i}_{i∈N} is a countable dense set in L^1(R^d). Now, for each i ∈ N, define a finite measure ν_i on (R^d, B(R^d)) by ν_i(A) := ∫_A |f_i(x)| dx. Let v ∈ U_sm. Then, as in the proof of Theorem 4.1, by successive application of Lusin's theorem (see [43, Theorem 7.5.2]) and Tietze's extension theorem (see [44, Theorem 4.1]), for any ǫ_i > 0 there exist a closed set K_i ⊂ R^d and a continuous function v_i : R^d → V agreeing with v outside a set of ν_i-measure less than ǫ_i. Since {f_i}_{i∈N} is dense in L^1(R^d), by choosing ǫ_i appropriately, we obtain our result.
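The mode of convergence in Lemma 4.1 can also be illustrated numerically. The sketch below (with illustrative f, g, and a deterministic policy u mapping into U = [0, 1]) evaluates the pairing ∫ f(x) ∫ g(x, ζ) v(dζ|x) dx by quadrature, for a policy and for its quantized versions; the gap shrinks at the O(1/n) rate implied by diam(U_{n,i}) < 2/n and the continuity of g.

```python
import numpy as np

def pairing(policy, f, g, xs, w):
    """Quadrature approximation of int f(x) g(x, policy(x)) dx for a
    deterministic stationary policy x -> policy(x) with values in [0, 1]."""
    return float(np.sum(w * f(xs) * g(xs, policy(xs))))

xs = np.linspace(-1.0, 1.0, 2001)
w = np.full(2001, 2.0 / 2000)                  # uniform quadrature weights
f = lambda x: np.exp(-x * x)                   # f in L^1 cap L^2
g = lambda x, z: np.cos(x + z)                 # g in C_b, 1-Lipschitz in z
u = lambda x: (np.sin(3 * x) + 1.0) / 2.0      # continuous policy into [0, 1]

I_lim = pairing(u, f, g, xs, w)
errs = []
for n in (2, 8, 32):
    Qn = lambda z, n=n: np.round(z * n) / n    # nearest neighbor quantizer, (1/n)-grid
    errs.append(abs(pairing(lambda x: Qn(u(x)), f, g, xs, w) - I_lim))
```

Since |g(x, z_1) − g(x, z_2)| ≤ |z_1 − z_2| and the quantization error is at most 1/(2n), the gap is bounded by (1/(2n)) ∫ f, which the assertions below verify for n = 2 and n = 32.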

Near Optimality of Finite Models for Controlled Diffusions
First we prove the near optimality of quantized policies for the α-discounted cost.
Theorem 5.1. Suppose Assumptions (A1)-(A3) hold. Then for each ǫ > 0 there exist a policy v*_ǫ ∈ U_sm with finitely many actions and a piecewise constant policy v̄*_ǫ ∈ U_sm such that (5.1) holds.

Proof. Since the map v ↦ J_α^v(x) is continuous on U_sm (see Theorem 3.1) and the space of quantized stationary policies is dense in U_sm (see Lemma 4.1), it follows that for each ǫ > 0 there exists a quantized policy v*_ǫ ∈ U_sm satisfying (5.1). Similarly, since the piecewise constant policies are dense in U_sm (see Theorem 4.1), we conclude that for any ǫ > 0 there exists a piecewise constant policy v̄*_ǫ ∈ U_sm which satisfies (5.1). This completes the proof.

We now show that for the cost up to an exit time, the quantized (finite action/piecewise constant) policies are near optimal.
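The quantitative step implicit in this argument can be spelled out as follows (a sketch, where v* denotes an optimal stationary policy and v_n its quantized approximations from (4.1)):

```latex
J_\alpha^{v_n}(x) \;\xrightarrow[n\to\infty]{}\; J_\alpha^{v^\ast}(x)
\;=\; \inf_{U \in \mathfrak{U}} J_\alpha^{U}(x),
\qquad \text{so } \;
J_\alpha^{v_n}(x) \;\le\; \inf_{U \in \mathfrak{U}} J_\alpha^{U}(x) + \epsilon
\;\; \text{for all sufficiently large } n.
```

The piecewise constant case is identical, with Theorem 4.1 supplying the approximating sequence in place of Lemma 4.1.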
Theorem 5.2. Suppose Assumptions (A1)-(A3) hold. Then for each ǫ > 0 there exist a policy v*_ǫ ∈ U_sm with finitely many actions and a piecewise constant policy v̄*_ǫ ∈ U_sm such that (5.2) holds.

Proof. From [45, p. 229], we know that there exists v* ∈ U_sm such that Ĵ_e^{v*}(x) = inf_{U∈U} Ĵ_e^U(x). Now, from the continuity of the map v ↦ Ĵ_e^v(x) (see Theorem 3.2) and the density results (see Section 4), it is easy to see that for any given ǫ > 0 there exist a policy v*_ǫ ∈ U_sm with finitely many actions and a piecewise constant policy v̄*_ǫ ∈ U_sm satisfying (5.2). This completes the proof of the theorem.
Next we prove the near optimality of quantized policies for the ergodic cost under a near-monotonicity assumption on the running cost. Let Θ_v := {v_n | v_n is the quantized policy defined as in (4.1) corresponding to v} and Θ̄_v := {v̄_n | v̄_n is the quantized policy defined as in (4.7) corresponding to v}. In order to establish our result, we assume that the sets of invariant measures Γ_{v*} := {η_{v*_n} | η_{v*_n} is the invariant measure corresponding to v*_n ∈ Θ_{v*}} and Γ̄_{v*} := {η_{v̄*_n} | η_{v̄*_n} is the invariant measure corresponding to v̄*_n ∈ Θ̄_{v*}} are tight, where v* ∈ U_sm is an ergodic optimal control. A sufficient condition ensuring the required tightness is the existence of a non-negative inf-compact function f ∈ C^2(R^d) satisfying a suitable drift inequality.

Theorem 5.3. Suppose that Assumptions (A1)-(A4) hold. Suppose also that, corresponding to the optimal policy v* ∈ U_sm, the sets of invariant measures Γ_{v*} and Γ̄_{v*} are tight, and that the running cost c is near monotone. Then for any given ǫ > 0 there exist a finite action policy v_ǫ ∈ U_sm and a piecewise constant policy v̄_ǫ ∈ U_sm such that (5.4) holds.

Proof. From [11, Theorem 3.7.14], we know that there exists v* ∈ U_sm such that E_x(c, v*) = E*(c). Since the spaces of quantized policies and piecewise constant policies are dense in U_sm (see Lemma 4.1 and Theorem 4.1) and the map v ↦ inf_{x∈R^d} E_x(c, v) is continuous on U_sm (see Theorem 3.6), for any given ǫ > 0 one can find a finite action policy v_ǫ ∈ U_sm and a piecewise constant policy v̄_ǫ ∈ U_sm such that (5.4) holds.

Remark 5.1. In view of the continuity (see Sections 3.1 and 3.2) and the denseness (see Theorem 4.2) results, we also have the near optimality of continuous stationary policies.
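A standard Foster–Lyapunov drift condition of the kind alluded to above is sketched below; the constants κ_0, κ_1 and the inf-compact function g are ingredients assumed here for illustration, and the paper's exact display may differ.

```latex
% Foster--Lyapunov drift condition (sketch): for some constants
% \kappa_0 \ge 0, \kappa_1 > 0, an inf-compact function g, and
% uniformly over the quantized policies v^*_n,
\mathcal{L}^{v^*_n} f(x) \;\le\; \kappa_0 - \kappa_1\, g(x),
\qquad x \in \mathbb{R}^d,
```

where \mathcal{L}^v denotes the controlled generator under the policy v. Integrating this inequality against the invariant measure η_{v*_n} gives sup_n ∫ g dη_{v*_n} ≤ κ_0/κ_1, and the inf-compactness of g then yields tightness of {η_{v*_n}}.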

Finite Horizon Cost: Time Discretization of Markov Policies and Near Optimality of Piecewise Constant Policies
Recall (2.4) as our cost criterion for the finite horizon setup. We present three results in this section, the ultimate goal being the near optimality of piecewise constant policies. While this approximation problem is well studied [15], [46], [47], our proof method is rather direct and appears to be new. In [15], [46], [47], the authors established similar approximation results via numerical procedures, under uniform Lipschitz continuity and uniform boundedness assumptions on the diffusion coefficients and the running cost function.
Near Optimality of Piecewise Constant Policies for Finite Horizon Cost. From Theorem 6.2 and Theorem 6.1, we have the following near-optimality result.
Theorem 6.3. Suppose that Assumptions (A1), (A3), and (B1) hold. Then for any given ǫ > 0 there exists a piecewise constant policy v̄*_ǫ ∈ U_m such that

J_T(x, v̄*_ǫ) ≤ J*_T + ǫ for all x ∈ R^d. (6.16)

Proof. From our previous discussion, we know that there exists v* ∈ U_m such that J_T(x, v*) = J*_T. Since the space of piecewise constant policies is dense in U_m (see Theorem 6.2) and the map v ↦ J_T(x, v) is continuous on U_m (see Theorem 6.1), for any given ǫ > 0 one can find a piecewise constant policy v̄*_ǫ ∈ U_m such that (6.16) holds.

Remark 6.1. In view of the existence results in [48, Chapter 4], in obtaining the near optimality of piecewise constant Markov policies for finite horizon costs one can relax the uniform boundedness assumption (B1); in particular, under (A1)-(A3) we can deduce similar results. This extends the results of [15], [46], [47] to a more general control model.
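The time-discretization underlying Theorem 6.3 freezes a Markov policy over intervals of length h: v_h(t, x) = v(kh, x) for t ∈ [kh, (k+1)h). The following is a minimal sketch of this construction; the example policy v(t, x) = t·x and the step size h = 0.5 are hypothetical choices for illustration only.

```python
import math

def piecewise_constant_in_time(v, h):
    """Given a Markov policy v(t, x), return its time-discretization
    v_h(t, x) = v(kh, x) for t in [kh, (k+1)h), i.e. the policy held
    piecewise constant in time with step h. (Illustrative sketch;
    v and h are hypothetical stand-ins.)"""
    def v_h(t, x):
        k = math.floor(t / h)   # index of the discretization interval
        return v(k * h, x)      # freeze the action over [kh, (k+1)h)
    return v_h

# Example: v(t, x) = t * x with step h = 0.5.
v = lambda t, x: t * x
vh = piecewise_constant_in_time(v, 0.5)
print(vh(0.74, 2.0))  # t = 0.74 lies in [0.5, 1.0), so v(0.5, 2.0) = 1.0
```

As h → 0, v_h converges to v in the time-extended Borkar topology of Section 6, which, combined with the continuity of v ↦ J_T(x, v), gives the near optimality in (6.16).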

Conclusion
We studied regularity properties of induced cost (under several criteria) on a controlled diffusion process with respect to a control policy space defined by Borkar [1]. We then studied implications of these properties on existence and, in particular, approximations for optimal controlled diffusions. Via such a unified approach, we arrived at very general approximation results for optimal control policies by quantized (finite action / piecewise constant) stationary control policies for a general class of controlled diffusions in the whole space R d as well as time-discretizations for the criteria with finite horizons.