Controlled random walk with a target site

We consider a simple random walk W_i in 1 or 2 dimensions, in which the walker may choose to stand still for a limited time. The time horizon is n, the maximum number of consecutive time steps that can be spent standing still is m_n, and the goal is to maximize P(W_n=0). We show that for dimension 1, if m_n grows faster than (\log n)^{2+\gamma} for some \gamma>0, there is a strategy for each n such that P(W_n = 0) approaches 1. For dimension 2, if m_n grows faster than a positive power of n, then there are strategies keeping P(W_n=0) bounded away from 0.


Introduction.
We consider a process $\{W_i, 0 \le i \le n\}$ on $\mathbb{Z}^d$ ($d = 1$ or $2$) in which $W_0 = 0$ and each step $W_i - W_{i-1}$ either is $0$ (i.e. standing still) or is a step $\pm e_i$ of symmetric simple random walk (SSRW), with $e_i$ being the $i$th unit coordinate vector. The choice between standing still and taking an SSRW step is determined by a strategy. Formally, a strategy (or $n$-strategy) is a mapping $\delta_n : T_n \to \{0, 1\}$ defined on the space

$$T_n = \big\{ \{(i, w_i),\ 0 \le i \le j\} : 0 \le j \le n-1,\ |w_i - w_{i-1}| \le 1 \text{ for all } i \big\}$$

of all space-time trajectories of length less than $n$; here the values 0 and 1 for $\delta_n$ correspond to standing still and taking an SSRW step, respectively. Thus the choice of whether to take a step at a time $i$ depends on the trajectory up to time $i-1$. Letting $\{Y_i, i \ge 1\}$ be SSRW, we then construct the controlled random walk process iteratively from the strategy by [...] where [...]. We define the time since the last SSRW step to be [...] in this definition is empty. We designate a maximum number $m_n - 1$ of consecutive steps standing still and say that a strategy is [...] depends only on the pair $(W_i, J_i)$. It is easily seen that if the strategy is Markov, then $\{(W_i, J_i) : i \ge 0\}$ is a Markov process. We write $A_n$ for the set of all admissible strategies and $A_n^*$ for the set of all admissible Markov strategies. When the dependence on the strategy needs to be made clear, we write $P(\,\cdot\,; \delta_n)$ for probability when strategy $\delta_n$ is used, but generally we suppress the $\delta_n$ in the notation.

We are interested in steering the process toward a target site, and specifically in the behavior of

$$\sup_{\delta_n \in A_n} P(W_n = 0; \delta_n) \tag{1.1}$$

as $n \to \infty$. The word "steering" is a bit misleading here, as the process never has a drift. For SSRW, of course, [...]. (Here and throughout the paper, $C$ and $C_1, C_2, \dots$ are generic constants, and $a_n \sim b_n$ means the ratio converges to 1.) A simple strategy to increase this is to minimize steps, i.e. always stand still as long as allowed, taking only $n/m_n$ steps, yielding $P(W_n = 0) \asymp (m_n/n)^{d/2}$, where we use $a_n \asymp b_n$ to mean the ratio is bounded away from 0 and $\infty$. A slightly more sophisticated strategy is to minimize steps until time $n - m_n$, then maximize steps (i.e. never stand still) until $\{W_i\}$ hits 0 (if it does), and then stand still until time $n$. For $d = 1$, conditionally on $\{|W_{n-m_n}| \le m_n^{1/2}\}$ this strategy has a success probability of order 1, so the unconditional success probability is of order $\min(m_n/n^{1/2}, 1)$; in particular it is bounded away from 0 if [...].

This brings up two questions. For which $\{m_n\}$ is $P(W_n = 0)$ bounded away from 0? And are there nontrivial $\{m_n\}$ for which $P(W_n = 0) \to 1$?
Our two main theorems give some answers.
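To make the comparison between the two elementary strategies concrete for $d = 1$, here is a minimal Python sketch that computes their success probabilities exactly. It is an illustration under simplifying assumptions, not part of the argument: the function names are hypothetical, "minimize steps" is modeled as exactly one SSRW step every $m$th time step, and admissibility details at the switch time $n - m$ are ignored.

```python
from math import comb


def ssrw_return_prob(steps: int) -> float:
    """P(S_steps = 0) for one-dimensional SSRW started at 0."""
    if steps % 2:
        return 0.0  # an odd number of +-1 steps cannot sum to 0
    return comb(steps, steps // 2) / 2**steps


def hit_zero_by(x: int, t: int) -> float:
    """P(SSRW started at x visits 0 within t steps), via the reflection principle."""
    x = abs(x)
    if x == 0:
        return 1.0

    def upper_tail(a: int) -> float:
        # S_t = 2B - t with B ~ Binomial(t, 1/2), so S_t >= a iff B >= ceil((t + a) / 2).
        first = -(-(t + a) // 2)
        return sum(comb(t, b) for b in range(first, t + 1)) / 2**t

    # P(max_{s <= t} S_s >= x) = P(S_t >= x) + P(S_t > x)
    return upper_tail(x) + upper_tail(x + 1)


def minimize_steps(n: int, m: int) -> float:
    """Success probability when one SSRW step is taken every m-th time step."""
    return ssrw_return_prob(n // m)


def minimize_then_maximize(n: int, m: int) -> float:
    """Minimize steps until time n - m, then walk until 0 is hit, then stand still."""
    slow_steps = (n - m) // m
    total = 0.0
    for b in range(slow_steps + 1):
        x = 2 * b - slow_steps  # position at time n - m
        weight = comb(slow_steps, b) / 2**slow_steps
        total += weight * hit_zero_by(x, m)
    return total


if __name__ == "__main__":
    n, m = 10_000, 100  # illustrative values with m = n**0.5
    print(minimize_steps(n, m), minimize_then_maximize(n, m))
```

With $m = n^{1/2}$ as in the demo values, the second strategy should give a probability of order 1 while the first stays of order $(m/n)^{1/2}$, in line with the heuristics above.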
In one dimension, the "more sophisticated" strategy described above relies on the fact that a SSRW started at distance $k$ from 0 has a probability of order one to hit 0 by time $k^2$. In two dimensions, this probability is only of order $(\log k)^{-1}$, so it is more difficult to construct a strategy based on waiting for the RW to return to 0 after it has wandered away. Nonetheless we have the following.

Theorem 1.2. Suppose $d = 2$ and $m_n \gg n^{\epsilon}$ for some $\epsilon > 0$. Then

$$\liminf_{n \to \infty}\ \sup_{\delta_n \in A_n} P(W_n = 0; \delta_n) > 0.$$

In (1.1) one could consider only Markov strategies, i.e. take the sup over $A_n^*$. Since the underlying SSRW is Markov, one cannot actually do better with a non-Markov strategy, so the two sups are the same. Allowing non-Markov strategies simply lets us use ones that can be concisely described and analyzed.
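The contrast between the two dimensions can be checked numerically for small $k$. The following Python sketch computes, by exact dynamic programming with absorption at the origin, the probability that SSRW started at distance $k$ from 0 hits 0 within $k^2$ steps, in $d = 1$ and $d = 2$. The function name `hit_origin_by` and the chosen values of $k$ are illustrative assumptions; small $k$ can only suggest the trend and does not establish the asymptotic orders.

```python
from typing import Dict, Tuple

Site = Tuple[int, ...]


def hit_origin_by(start: Site, t: int) -> float:
    """P(SSRW on Z^d started at `start` visits the origin within t steps)."""
    d = len(start)
    origin = (0,) * d
    if start == origin:
        return 1.0
    # The 2d nearest-neighbour moves +-e_1, ..., +-e_d.
    moves = [tuple(s if j == i else 0 for j in range(d))
             for i in range(d) for s in (1, -1)]
    p = 1.0 / (2 * d)
    alive: Dict[Site, float] = {start: 1.0}  # mass that has not yet hit the origin
    absorbed = 0.0
    for _ in range(t):
        nxt: Dict[Site, float] = {}
        for pos, mass in alive.items():
            for mv in moves:
                q = tuple(a + b for a, b in zip(pos, mv))
                if q == origin:
                    absorbed += mass * p
                else:
                    nxt[q] = nxt.get(q, 0.0) + mass * p
        alive = nxt
    return absorbed


if __name__ == "__main__":
    for k in (2, 4, 8):
        one_d = hit_origin_by((k,), k * k)
        two_d = hit_origin_by((k, 0), k * k)
        print(k, one_d, two_d)
```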
Remark 1.3. One could consider alternate ways of slowing down the controlled RW, in place of standing still. For example, one could allow a choice between a standard SSRW step and a delayed SSRW step, the latter meaning we stand still with probability $1 - 1/m_n$ and take a standard SSRW step of $\pm e_i$ with probability $1/m_n$, with no limit on how many consecutive times we choose the delayed SSRW step. A brief examination of the proofs shows that both theorem statements above remain valid in this case.
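As a rough illustration of the delayed-step variant, the Python sketch below computes $P(W_n = 0)$ in $d = 1$ when the delayed SSRW step is chosen at every time, so that the number of SSRW steps taken by time $n$ is Binomial$(n, 1/m_n)$. It compares the result with the standing-still strategy that takes exactly $\lfloor n/m_n \rfloor$ steps. The function names are hypothetical, $m \ge 2$ is assumed, and the comparison covers only this one simple strategy, not the strategies used in the proofs.

```python
from math import exp, lgamma, log


def log_binom_pmf(n: int, j: int, p: float) -> float:
    """log P(Binomial(n, p) = j), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
            + j * log(p) + (n - j) * log(1.0 - p))


def log_return_prob(j: int) -> float:
    """log P(S_j = 0) for one-dimensional SSRW; j must be even."""
    return lgamma(j + 1) - 2.0 * lgamma(j // 2 + 1) - j * log(2.0)


def always_delayed(n: int, m: int) -> float:
    """P(W_n = 0) when every step is a delayed SSRW step (move with prob. 1/m)."""
    p = 1.0 / m
    # Condition on the (even) number j of SSRW steps actually taken.
    return sum(exp(log_binom_pmf(n, j, p) + log_return_prob(j))
               for j in range(0, n + 1, 2))


def always_standing(n: int, m: int) -> float:
    """P(W_n = 0) when one SSRW step is taken every m-th time step."""
    j = n // m
    return exp(log_return_prob(j)) if j % 2 == 0 else 0.0


if __name__ == "__main__":
    n, m = 10_000, 100  # illustrative values
    print(always_delayed(n, m), always_standing(n, m))
```

Both quantities are of order $(m/n)^{1/2}$. The delayed version is smaller by roughly a factor of 2 for these values, since the lazy walk has no parity constraint and its mass is spread over all sites rather than only the even ones.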
Remark 1.4. The continuous analog of the problem we consider is a diffusion $\xi_t$ in which the drift is always 0 and one can control the diffusivity $\sigma(x, t)$, but constrained to an interval $[\sigma_1, \sigma_2]$, on a time interval $[0, T]$, where $\sigma_1 > 0$. We take $[\sigma_1, \sigma_2] = [\epsilon_T, 1]$ and ask, how slowly can we have $\epsilon_T \to 0$ as $T \to \infty$ and still have $P(|\xi_T| < 1) \to 1$ or $\liminf_T P(|\xi_T| < 1) > 0$? Note that $\epsilon_T$ is the analog of $1/m_n$. McNamara [3] considered closely related questions for a one-dimensional diffusion; see also [2] for another variant. He proved that there are constants [...] while in the other direction, for each $h > 0$ there exists $\Delta > 0$ such that [...] for all $\delta \in (0, \Delta)$.
Here $P_x$ denotes probability (maximized over the allowed controls) for a process started at $x$. Numerical evidence was given that $\beta$ is approximately proportional to $\sigma_1/\sigma_2$. If we assume this to be true and consider $\epsilon_T$ of order $1/\log T$, we get $\beta$ of order $1/\log T$ as well.
If we could take $h$ also of this order, then the right side of (1.3) would be bounded away from 0 in $T$, as desired. The problem is that one cannot get a useful result from (1.3) if one takes $h$ depending on $T$, since then $\delta$ must also depend on $T$. Nonetheless we may observe that $\epsilon_T \le C/\log T$ corresponds to $m_n \ge C \log n$ in our discrete problem. Further, still assuming $\beta$ proportional to $\sigma_1/\sigma_2$, we see that since we want the left side of (1.2) bounded away from 0, we cannot allow $\epsilon_T \gg 1/\log T$, suggesting that we perhaps cannot do better than requiring $m_n \ge C \log n$ in Theorem 1.1.

Proof of Theorem 1.1
Throughout the paper we will make use of various quantities which approach infinity as $n \to \infty$. For $\lambda > 0$ let $u_n(\lambda) = \max\{k : m_n^{1+k\lambda} \le n\}$; note that $u_n(\lambda)$ is of order $\log n / \log m_n$ for all $\lambda > 0$. Also, the hypothesis that $m_n \gg (\log n)^{2+\gamma}$ for some $\gamma > 0$ is equivalent to [...] for some $\eta \in (0, 1)$.
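For concreteness, the quantity $u_n(\lambda)$ can be evaluated directly from its definition. The short Python sketch below does so and prints it next to $\log n / \log m_n$. The function name `u_n`, the convention that $k$ ranges over nonnegative integers, and the sample values of $n$, $m_n$ and $\lambda$ are assumptions made for illustration.

```python
import math
from typing import Optional


def u_n(n: float, m: float, lam: float) -> Optional[int]:
    """Largest integer k >= 0 with m**(1 + k*lam) <= n, or None if m > n."""
    if m <= 1 or lam <= 0:
        raise ValueError("need m > 1 and lam > 0")
    # m**(1 + k*lam) <= n  is equivalent to  1 + k*lam <= log(n) / log(m).
    ratio = math.log(n) / math.log(m)
    if ratio < 1:
        return None
    # The small tolerance guards against round-off when m**(1 + k*lam) == n exactly.
    return math.floor((ratio - 1) / lam + 1e-12)


if __name__ == "__main__":
    n, m = 10**6, 10  # illustrative values
    for lam in (0.25, 0.5, 1.0):
        print(lam, u_n(n, m, lam), math.log(n) / math.log(m))
```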
Fixing such an $\eta$ and writing $u_n$ for $u_n(\eta)$, we observe that (2.1) is equivalent to [...], hence also to [...] $\ge n$ for all large $n$, for every $\theta > 0$, and therefore finally to (2.2) [...].

Fix $n$, write $m$ for $m_n$, and define [...]. We define a sequence of windows $\{t_k\} \times I_k$, $1 \le k \le u_n$, in space-time, with size decreasing as $k$ increases and the target $(n, 0)$ is approached; we then construct a strategy which makes the space-time trajectory of the process pass through all of these windows, with high probability. Specifically, let [...], and $I_{u_n+1} = \{0\}$. We want to find a strategy, and choice of $\epsilon_m$, for which we can show [...]. From (2.2) and (2.3) it follows that (2.4) [...] as $n \to \infty$, proving the theorem.
To fully specify, and then bound, these probabilities we need to designate a strategy; we do so by describing how the strategy works between $t_{k-1}$ and $t_k$, for general $k$. For $k \le u_n$, we begin by taking all SSRW steps (i.e. no standing still) from time $t_{k-1}$ until time $\tau_0 \wedge t_k$. If $\tau_0 > t_k$, we deem the strategy to have failed and we continue in an arbitrary manner, say all SSRW steps. If $\tau_0 \le t_k$, we continue from time $\tau_0$ to $t_k$ by always standing still for the maximum allowed period of time $m$ during the interval $(\tau_0, t_k]$, that is, we take an SSRW step every $m$th time step. The last standing period is truncated if it would otherwise go beyond time $t_k$. For $k = u_n + 1$, our strategy during $(t_{u_n}, t_{u_n+1}] = (n - m, n]$ is to maximize steps until the time $\tau_0 = \tau_0^{(u_n+1)}$ when the process first hits 0 (if $\tau_0 \le n$), then stand still until time $n$.
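The within-stage rule described above can be turned into a small exact computation for $d = 1$. The Python sketch below takes a starting position $x$ at the beginning of a stage, lets the walk run at full speed until it first hits 0, and then takes one SSRW step every $m$th time step for the rest of the stage. The stage length, the starting point and the function names are illustrative assumptions, not the paper's $t_k$ and $I_k$; $\tau_0$ is measured from the start of the stage.

```python
from math import comb
from typing import Dict, List


def first_hit_pmf(x: int, horizon: int) -> List[float]:
    """pmf[s] = P(tau_0 = s), s = 0..horizon, for 1-D SSRW started at x."""
    pmf = [0.0] * (horizon + 1)
    if x == 0:
        pmf[0] = 1.0
        return pmf
    alive: Dict[int, float] = {abs(x): 1.0}  # mass that has not yet reached 0
    for s in range(1, horizon + 1):
        nxt: Dict[int, float] = {}
        for pos, mass in alive.items():
            for q in (pos - 1, pos + 1):
                if q == 0:
                    pmf[s] += mass / 2
                else:
                    nxt[q] = nxt.get(q, 0.0) + mass / 2
        alive = nxt
    return pmf


def stage_end_distribution(x: int, length: int, m: int) -> Dict[int, float]:
    """Law of the position at the end of a stage, restricted to {tau_0 <= length}.

    The missing mass, 1 - sum(values), is the probability that the stage fails.
    """
    dist: Dict[int, float] = {}
    for s, p_hit in enumerate(first_hit_pmf(x, length)):
        if p_hit == 0.0:
            continue
        slow_steps = (length - s) // m  # one SSRW step every m-th time step
        for b in range(slow_steps + 1):
            pos = 2 * b - slow_steps
            weight = comb(slow_steps, b) / 2**slow_steps
            dist[pos] = dist.get(pos, 0.0) + p_hit * weight
    return dist


if __name__ == "__main__":
    dist = stage_end_distribution(x=5, length=400, m=50)  # illustrative values
    print("P(stage succeeds) =", sum(dist.values()))
    print("P(end at 0) =", dist.get(0, 0.0))
```

Taking `m` larger than `length` reproduces the final stage, in which the walker simply stands still at 0 once it gets there.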
We now bound the first term on the right side of (2.5). From the Reflection Principle we have (for $\ell$, $h_{mk}$ of opposite even-odd parity): (2.6) $P(\tau$ [...], and hence for every $\epsilon > 0$, [...]. The left side of (2.6) is a nondecreasing function of $|x|$, so for all $x \in I_k$, by (2.7) [...].

Turning to the second term on the right side of (2.5), it is 0 for $k = u_n + 1$ so we consider $k \le u_n$. We can condition also on $\tau_0$ as follows: for $t \in (t_{k-1}, t_k]$, using Hoeffding's Inequality [1], [...]. For $k \ge 1$ we have $N_{k+1}/N_k \ge m^{-\eta}$ (with equality for $k \ge 3$), so (2.9) says that (2.10) [...]. Since $x \in I_{k-1}$ is arbitrary, combining (2.5), (2.8) and (2.10) yields that for all $1 \le k \le u_n + 1$, (2.11) [...] (2.3) and thereby proving Theorem 1.1.
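Hoeffding's inequality is invoked above after conditioning on $\tau_0$. As a sanity check of the inequality in the SSRW setting, the Python sketch below compares the exact two-sided tail $P(|S_N| \ge a)$ of an $N$-step one-dimensional SSRW with the Hoeffding bound $2\exp(-a^2/(2N))$. The function names and sample values are hypothetical, and the precise form in which the paper applies the inequality may differ.

```python
from math import comb, exp


def exact_two_sided_tail(N: int, a: int) -> float:
    """P(|S_N| >= a) for an N-step one-dimensional SSRW."""
    if a <= 0:
        return 1.0
    # S_N = 2B - N with B ~ Binomial(N, 1/2), so S_N >= a iff B >= ceil((N + a) / 2).
    first = -(-(N + a) // 2)
    upper = sum(comb(N, b) for b in range(first, N + 1)) / 2**N
    return 2.0 * upper  # by symmetry of S_N


def hoeffding_bound(N: int, a: int) -> float:
    """Hoeffding bound for a sum of N independent steps taking values in [-1, 1]."""
    return min(1.0, 2.0 * exp(-a * a / (2.0 * N)))


if __name__ == "__main__":
    N = 100  # illustrative number of SSRW steps
    for a in (10, 20, 30, 40):
        print(a, exact_two_sided_tail(N, a), hoeffding_bound(N, a))
```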

Proof of Theorem 1.2
We keep the same definition of $u_n(\lambda)$ and note that now our hypothesis on $m_n$ is equivalent to the statement that $\{u_n\}$ is bounded, say $u_n \le u < \infty$ for all $n$. We keep the same formula for $t_k$ but with $\eta$ replaced by $\kappa$, determined as follows. Choose [...]. We then write $u_n$ for $u_n(\kappa)$. For our windows, in place of the interval $I_k$ we have the square [...]. To distinguish dimensions clearly, we now write $\{Y_i\}$ for $d$-dimensional SSRW, $d = 1, 2$. We use the same strategy as in one dimension: in each interval $(t_{k-1}, t_k]$, take an SSRW step every time step until time $\tau_0 \wedge t_k$, then take an SSRW step every $m$th time step from time $\tau_0$ to $t_k$. In place of (2.3), we will need that for some $C > 0$, (3.2) [...].

Let $L_{[a,b]}$ be the number of visits to 0 by the SSRW $\{Y_i\}$ during the time interval $[a, b]$. In comparison to (2.8), we have for all $x \in Q_{k-1}$: [...]. Here $C_1$ depends on $\theta$. The denominator in (3.3) is bounded above by [...]. Therefore we have the analog of (2.8): [...]. In comparison to (2.9), we have for $x \in Q_k$ and $t \in (t_{k-1}, t_k]$, using again Hoeffding's Inequality [1]: [...].
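The expected value of $L_{[0,t]}$ grows only logarithmically in two dimensions, which is a standard fact. As an illustration, the Python sketch below computes the expected number of visits to 0 during $[0, t]$ by a two-dimensional SSRW started at 0, using the standard identity $P(Y_{2j} = 0) = \big(\binom{2j}{j} 4^{-j}\big)^2$. The function name is hypothetical and the starting point 0 is a simplifying assumption; the proof works with starting points $x \in Q_{k-1}$.

```python
import math


def expected_visits_2d(t: int) -> float:
    """E[number of visits to 0 during [0, t]] for 2-D SSRW started at 0."""
    total = 1.0  # the visit at time 0
    r = 1.0      # r = C(2j, j) / 4**j, the 1-D return probability at time 2j
    for j in range(1, t // 2 + 1):
        r *= (2 * j - 1) / (2 * j)
        total += r * r  # the 2-D return probability at time 2j equals r**2
    return total


if __name__ == "__main__":
    for t in (10**2, 10**3, 10**4):
        # The increments should be close to log(10) / pi per decade.
        print(t, expected_visits_2d(t), math.log(t) / math.pi)
```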

Acknowledgements
The author would like to thank Ananda Weerasinghe for helpful conversations.