Sharp maximal $L^p$-bounds for continuous martingales and their differential subordinates

Suppose that X, Y are Hilbert-space-valued continuous-path martingales such that Y is differentially subordinate to X. The paper contains the proof of sharp estimates between the p-th moments of Y and of the maximal function of X for 0 < p < 1. The proof rests on Burkholder's method and exploits a certain special function of three variables, enjoying appropriate size and concavity requirements. The analysis reveals an unexpected phase transition between the cases 0 < p < 1/2 and 1/2 ≤ p < 1. The latter case is relatively simple: the special function is essentially quadratic and the best constant is equal to $\sqrt{2/p}$. The analysis of the former case is much more intricate and involves the study of a non-linear ordinary differential equation.


Introduction
Suppose that (Ω, F, P) is a complete probability space, filtered by a nondecreasing right-continuous family $(\mathcal{F}_t)_{t\ge 0}$ of sub-σ-fields of F. We assume in addition that $\mathcal{F}_0$ contains all the events of probability 0. Suppose further that X, Y are two adapted martingales taking values in a given separable Hilbert space H with norm |·| and scalar product ⟨·, ·⟩. With no loss of generality, we may and do assume that the space is equal to $\ell^2$. As usual, we impose standard regularity restrictions on the trajectories of X and Y: we assume that the paths are right-continuous and have limits from the left. Then [X, X], the quadratic covariance process of X, is given coordinatewise by $[X, X]_t = \sum_{j\ge 1}[X^j, X^j]_t$, where $X^j$ stands for the j-th coordinate of X.

Throughout, we will work under the assumption that the martingale Y is differentially subordinate to X. This concept appeared for the first time in Burkholder's paper [8] and concerned discrete-time processes. Recall that a martingale $g = (g_n)_{n\ge 0}$ is differentially subordinate to $f = (f_n)_{n\ge 0}$ if for any n ≥ 0 we have $|dg_n| \le |df_n|$ almost surely. Here $df = (df_n)_{n\ge 0}$ and $dg = (dg_n)_{n\ge 0}$ are the difference sequences of f and g, given by $f_n = \sum_{k=0}^n df_k$ and $g_n = \sum_{k=0}^n dg_k$ for n = 0, 1, 2, . . ..
This domination was generalized to the continuous-time context by Bañuelos and Wang [7] and Wang [24]: the martingale Y is differentially subordinate to X if the process $([X, X]_t - [Y, Y]_t)_{t\ge 0}$ is nondecreasing and nonnegative as a function of t. Treating two given discrete-time martingales f, g as continuous-time processes (via $X_t = f_{\lfloor t\rfloor}$ and $Y_t = g_{\lfloor t\rfloor}$, t ≥ 0), we see that this domination is consistent with the original definition of Burkholder. The following example will be useful to us later: suppose that X is an H-valued martingale, H is a predictable process taking values in the interval [−1, 1] and let Y be the stochastic integral of H with respect to X, i.e., $Y_t = H_0X_0 + \int_{0+}^t H_s\,dX_s$, t ≥ 0. Then Y is differentially subordinate to X, since $[X, X]_t - [Y, Y]_t = (1 - H_0^2)|X_0|^2 + \int_{0+}^t (1 - H_s^2)\,d[X, X]_s$ is nonnegative and nondecreasing in t. Another example, which is very important for applications (see e.g. [5], [7], [14]), is the following. Suppose that B is a Brownian motion in $\mathbb{R}^\nu$ and H, K are predictable processes taking values in the matrices of dimensions m × ν and n × ν, respectively. For any t ≥ 0, define $X_t = \int_0^t H_s\,dB_s$ and $Y_t = \int_0^t K_s\,dB_s$. If the Hilbert-Schmidt norms of H and K satisfy $\|K_t\|_{HS} \le \|H_t\|_{HS}$ for all t > 0, then Y is differentially subordinate to X: this follows from the identity $[X, X]_t - [Y, Y]_t = \int_0^t \big(\|H_s\|_{HS}^2 - \|K_s\|_{HS}^2\big)\,ds$.
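To make the discrete definition concrete, here is a minimal numerical sketch (ours, with illustrative choices of the step distribution and multipliers): it generates a martingale transform with multipliers in [−1, 1] and verifies both the pointwise domination $|dg_n| \le |df_n|$ and the monotonicity of the discrete analogue of [X, X] − [Y, Y].

```python
# A minimal sketch (not from the paper): a discrete-time martingale transform g of f
# with multipliers in [-1, 1] is differentially subordinate to f.
import numpy as np

rng = np.random.default_rng(0)
T, n, d = 10_000, 8, 2                            # paths, steps, toy Hilbert-space dimension
df = rng.choice([-1.0, 1.0], size=(T, n, d))      # martingale differences df_k
H = rng.uniform(-1.0, 1.0, size=(T, n, 1))        # multipliers H_k with |H_k| <= 1
dg = H * df                                       # dg_k = H_k df_k

# Pointwise domination |dg_k| <= |df_k| (Burkholder's discrete definition):
assert np.all(np.linalg.norm(dg, axis=2) <= np.linalg.norm(df, axis=2) + 1e-12)

# Discrete analogue of [X,X]_t - [Y,Y]_t: partial sums of |df_k|^2 - |dg_k|^2
# are nonnegative and nondecreasing in k, matching the continuous-time definition.
gap = np.cumsum(np.sum(df**2, axis=2) - np.sum(dg**2, axis=2), axis=1)
assert np.all(gap >= -1e-12) and np.all(np.diff(gap, axis=1) >= -1e-12)
print("differential subordination verified on", T, "paths")
```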
The differential subordination implies many interesting inequalities comparing the sizes of X and Y. The literature on this subject is very extensive, so we refer the interested reader to the survey articles [9], [21] and the monograph [19] for a detailed description and further references. In addition, these estimates have found numerous profound applications in harmonic analysis, e.g. in the study of the boundedness of wide classes of Fourier multipliers [4, 5, 6, 14], in semigroup theory [11, 13] and in the theory of quasiconformal mappings [1, 2, 3], to name just a few. As the motivation for our research, we recall a celebrated result of Burkholder which gives the following information on the $L^p$-norms (see [8], [9] and [24]).

Theorem 1.1. Suppose that X, Y are Hilbert-space-valued martingales such that Y is differentially subordinate to X. Then for any 1 < p < ∞ we have
$$\|Y\|_p \le (p^* - 1)\,\|X\|_p, \qquad p^* = \max\{p,\ p/(p-1)\}, \qquad (1.1)$$
and the constant $p^* - 1$ is the best possible.

Besides the beauty of this result, its proof is of independent interest and has far-reaching connections. Burkholder showed how to deduce the validity of (1.1) from the existence of a certain special function, possessing appropriate size and concavity properties. Since the appearance of the seminal paper [8], this type of approach has turned out to be very efficient in the study of related semimartingale inequalities (for much more on the subject, see [19]).
In the boundary case p = 1, the above moment inequality breaks down, but, as a substitute, there are certain weak-type and logarithmic estimates; see [8], [21] and [22]. There is also a corresponding maximal $L^1$ bound, which will be important for our considerations below. Namely, in [10] Burkholder proved the following result.

Theorem 1.2.
Suppose that X is a real-valued martingale and Y is the stochastic integral, with respect to X, of some predictable real-valued process H taking values in [−1, 1]. Then we have the sharp estimate
$$\|Y\|_1 \le \gamma\, \|X^*\|_1, \qquad (1.2)$$
where γ = 2.536... is the unique positive number satisfying $\gamma = 3 - \exp\left(\frac{1 - \gamma}{2}\right)$.
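The constant γ is straightforward to compute numerically; the following snippet (ours) solves the fixed-point equation from the theorem.

```python
# A quick numerical check (ours, not in the paper) of the equation defining gamma.
import math
from scipy.optimize import brentq

gamma = brentq(lambda g: g - 3 + math.exp((1 - g) / 2), 2.0, 3.0)
print(gamma)   # 2.5360...
```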
This result was strengthened by Osękowski [17] to the case in which the first moment of Y is replaced by the first moment of its maximal function.

Theorem 1.3.
Under the assumptions of the above theorem, we have the sharp inequality
$$\|Y^*\|_1 \le C\, \|X^*\|_1. \qquad (1.3)$$
The precise description of the constant C above involves the analysis of a complicated system of ODEs. If the dominating martingale has continuous paths, then the best constants in (1.2) and (1.3) decrease to $\sqrt{2}$ and 2, respectively (cf. [18, 20]). The following result can be found in [18].

Theorem 1.4.
Suppose that X, Y are H-valued continuous-path martingales such that Y is differentially subordinate to X. Then for any 1 ≤ p < ∞ we have the sharp estimate
$$\|Y\|_p \le C_p\, \|X^*\|_p, \qquad (1.4)$$
where $C_p$ denotes the optimal constant (in particular, $C_1 = \sqrt{2}$).
By Burkholder-Davis-Gundy inequalities, the estimate (1.4) holds also, with some finite constant c p , in the range 0 < p < 1. The purpose of this paper is to identify the corresponding optimal constant. Our main result can be stated as follows.
Theorem 1.5. Suppose that X, Y are H-valued continuous-path martingales such that Y is differentially subordinate to X. Then for any 0 < p < 1 we have the estimate
$$\|Y\|_p \le C_p\, \|X^*\|_p, \qquad (1.5)$$
where the optimal constant $C_p$ equals $\sqrt{2/p}$ for 1/2 ≤ p < 1, and for 0 < p < 1/2 it is given by an explicit expression involving the quantity $y_0(p)$. Here $y_0(p)$ is defined in terms of the solution to a certain ODE: see Lemma 3.1 for the precise definition.
So, the behavior of the best constant C p in (1.5) is different for p < 1/2 and p ≥ 1/2. As we will see, the proof in the latter case ('big p') is much simpler: it can actually be extracted from the work [18]. Very interestingly (and quite unexpectedly, at least to us), there is a phase transition at p = 1/2. The analysis in the case p < 1/2 is much more elaborate and the construction of the associated special functions (which are completely different from those for p ≥ 1/2) will rest on solutions of the non-linear ordinary differential equation (3.4).
$\ldots \cdot \big(2M(N + 1)\big)^p\Big)^{1/p}$, and the right-hand side converges to 1 as M → ∞ (the parameter N is fixed). Consider the martingale $g_n = \xi_0 + \xi_1 + \xi_2 + \ldots + \xi_n$, n = 0, 1, 2, . . . , N. We have $|dg_n| = |df_n|$ for each n, so g is differentially subordinate to f. Furthermore, g satisfies $g_N = N + 1$ on the set A, so $\|g_N\|_p \ge (N + 1)(\mathbb{P}(A))^{1/p} = (N + 1)(M/(M + 1))^{(N+1)/p}$. The latter expression converges to N + 1 as M → ∞, so we see that, taking M sufficiently large, we may make the ratio $\|g_N\|_p / \|f^*_N\|_p$ bigger than N. Since N was arbitrary, the estimate (1.5) cannot hold with any finite constant in the absence of the continuity assumption.

We have organized this paper as follows. The next section contains the description of Burkholder's method in the abstract setting: we show how to exploit certain appropriately smooth special functions to establish maximal estimates for continuous-time processes. The contents of that section extend significantly the material in [18] (see below). Section 3 contains the proof of the inequality (1.5): we construct and study the corresponding special functions there. In Section 4 we show that the constant $C_p$ cannot be improved, which is accomplished by exhibiting appropriate extremal examples. In the final part of the paper we describe in detail the reasoning which has led us to the discovery of the special function in the difficult case 0 < p < 1/2.

Burkholder's method
The original technique developed by Burkholder in [10] concerned discrete-time martingales (the passage to the context of stochastic integrals, as in Theorem 1.2, follows by approximation). The successful treatment of continuous-path, differentially subordinate, vector-valued martingales required an enhancement of the method, which was obtained by Osękowski [17] under strong regularity assumptions on the special function. In this section we relax this requirement and show how to deduce maximal estimates from the existence of a function satisfying a relatively mild regularity condition (class $C^1$). Our reasoning rests on mollification arguments, which can be traced back to the works of Bañuelos and Wang [7] and Wang [24].
Consider the domain D = {(x, y, z) ∈ H × H × (0, ∞) : |x| ≤ z} and fix a Borel function V : D → R, which is bounded on bounded sets. Let us assume that we want to establish the estimate
$$\mathbb{E}\, V(X_t, Y_t, X^*_t) \le 0, \qquad t \ge 0. \qquad (2.1)$$
To this end, we search for a function U : D → R of class $C^1$, which is of class $C^2$ in the interiors of finitely many subdomains $D_1, D_2, \ldots, D_n$ covering D, and which satisfies the following four conditions:

1° We have U(x, y, |x|) ≤ 0 for all x, y ∈ H with |y| ≤ |x|.

2° We have U(x, y, z) ≥ V(x, y, z) for all (x, y, z) ∈ D.

3° We have
$$U_z(x, y, z) \le 0 \qquad \text{if } |x| = z. \qquad (2.2)$$

4° For any j = 1, 2, . . . , n, there is a function $c_j : D_j \to (0, \infty)$, bounded on any set of the form {(x, y, z) ∈ D_j : |x|, |y| ≤ L, 1/L ≤ z ≤ L} for some L > 0, such that
$$\langle U_{xx}(x, y, z)h, h\rangle + 2\langle U_{xy}(x, y, z)h, k\rangle + \langle U_{yy}(x, y, z)k, k\rangle \le c_j(x, y, z)\,\big(|k|^2 - |h|^2\big) \qquad (2.3)$$
holds for any (x, y, z) ∈ D_j and all h, k ∈ H.
The interplay between the existence of such a function U and the validity of (2.1) is described in the statement below. In what follows, $X^1_0$ denotes the first coordinate of $X_0$.

Theorem 2.1. Suppose that U satisfies the conditions 1°–4°. Assume further that X, Y are bounded, H-valued, continuous-path martingales such that Y is differentially subordinate to X and $\mathbb{P}(|X^1_0| \ge \eta) = 1$ for some η > 0. Then (2.1) holds true.
Proof. It is convenient to split the reasoning into a few separate parts.
Step 1. Reductions. Let t ≥ 0 be fixed. The random variable $V(X_t, Y_t, X^*_t)$ is integrable, since V is locally bounded, $X, Y \in L^\infty$ and $X^*_t \ge \eta$ almost surely. Furthermore, for 0 ≤ s ≤ t, consider the projected processes $X^{(d)}_s = (X^1_s, \ldots, X^d_s, 0, 0, \ldots)$ and $Y^{(d)}_s = (Y^1_s, \ldots, Y^d_s, 0, 0, \ldots)$. Let $D^{(d)} = \{(x, y, z) \in \mathbb{R}^d \times \mathbb{R}^d \times (0, \infty) : |x| \le z\}$ and let $U^{(d)} : D^{(d)} \to \mathbb{R}$ be the restriction of the function U, given by the formula $U^{(d)}(x, y, z) = U((x, 0, 0, \ldots), (y, 0, 0, \ldots), z)$.

Step 2. Mollification. For δ > 0, let $U_\delta$ stand for the convolution of $U^{(d)}$ with a smooth, compactly supported approximation of the identity. The conditions 1°–4° then carry over, in a slightly modified form, to $U_\delta$; we refer to these variants as (2.6)–(2.9) below.
Step 3. Proof of (2.1). We apply Itô's formula to the process $(U_\delta(X^{(d)}_s, Y^{(d)}_s, X^*_s))_{0\le s\le t}$ and write the outcome as the sum $I_1 + I_2 + I_3$ of the stochastic integrals, the term involving the derivative in the z-variable, and the second-order terms, respectively. The random variable $I_1$ has zero expectation, since both stochastic integrals are $L^2$-bounded martingales: this follows from the fact that X, Y, and hence also $X^{(d)}$ and $Y^{(d)}$, are bounded processes. The term $I_2$ is nonpositive by (2.8). The last term $I_3$ is also nonpositive, which follows from (2.9) and the approximation of the integrals by Riemann sums. Namely, let 0 ≤ s_0 < s_1 ≤ t. For any j ≥ 0, let $(\eta^j_i)_{0\le i\le i_j}$ be a sequence of nondecreasing finite stopping times with $\eta^j_0 = s_0$ and $\eta^j_{i_j} = s_1$, such that $\lim_{j\to\infty}\max_{0\le i\le i_j - 1}|\eta^j_{i+1} - \eta^j_i| = 0$. Keeping j fixed, we apply, for each i = 0, 1, 2, . . . , i_j, the inequality (2.9) to $x = X^{(d)}_{\eta^j_i}$, $y = Y^{(d)}_{\eta^j_i}$, $h = X^{(d)}_{\eta^j_{i+1}} - X^{(d)}_{\eta^j_i}$ and $k = Y^{(d)}_{\eta^j_{i+1}} - Y^{(d)}_{\eta^j_i}$. Summing the obtained $i_j + 1$ inequalities and letting j → ∞ yields a bound in terms of the quadratic variations; here C is given by (2.4) and we have used the notation [S, S] for the quadratic variation of a process S. If we approximate $I_3$ by discrete sums, we see that the inequality above leads to (2.10), where the last passage is due to the differential subordination. Now we take the expectation of both sides of (2.10) and use (2.6) to obtain, by Lebesgue's dominated convergence theorem and 1°, the desired bound. Since ε was chosen arbitrarily, the estimate (2.5) follows.

Reductions
Fix t ≥ 0. We start with some initial observations. First, note that in the proof of (1.5) we may assume that $\|X^*_t\|_{L^p} < \infty$, since otherwise the claim is evident. Second, it is enough to establish the estimate
$$\mathbb{E}|Y_t|^p \le C_p^p\, \mathbb{E}(X^*_t)^p \qquad (3.1)$$
under the additional assumption that X and Y are bounded and $\mathbb{P}(|X^1_0| \ge \eta) = 1$ for some η > 0. Indeed, suppose that we have established the above inequality. To deduce the claim in the general case, suppose that X, Y are arbitrary H-valued, continuous-path martingales such that Y is differentially subordinate to X. Adding one dimension to H if necessary, we may assume that the first coordinates of X and Y vanish: $Y^1_t = X^1_t = 0$ almost surely for all t. Given a positive integer N, consider the stopping time $\tau_N = \inf\{t : |X_t| + |Y_t| \ge N\}$ and fix an auxiliary parameter η > 0. Then the martingales $(\eta e_1 + X_{\tau_N \wedge t})_{t\ge 0}$ and $(Y_{\tau_N \wedge t})_{t\ge 0}$ (here $e_1$ denotes the first coordinate vector) are bounded and satisfy the differential subordination, so (3.1) gives
$$\mathbb{E}|Y_{\tau_N \wedge t}|^p \le C_p^p\, \mathbb{E}\Big(\sup_{s\le t}|\eta e_1 + X_{\tau_N \wedge s}|\Big)^p$$
for any t ≥ 0. Therefore, letting η → 0, N → ∞ and using Fatou's lemma, we get the estimate in full generality. The final observation is that the inequality (3.1) is of the form (2.1), with
$$V(x, y, z) = |y|^p - C_p^p z^p.$$
Therefore, by the technique described in the previous section, all we need is the appropriate special function U.

Proof in the case 1/2 ≤ p < 1
In this range of p, we use the (essentially quadratic) special function constructed in [18]. It was proved in [18] that this object satisfies 1°, 2° and 4°, and a certain slightly weaker form of 3°. Thus, to complete the proof, we need to check (2.2), the full version of 3°. But this is easy and follows from a direct computation of $U_z$. Therefore (3.1), and hence also (1.5), follow. In the case p < 1/2 the calculations will be much more involved.

The case 0 < p < 1/2
Throughout this subsection, p is a fixed exponent from the interval (0, 1/2). To define the special function, we will need some additional objects. The central role in our further considerations is played by the non-linear ordinary differential equation
$$(1 + y)\big(g'(y) + g(y)^2\big) = (p + y)g(y) - p, \qquad g(0) = \frac{p}{p - 1}. \qquad (3.4)$$
When solved for g', the right-hand side of this equation is of class $C^1$ and so locally Lipschitz, which yields the existence and uniqueness of the solution. We extend this solution to its maximal domain [0, M). It is not difficult to prove that M < ∞, but we will not need this; furthermore, replacing y with −y in (3.4), it is easy to see that g can be extended 'backwards', i.e., to some interval of the form (m, M) with m < 0 (and, in particular, there is no problem with the existence of derivatives of g at zero). The following technical fact will be used later on.
Lemma 3.1. There exists a unique $y_0 = y_0(p) \in (0, M)$ at which the solution g attains the value −1; moreover, $y_0 \ge 1 - 2p$ and
$$(1 - p - y)\,g(y) + p \ge 0 \qquad \text{for } y \in [0, y_0]. \qquad (3.5)$$

Proof. A direct analysis of (3.4) shows that g is negative and decreasing on its domain; in particular, $g(y) \le g(0) = p/(p - 1)$ for all y ≥ 0. To show that $y_0 \ge 1 - 2p$, note that the latter bound implies $(p + y)g(y) - p \ge (y + 1)g(y)$ and hence, by (3.4), $g'(y) \ge g(y)(1 - g(y))$. The latter estimate is equivalent to saying that the function $y \mapsto \ln\big(-g(y)/(1 - g(y))\big) - y$ is nonincreasing, which gives
$$\frac{-g(y)}{1 - g(y)} \le p e^y. \qquad (3.6)$$
Plugging $y = y_0$, we obtain $y_0 \ge -\ln(2p)$ and it remains to note that $-\ln u \ge 1 - u$ for all u > 0.
Finally, to show (3.5), it is enough to consider the case y < 1 − p only; indeed, for y ≥ 1 − p both terms on the left of (3.5) are nonnegative. We rewrite (3.6) in the form $-g(y)(1 - pe^y) \le pe^y$, which, because of the elementary estimate $1 - pe^y \ge 1 - pe^{1-p} > 0$, is equivalent to $g(y) \ge -pe^y/(1 - pe^y)$. Consequently,
$$(1 - p - y)g(y) + p \ge \frac{-(1 - p - y)pe^y + p(1 - pe^y)}{1 - pe^y} = \frac{p\big(1 - (1 - y)e^y\big)}{1 - pe^y} \ge 0,$$
since $(1 - y)e^y \le 1$ for all real y. This completes the proof.

The function g gives rise to another special function: set
$$A(y) = -\exp\Big(\int_0^y g(s)\,ds\Big), \qquad y \in [0, y_0],$$
so that A < 0 and $A'(y)/A(y) = g(y)$. Actually, since g extends smoothly to some neighborhood of $[0, y_0]$, so does A (and hence we can speak of its derivatives at 0 and $y_0$).
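For a rough numerical illustration of Lemma 3.1, the sketch below integrates (3.4), in the form displayed above, and locates $y_0$ as the point where g attains the value −1; both of these ingredients reflect our reading of the lemma, so the output should be treated as indicative only.

```python
# A rough numerical sketch (ours), assuming (3.4) in the form displayed above and
# the characterization of y_0 from Lemma 3.1 (the point where g attains -1).
import numpy as np
from scipy.integrate import solve_ivp

def y0_of_p(p):
    rhs = lambda y, g: ((p + y) * g - p) / (1 + y) - g ** 2   # g' solved from (3.4)
    hit = lambda y, g: g[0] + 1.0                             # event: g(y) = -1
    hit.terminal, hit.direction = True, -1
    sol = solve_ivp(rhs, (0.0, 50.0), [p / (p - 1.0)], events=hit,
                    rtol=1e-10, atol=1e-12)
    return sol.t_events[0][0]

for p in (0.1, 0.25, 0.4):
    print(p, y0_of_p(p), -np.log(2 * p))    # sanity check: y_0 >= -ln(2p)
```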

Lemma 3.2.
The function A enjoys the following properties.

(i) We have $(1 + y)A''(y) = (p + y)A'(y) - pA(y)$ for $y \in [0, y_0]$.

(ii) We have $A''(y) \ge A'(y) > 0$ for $y \in [0, y_0]$.

(iii) The function $A'''$ is nonnegative on $[0, y_0]$.
Proof. (i) Obviously, we have A < 0 on $[0, y_0]$. Dividing both sides of the desired differential equation by A and noting that $A'(y)/A(y) = g(y)$ and $A''(y)/A(y) = g'(y) + g^2(y)$, we see that the claim is equivalent to (3.4).
(ii) and (iii). Differentiating both sides of the equation in (i), we obtain
$$(1 + y)A'''(y) = (p + y - 1)A''(y) + (1 - p)A'(y).$$
Again by (i), the expression on the right is nonnegative if and only if
$$(p + y - 1)\big[(p + y)A'(y) - pA(y)\big] + (1 - p)(1 + y)A'(y) \ge 0.$$
We divide both sides by A < 0 and, after some straightforward manipulations, the estimate $A''' \ge 0$ becomes an explicit inequality for g. If $1 - p - y \le 0$, there is nothing to prove (the left-hand side is positive); for $1 - p - y > 0$, the estimate follows by multiplying the inequalities $g(y) \le g(0) = p/(p - 1)$ and (3.6). It remains to prove that $A''(y) \ge A'(y)$. Multiplying both sides by y + 1 and exploiting (3.7), we check that the estimate becomes $(2 - p)A''(y) \ge (1 - p)A'(y)$. Multiplying both sides by y + 1 again, and applying part (i), we obtain the equivalent form
$$\big[(p + y - 1) + (2p - p^2)\big]g(y) \le p + p(1 - p). \qquad (3.8)$$
However, by (3.5), we know that $(p + y - 1)g(y) \le p$; furthermore, we have $(2p - p^2)g(y) < 0 < p(1 - p)$. Adding the latter two estimates we obtain (3.8).
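The equivalence underlying part (i) can also be checked symbolically; the computation below (ours) assumes the form of (3.4) displayed earlier and the identities $A'/A = g$, $A''/A = g' + g^2$ quoted in the proof.

```python
# Symbolic check (ours) of the equivalence in part (i), assuming (3.4) as above.
import sympy as sp

y, p = sp.symbols('y p')
g = sp.Function('g')(y)
g_prime = ((p + y) * g - p) / (1 + y) - g ** 2    # g' expressed from (3.4)
A = sp.Function('A')(y)
A_prime = g * A                                   # A'/A = g
A_second = (g_prime + g ** 2) * A                 # A''/A = g' + g^2
print(sp.simplify((1 + y) * A_second - ((p + y) * A_prime - p * A)))  # -> 0
```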
Let $D_1$, $D_2$, $D_3$ be three subsets of the strip [−1, 1] × R, on which the function u is given by three different analytic expressions built from the function A above. The function u is quite regular, as we will prove below.

Lemma 3.3. The function u is continuous on [−1, 1] × R and of class $C^1$ in the interior of the strip.

Proof. Let us first handle the continuity; by symmetry, it is enough to check it on the common boundaries of the sets $D_1$, $D_2$ and $D_3$, where one verifies directly that $\lim_{(x, y)\to(x', y')} u(x, y) = u(x', y')$.

The special function U : D → R, which will lead us to the estimate (1.5) for p ∈ (0, 1/2), is given by $U(x, y, z) = z^p\, u(|x|/z, |y|/z)$.
Let us check that this object enjoys all the necessary requirements.

Proof of 4°. The inequality is evident if (|x|/z, |y|/z) belongs to the interior of $D_1$: there the left-hand side of (2.3) reduces to a multiple of $|k|^2 - |h|^2$, and the coefficient in front of $|k|^2 - |h|^2$ on the right is positive. The inequality (2.3) is also not difficult if (|x|/z, |y|/z) lies in the interior of $D_3$. For such (x, y, z), U is, up to a quantity η which does not depend on x or y, the sum of the function $(x, y) \mapsto A(y_0)(2p + y_0 - 1)z^{p-1}|y|$ and a term which is quadratic in (x, y). By Lemma 3.1, we have $2p + y_0 - 1 \ge 0$ and hence $(x, y) \mapsto A(y_0)(2p + y_0 - 1)z^{p-1}|y|$ is a concave function in x and y (recall that $A(y_0) < 0$). Consequently, the left-hand side of (2.3) does not exceed $-\frac{2p + y_0}{y_0 + 1}A(y_0)z^{p-2}(|k|^2 - |h|^2)$, which is the desired bound, since $-\frac{2p + y_0}{y_0 + 1}A(y_0)z^{p-2} \ge 0$.

The main technical difficulty lies in the verification of (2.3) for (|x|/z, |y|/z) lying in the interior of $D_2$. A bit lengthy, but rather straightforward computations allow one to write the left-hand side of (2.3) as the sum $I_1 + I_2 + z^{p-2}A''(u)(|k|^2 - |h|^2)$, where $u = |x|/z + |y|/z - 1$. By Lemma 3.2 (ii), the term $I_1$ is nonpositive. Similarly, the third part of that lemma combined with $\langle y', k\rangle^2 \le |k|^2$ (here $y' = y/|y|$) yields $I_2 \le 0$. So,
$$\langle U_{xx}(x, y, z)h, h\rangle + 2\langle U_{xy}(x, y, z)h, k\rangle + \langle U_{yy}(x, y, z)k, k\rangle \le z^{p-2}A''(u)\,(|k|^2 - |h|^2),$$
and by Lemma 3.2 (ii) again, the factor in front of $|k|^2 - |h|^2$ is positive.

Proof of 3°. The expression under study contains the factor $\frac{A(y_0)}{2(y_0 + 1)} < 0$, and it vanishes for $y = y_0$, as we have just checked above. Thus, it is enough to prove that its derivative at $y = y_0$ is nonpositive. This is equivalent to
$$(p - 1)u_y(1, y_0) - y_0\, u_{yy}(1, y_0) \le 0.$$
Since both terms on the left are nonpositive, the bound holds and hence (2.2) follows.
Proof of 2°. We start with the observation that for any y ∈ H and any z > 0, the function $x \mapsto U(x, y, z)$ is concave on the set {x : |x| ≤ z}: this follows from 4°. Furthermore, the analogous function $x \mapsto V(x, y, z)$ is constant. Therefore, it is enough to establish the majorization 2° under the additional assumption |x| = z; in the language of u, this is equivalent to saying that
$$u(1, y) \ge y^p - C_p^p \qquad \text{for all } y \ge 0. \qquad (3.10)$$
The function $y \mapsto u(1, y)$ is of class $C^1$. Furthermore, it is convex: if $y < y_0$, its second derivative equals $A''(y) - A'(y)$, which is nonnegative by Lemma 3.2 (ii); if $y > y_0$, the function is quadratic, with a positive coefficient in front of $y^2$. On the other hand, the right-hand side of (3.10) is a concave function of y. Thus, the majorization follows at once from the observation that both sides match, along with their derivatives, at $y = y_0 + 2$.

Proof of 1°. This is easy. The function ξ : [−1, 1] → R given by ξ(t) = U(tx, ty, z) is even and concave (by 4°). Consequently, we have $U(x, y, z) = \xi(1) \le \xi(0) = U(0, 0, z) \le 0$, which is the desired estimate.
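For completeness, the even-concave step used in the proof of 1° is just the midpoint inequality
\[
\xi(1) = \tfrac{1}{2}\big(\xi(1) + \xi(-1)\big) \le \xi\Big(\tfrac{1 + (-1)}{2}\Big) = \xi(0).
\]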

Sharpness
There are two general methods which are typically used to prove that the constant involved in a given martingale inequality is the best possible. The first type of approach rests on the construction of appropriate extremal processes, for which the equality is attained, or asymptotically attained (i.e., in the limit). This natural method has been applied in many cases: see e.g. [8,10,22]. The second method is more abstract and indirect in nature: one assumes that an estimate under investigation holds with some constant and then proves the existence of an associated special function, enjoying appropriate size and regularity conditions. Then, using these properties in the right order, one shows that the constant in question must be bounded below by an appropriate quantity. This type of reasoning originates in Burkholder's paper [9] and has been successfully exploited in various contexts: see e.g. [9,18,19,23].
We have decided to follow the first path and show the optimality of the constant C p explicitly, with the use of examples. This approach has the additional advantage that some elements of the proof will be of key importance in the next section, where we describe some steps leading to the formula for the special function U for 0 < p < 1/2. We consider the cases 1/2 ≤ p < 1 and 0 < p < 1/2 separately. The first case is simpler and hence its analysis can be treated as a 'basis' for the more complicated reasoning for p < 1/2.

The case 1/2 ≤ p < 1
Let us first describe the behavior of the extremal pair (X, Y ), or rather the extremal triple (X, Y, X * ). The conditions appearing below are informal and they will be rigorously specified later. We assume that η < 1 is a fixed parameter, and consider the process (X, Y, X * ) whose evolution is governed by the following requirements.
To gain some intuition about the evolution, see Figure 1 below. Actually, we have formulated the above conditions as if the triple (X, Y, X*) were a discrete-time process.
To avoid confusion, let us explain that the word "leads" means that the triple moves, in a continuous manner, along the line segments joining the appropriate states. The formal description of the triple (X, Y, X*) is the following. The process Y will be an appropriately stopped one-dimensional Brownian motion started at zero, and X will be a stochastic integral, with respect to Y, of a predictable process H, with $X_0 = 1$. Thus, all we need is the specification of the process H. To this end, we construct inductively a sequence $(\sigma_n)_{n\ge 0}$ of stopping times: we set $\sigma_0 \equiv 0$ and define $\sigma_{n+1}$ in terms of $\sigma_n$, for n = 0, 1, 2, . . .. Here and in what follows, we will use the (slightly unusual) convention sgn 0 = 1. The predictable 'control process' H is constant on each interval $(\sigma_n, \sigma_{n+1}]$ and takes the values ±1 there. It is easy to see that then the process (X, Y, X*) evolves according to the rules (i)–(v) described above.
Let us gather some basic information about the triple (X, Y, X*), which follows directly from the above construction. First, note that Y is differentially subordinate to X: indeed, we have $|Y_0| = 0 \le 1 = |X_0|$ and the transforming process H takes values in the set {−1, 1}. Furthermore, on the set {s ≥ 0 : $X^*_s$ increases}, we have $|X_s| = X^*_s$ and $Y_s = 0$. Finally, we have $|Y_t| \le 2\eta X^*_t$ almost surely for all t, and the process (X, Y, X*) terminates ultimately when the equality $|Y_t| = 2\eta X^*_t$ is experienced.

The next step of the analysis is to show that if $2\eta < \sqrt{2/p}$, then $X^*_\infty \in L^p$. The key observation is that the process $(U(X_t, Y_t, X^*_t))_{t\ge 0}$ is a local martingale. To see this, we will apply Itô's formula and check that the finite-variation part vanishes. First, the integral $\int_0^t U_z(X_s, Y_s, X^*_s)\,dX^*_s$ vanishes for any t: this follows from the identity $U_z(\pm x, 0, |x|) = 0$ for all x > 0 and the fact that on the set where X* increases, we have $|X_s| = X^*_s$ and $Y_s = 0$ (which follows directly from the above construction). Next, since H takes values in {−1, 1}, the second-order terms in Itô's formula cancel as well. This establishes the local martingale property of $(U(X_t, Y_t, X^*_t))_{t\ge 0}$.

Consider the sequence $\tau_1 \le \tau_2 \le \ldots$ of stopping times given by $\tau_n = \inf\{t > 0 : X^*_t = n\}$. Then we obviously have $|X_{\tau_n}| = X^*_{\tau_n}$ (since this is either equal to n, or, if the triple (X, Y, X*) terminated earlier, we have $|X_{\tau_n}| = X^*_{\tau_n} = |Y_{\tau_n}|/(2\eta)$). Furthermore, the sequence $(\tau_n)_{n\ge 1}$ is localizing for $(U(X_t, Y_t, X^*_t))_{t\ge 0}$, since for each n the process $(X^*_{\tau_n \wedge t})_{t\ge 0}$, and hence also $(X_{\tau_n \wedge t})_{t\ge 0}$ and $(Y_{\tau_n \wedge t})_{t\ge 0}$, are bounded. Consequently, we have $\mathbb{E}\,U(X_{\tau_n}, Y_{\tau_n}, X^*_{\tau_n}) = U(X_0, Y_0, X^*_0)$. Plugging the formula for U and using the facts that $|Y_{\tau_n}| \le 2\eta X^*_{\tau_n}$ and $|X_{\tau_n}| = X^*_{\tau_n}$, we obtain an estimate which, provided $2\eta < \sqrt{2/p}$, yields
$$\mathbb{E}(X^*_{\tau_n})^p \le \frac{2/p}{2/p - 4\eta^2};$$
letting n → ∞ we get $X^*_\infty \in L^p$. Consequently, Y converges almost surely (which follows, for example, from the Burkholder-Davis-Gundy inequalities). Denoting the limit by $Y_\infty$, we obtain $\mathbb{P}(|Y_\infty| = 2\eta X^*_\infty) = 1$ (by the construction of (X, Y, X*)) and hence $\|Y_\infty\|_p = 2\eta\,\|X^*_\infty\|_p$. Since 2η can be made arbitrarily close to $\sqrt{2/p}$, the sharpness is established.

The case p < 1/2
The above reasoning cannot be carried over to exponents smaller than 1/2. Indeed, the limiting value of the parameter η for which the above construction of the triple (X, Y, X*) makes sense is equal to 1 (for bigger η, the line segments 'stick out' of the picture). Recalling the key identity $2\eta = \sqrt{2/p}$, we see that the endpoint value η = 1 leads to the boundary case p = 1/2. However, the above discussion gives some indications about the structure of the triple (X, Y, X*) for small values of the parameter p. Namely, let us emphasize that for p ≥ 1/2 we had the pointwise identity $|Y_\infty| = 2\eta X^*_\infty$, or even stronger: $|Y_\infty| = 2\eta|X_\infty| = 2\eta X^*_\infty$, where 2η could be chosen arbitrarily close to $C_p$. It seems quite natural to search, in the case p < 1/2, for a triple satisfying the same condition. A little thought and experimentation lead to the following candidate. Let Y be a standard one-dimensional Brownian motion started at zero and define X as a suitable stochastic integral with respect to Y, started at $X_0 = 1$ (so that the triple starts at (1, 0, 1)). Furthermore, define the termination time by $\sigma = \inf\{t : |Y_t| = 2\eta|X_t| = 2\eta X^*_t\}$. Here is some (very rough and informal) intuition about the behavior of (X, Y, X*). The process starts at (1, 0, 1). For any time t, if $|X_t| + |Y_t| < (2\eta - 1)X^*_t$, then the pair (X, Y) moves (locally) along a line segment of slope $-\operatorname{sgn}(X_tY_t)$ until one of the coordinates reaches zero and the slope switches its sign. On the other hand, if we have $|X_t| + |Y_t| = (2\eta - 1)X^*_t$, then the pair (X, Y) is forced to move along the line segment of slope 1 until $|Y| = 2\eta|X| = 2\eta X^*$ (then we stop) or $|Y| = (2\eta - 2)|X| = (2\eta - 2)X^*$. If the latter happens, say, at some time τ, then the third coordinate increases instantly and the equality $|X| + |Y| = (2\eta - 1)X^*$ is no longer valid: we actually have $|X_t| + |Y_t| < (2\eta - 1)X^*_t$ almost surely for t bigger than, but infinitesimally close to, τ. Hence, we can apply the preceding evolution rule. See Figure 2 below.
On the search of the special function, 0 < p < 1/2

Now we will describe some informal steps which have led us to the discovery of the optimal constant $C_p$ and the special function U : D → R used in the case 0 < p < 1/2. In principle, the problem of identifying such objects can be investigated with the use of two approaches: analytic or probabilistic. Actually, in a given situation, it is often worth exploiting a combination of both techniques; this will also be the case here. In the considerations below, we will focus on the real-valued setting: we assume that H = R. The passage to the vector-valued case is standard: in the formula for the obtained special function, we interpret all the absolute values as norms in H.

Analytic contribution
Let V : D → R be a locally bounded function. Suppose that we are interested in showing the estimate
$$\mathbb{E}\, V(X_t, Y_t, X^*_t) \le 0, \qquad t \ge 0, \qquad (5.1)$$
for any pair (X, Y) of differentially subordinate martingales. In particular, we want to show this estimate if Y is the stochastic integral, with respect to X, of some predictable process H with values in {−1, 1}. The fundamental property of the special function U to be found is that for any X, Y as above, the process U(X, Y, X*) must be a supermartingale majorizing V(X, Y, X*). If we also have $U(X_0, Y_0, X^*_0) \le 0$ almost surely, then we may write
$$\mathbb{E}\, V(X_t, Y_t, X^*_t) \le \mathbb{E}\, U(X_t, Y_t, X^*_t) \le \mathbb{E}\, U(X_0, Y_0, X^*_0) \le 0, \qquad (5.2)$$
and the claim follows. Now suppose that U is of class $C^2$. We may freely impose this requirement, since our goal is to discover a candidate for the special function; even if the final function we come up with does not enjoy this regularity, we may hope that it will still lead to the desired bound anyway (and indeed, this is the case here).
If U is of class $C^2$ and one applies Itô's formula, the required supermartingale property gives rise to two partial differential inequalities:
$$U_z(x, y, z) \le 0 \qquad \text{for } |x| = z, \qquad (5.3)$$
and
$$U_{xx}(x, y, z) \pm 2U_{xy}(x, y, z) + U_{yy}(x, y, z) \le 0 \qquad \text{for } (x, y, z) \in D. \qquad (5.4)$$
Now, if U is the function leading to the sharp inequality, then there is a pair (X, Y) for which equality is attained (possibly asymptotically, in the limit). For simplicity, let us assume that equality holds in (5.1) for some t > 0 and some nontrivial pair of processes.
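To see where these conditions come from, one can write down the (formal) Itô computation for $Y = \int H\,dX$ with H valued in {−1, 1}, suppressing the arguments of the derivatives of U:
\[
dU(X_t, Y_t, X^*_t) = (U_x + H_t U_y)\,dX_t + U_z\,dX^*_t + \tfrac{1}{2}\big(U_{xx} + 2H_t U_{xy} + U_{yy}\big)\,d[X, X]_t,
\]
where we used $dY_t = H_t\,dX_t$ and $H_t^2 = 1$. Since $X^*$ increases only on the set $\{|X_t| = X^*_t\}$, the finite-variation part is nonpositive for every admissible H as soon as (5.3) and (5.4) hold, and this is exactly the required supermartingale property.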
This implies that equalities must also hold in (5.2), and hence the partial differential inequalities (5.3), (5.4) also become equalities (at least on the range of the extremal triple (X, Y, X*)): we have seen this, very transparently, in the previous section. Since a priori we do not know what the extremal process (X, Y, X*) will look like, a natural starting point is to assume that these equalities hold everywhere. Furthermore, the first inequality in (5.2) forces us to impose the condition U ≥ V, and the existence of an extremal pair implies that equality must hold on a nontrivial set.
Summarizing the discussion so far, we search for a $C^2$ function U : D → R which satisfies the following three assumptions:

(A1) We have $U_z(x, y, z) = 0$ if |x| = z and y ∈ R.

(A2) We have $U_{xx}(x, y, z) + 2\varepsilon(x, y, z)U_{xy}(x, y, z) + U_{yy}(x, y, z) = 0$ on D, for some function ε taking values in {−1, 1}.

(A3) We have U ≥ V and equality holds for at least one point.
In our case, we have $V(x, y, z) = |y|^p - C_p^p z^p$ for some a priori unknown constant $C_p$. Note that V is homogeneous of order p, so it seems plausible that the same is true for U. Setting $U(x, y, z) = z^p u(x/z, y/z)$, we see that the problem reduces to the search of a symmetric function u : [−1, 1] × R → R satisfying the following requirements:

(a1) We have $p\,u(x, y) = x u_x(x, y) + y u_y(x, y)$ for |x| = 1.

(a2) We have $u_{xx}(x, y) + 2\varepsilon(x, y)u_{xy}(x, y) + u_{yy}(x, y) = 0$ for some function ε taking values in {−1, 1}.

(a3) We have $u(x, y) \ge |y|^p - C_p^p$ and equality holds for at least one point.

Let us emphasize that the above set of properties has been obtained on the basis of some more or less reasonable assumptions and guesses. One should also keep in mind that there might be no function u satisfying the above requirements (for instance, equalities in (a1), (a2) may hold only on some part of the domain; the function u may not be of class $C^2$; etc.). So, one should treat the above requirements flexibly, allowing some small modifications if necessary. We would also like to mention here that in general, there is no uniqueness; there might be several (actually, infinitely many) special functions leading to the estimate (5.1). In some situations, one might be interested in the best (i.e., pointwise smallest) special function, but in practice one often searches for the simplest function, with as uncomplicated a formula as possible. Then one can expect the analysis to avoid many tedious technical issues. This will also be the case here.
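For the record, here is the one-line computation behind the reduction (with ξ = x/z, η = y/z):
\[
U_z(x, y, z) = \frac{\partial}{\partial z}\Big[z^p\, u(x/z, y/z)\Big] = z^{p-1}\Big(p\,u(\xi, \eta) - \xi u_x(\xi, \eta) - \eta u_y(\xi, \eta)\Big),
\]
so (A1), applied on the set {|x| = z}, i.e., for |ξ| = 1, becomes (a1); likewise, each second-order derivative of U equals $z^{p-2}$ times the corresponding derivative of u, so (A2) turns into (a2).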
To proceed, we need to have more information about the structure of the signs in (a2) and the set of those (x, y) for which equality in (a3) is attained. It is convenient to incorporate probabilistic arguments.

Probabilistic contribution
There is an abstract formula for the special function U corresponding to (5.1). For any x, y ∈ R, consider the class M(x, y) which consists of all pairs (X, Y) of continuous-time bounded martingales such that $X_0 = x$ and
$$Y_t = y + \int_{0+}^t H_s\, dX_s, \qquad t \ge 0, \qquad (5.5)$$
for some predictable process H with values in {−1, 1}. Here the probability space and the filtration are also allowed to vary. Define $U^0 : D \to \mathbb{R}$ by the formula
$$U^0(x, y, z) = \sup\, \mathbb{E}\, V(X_\infty, Y_\infty, X^*_\infty \vee z), \qquad (5.6)$$
where the supremum is taken over all (X, Y) ∈ M(x, y). Here $X_\infty$, $Y_\infty$ are the almost sure limits of X and Y as t → ∞, which exist due to the boundedness assumption. By a standard change-of-time argument, one may assume that X is a stopped one-dimensional Brownian motion, started at x and run at some fixed speed a (that is, $X_t = x + B_{at}$ for t ≥ 0, where B is a standard Brownian motion). The function $U^0$ has all the required properties. Indeed, we have $U^0 \ge V$ (simply consider the constant pair (X, Y) ≡ (x, y) in the definition) and, using standard Markovian arguments, one checks that $U^0(X, Y, X^*)$ is a supermartingale if Y is a stochastic integral of X. The reason why we have decided to denote this object by $U^0$ (that is, with the additional superscript) is that this is actually the best (i.e., pointwise smallest) special function. Of course, there is no guarantee that $U^0$ is of class $C^2$, or even continuous; however, we do not worry about any regularity issues here. Again, we stress that the main purpose of this section is to provide some ideas which led us to the function of Section 3 (which was rigorously studied there).
In our case, when $V(x, y, z) = |y|^p - C_p^p z^p$, the formula (5.6) becomes
$$U^0(x, y, z) = \sup\, \mathbb{E}\big(|Y_\infty|^p - C_p^p (X^*_\infty \vee z)^p\big), \qquad (5.7)$$
where the supremum is taken over appropriate processes as above. Now it can be proved formally that the special function is homogeneous of order p (which was assumed in the analytic setting): it follows from the simple scaling property that (X, Y) ∈ M(x, y) if and only if (λX, λY) ∈ M(λx, λy) for any λ > 0. Next, the following ideas turn out to be helpful. By (5.7), given (x, y, z), we want to maximize the expectation $\mathbb{E}\big(|Y_\infty|^p - C_p^p(X^*_\infty \vee z)^p\big)$; naively speaking, we would like to try to maximize $\mathbb{E}|Y_\infty|^p$ while keeping $\mathbb{E}(X^*_\infty \vee z)^p$ relatively small. As we mentioned above, we may assume that X is a stopped Brownian motion started at x, and hence all we need is the identification of the 'control' process H in (5.5). There is a perfect analogy between this problem and the problem of finding the sign function ε in the above analytic context. Namely, both these objects (H and ε) determine the directions of the line segments along which the extremal pairs (X, Y) evolve.
First we will try to identify 'the shape' of the termination time for X. We look at the quantity $\mathbb{E}(X^*_\infty \vee z)^p$, which we want to keep relatively small. Obviously, the function $t \mapsto (X^*_t \vee z)^p$ is nondecreasing, and if $|X_t|$ is strictly less than $X^*_t$, then $X^*_t \vee z$ does not increase for some time. On the other hand, let us inspect the behavior of the function $t \mapsto \mathbb{E}|Y_t|^p$, which we want to make as large as possible. If $Y_t$ is away from zero, then this expectation decreases (which is not 'profitable' from our viewpoint): this follows from the local concavity of the function $y \mapsto |y|^p$. On the other hand, when Y visits zero, then the expectation will experience a huge increase in a short time, because of the cusp of the function $y \mapsto |y|^p$ at the origin. The above discussion suggests the following behavior of the extremal pair (X, Y): we should try to keep Y near zero, and stop when it gets far from the origin. What does 'far' mean? In the light of the homogeneity, it seems natural to conjecture that we should stop when $|Y| \ge \eta X^*$ for some unknown parameter η. Motivated by the above analysis of $t \mapsto \mathbb{E}(X^*_t \vee z)^p$, it seems plausible to guess that at the moment τ of termination we should have $|X_\tau| = X^*_\tau$ (otherwise, it might be profitable to wait for a while...). Can we say anything about the threshold η? Take the extremal pair (X, Y) of differentially subordinate martingales (for which equality is attained). If our intuition is correct, then we have $|Y_\infty| \ge \eta|X_\infty| = \eta X^*_\infty$ almost surely and hence
$$\eta\, \|X^*_\infty\|_{L^p} \le \|Y_\infty\|_{L^p} \le C_p\, \|X^*_\infty\|_{L^p}.$$
This leads to the natural guess η = C p .
Finally, we address the structure of the 'control' process H. We want to keep Y close to zero but, on the other hand, we want it to reach $\{-C_pX^*, C_pX^*\}$ after some finite time. It is not difficult to guess what H should be for some particular points (formally: how to define H if the triple (X, Y, X*) lies in some particular set). For example, suppose that at some time t the pair $(X_t, Y_t)$ got to a point lying on the line segment joining $(0, (C_p - 1)z)$ and $(z, C_pz)$, where $z = X^*_t$. Then the pair should evolve along this line segment until it gets to one of its endpoints: if it reaches $(z, C_pz)$, then the whole process terminates ultimately; if it gets to the other endpoint, then the evolution is continued. A similar analysis can be carried out for the symmetric line segments, i.e., those joining $(0, (C_p - 1)z)$ and $(-z, C_pz)$; $(0, -(C_p - 1)z)$ and $(z, -C_pz)$; $(0, -(C_p - 1)z)$ and $(-z, -C_pz)$. In the language of H, this means that if (X, Y) visits one of these segments, then we should take H = sgn(XY) (until we leave the segment); in the language of the sign function ε from the analytic setting, this means ε(x, y) = sgn(xy) for (x, y) belonging to the union of the segments (with z = 1).
Actually, it is quite tempting to take H = sgn(XY ) everywhere, but this is not the right choice. Indeed, if we took (X 0 , Y 0 , X * 0 ) = (1, 1, 1), then this identity for H would simply give X = Y . This cannot be the extremal pair for (5.1), since it would imply that the best constant C p is equal to 1 (and, in the light of the case p ≥ 1/2, one expects it to be bigger). So, let us try to switch the sign of the process H to − sgn(XY ): this leads to the conjecture that for x, y lying between the line segments, we have ε(x, y) = − sgn(xy).
This turns out to be the right guess, as it leads to the appropriate special function on the crucial part of the domain (see Figure 3 below).
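As a crude experimental companion to this discussion, the following Monte Carlo sketch (entirely ours) runs a random-walk discretization of (5.5)-(5.6) with the control H = −sgn(XY) and a trial threshold c: up to discretization error, the sample mean is a lower bound for $U^0(1, 0, 1)$, so a clearly positive value signals that the trial constant c is too small for (5.1). All parameter values are illustrative.

```python
# Monte Carlo sketch (ours): lower bound for U^0(1, 0, 1) in (5.6), using the
# control H = -sgn(XY) (with the convention sgn 0 = 1) and stopping at |Y| >= c X*.
import numpy as np

def lower_bound(p, c, eps=0.02, n_paths=5000, n_steps=20000, seed=1):
    rng = np.random.default_rng(seed)
    x = np.ones(n_paths)                  # X starts at 1
    y = np.zeros(n_paths)                 # Y starts at 0
    xstar = np.ones(n_paths)              # running maximum of |X| (so X* v 1 built in)
    alive = np.ones(n_paths, dtype=bool)  # paths not yet stopped
    for _ in range(n_steps):
        h = np.where(x * y >= 0, -1.0, 1.0)          # H = -sgn(XY), sgn 0 := 1
        dx = eps * rng.choice([-1.0, 1.0], size=n_paths)
        x += alive * dx                              # dX on unstopped paths
        y += alive * h * dx                          # dY = H dX
        xstar = np.maximum(xstar, np.abs(x))
        alive &= np.abs(y) < c * xstar               # stop once |Y| reaches c X*
        if not alive.any():
            break
    return np.mean(np.abs(y) ** p - c ** p * xstar ** p)

print(lower_bound(0.25, 2.0))   # > 0 indicates the trial constant c is too small
```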