$A_1$ Fefferman-Stein inequality for maximal functions of martingales in uniformly smooth spaces

Let $f$ be a martingale with values in a uniformly $p$-smooth Banach space and $w$ any positive weight. We show that $\mathbb{E} (f^* \cdot w) \lesssim \mathbb{E}(S_p f \cdot w^*)$, where $\cdot^*$ is the martingale maximal operator and $S_p$ is the $\ell^p$ sum of martingale increments.

The most basic examples are that, for any $r \in (1, 2]$, any $L^r$ space is $(r, 1)$-smooth, see [Pis16, (10.33)] (this is also a consequence of Clarkson's inequality), and, for any $r \in [2, \infty)$, any $L^r$ space is $(2, r-1)$-smooth; this follows from [Pis16, (10.37)] and Jensen's inequality. In general, unless $X$ is zero-dimensional, we must have $C_{\mathrm{sm}} \ge 1$, as can be seen by taking $x = 0$ in (1.1). Our main result is the following.
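The definition (1.1) is not reproduced in this excerpt; a common formulation of $(p, C_{\mathrm{sm}})$-smoothness, consistent with the normalization $C_{\mathrm{sm}} \ge 1$ obtained by taking $x = 0$, is the two-point inequality

```latex
% One standard form of (p, C)-smoothness; an assumed reconstruction,
% not necessarily the exact normalization of (1.1) in this paper:
\frac{|x+y|^p + |x-y|^p}{2} \le |x|^p + C_{\mathrm{sm}}^p\, |y|^p
\qquad \text{for all } x, y \in X.
```

Taking $x = 0$ here gives $|y|^p \le C_{\mathrm{sm}}^p |y|^p$, hence $C_{\mathrm{sm}} \ge 1$ whenever $X \ne \{0\}$.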
In order to put Theorem 1.1 into context, we list the previously known cases (in each of which the inequality (1.2) is in fact known with a smaller constant).
(2) The scalar ($X = \mathbb{R}$) case, which served as the main inspiration for this work, was proved in [Osę17a].
(3) The unweighted ($w = 1$) case is one of the implications in the characterization of martingale type, see [Pis16, Theorem 10.60].
We follow [Osę17a] in calling the inequality (1.2) a Fefferman-Stein inequality, in reference to [FS71, §3], where the first inequality involving the pair of weights $w, w^*$ appeared (see [HvNVW16, Theorem 3.2.3] for a martingale version). In order to distinguish this result from many others due to Fefferman and Stein, we prepend the designation "$A_1$", which in the one-weight theory stands for the condition $w^* \le [w]_{A_1} w$. The pair $w, w^*$ can be seen as satisfying a two-weight version of the $A_1$ condition.
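For context (this classical statement is recalled from the literature, not from the present text), the original Fefferman-Stein inequality for the Hardy-Littlewood maximal operator $M$ reads

```latex
% Classical Fefferman-Stein inequality; M denotes the
% Hardy-Littlewood maximal operator on R^n:
\int_{\mathbb{R}^n} (Mf)^r\, w \, dx
\le C_{n,r} \int_{\mathbb{R}^n} |f|^r\, Mw \, dx,
\qquad 1 < r < \infty,
```

with $Mw$ playing the role that $w^*$ plays in (1.2); in the martingale setting, the maximal operator applied to the weight is precisely $w^* = \sup_n \mathbb{E}(w \mid \mathcal{F}_n)$.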
For dyadic martingales, assuming $w \in A_\infty$, an inequality similar to (1.2) with $w^*$ replaced by $w$ is known [GW74, Theorem 2]. The recent result [BO21, Theorem 1.3] (which applies to martingale transforms in place of the square function) suggests that no such inequality is possible for general martingales.
The advantage of weighted estimates such as (1.2) is that they can be easily extrapolated to estimates for other moments, see Appendix A. We illustrate the extrapolation idea with a basic argument, which shows that the linear dependence on $C_{\mathrm{sm}}$ in (1.2) is optimal. Assume that the inequality holds for all weights $w$. By Hölder's inequality and Doob's maximal inequality, see e.g. [HvNVW16, Theorem 3.2.2], for any $r \in (1, \infty)$, we obtain
$$\mathbb{E}(f^* \cdot w) \lesssim p\, C_{\mathrm{sm}}\, \|S_p f\|_{L^r}\, \|w^*\|_{L^{r'}} \lesssim p\, r\, C_{\mathrm{sm}}\, \|S_p f\|_{L^r}\, \|w\|_{L^{r'}}.$$
Since $L^r$ is the dual space of $L^{r'}$, taking the supremum over weights $w \ge 0$ with $\|w\|_{L^{r'}} \le 1$ implies the moment bound (1.3). Incidentally, the linear growth in $r$ of the constant in the inequality (1.3) is optimal in the scalar case $X = \mathbb{R}$, $p = 2$, see [Bur73, Theorem 3.2].
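The linear factor $r$ in this extrapolation comes from Doob's maximal inequality applied to the weight; with $r'$ the conjugate exponent of $r$, the relevant bound (a standard consequence of [HvNVW16, Theorem 3.2.2]) is

```latex
% Doob's L^q maximal inequality with q = r', whose constant
% q/(q-1) equals r when 1/r + 1/r' = 1:
\|w^*\|_{L^{r'}}
= \Big\| \sup_{n} \mathbb{E}(w \mid \mathcal{F}_n) \Big\|_{L^{r'}}
\le \frac{r'}{r'-1}\, \|w\|_{L^{r'}}
= r\, \|w\|_{L^{r'}} .
```

This is the only point at which the dependence on $r$ enters the argument.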
Let now $X$ be a Banach space such that the inequality (1.3) holds with $p = r \in (1, 2]$ for all martingales $f$ with values in $X$. By Pisier's renorming theorem [Pis16, Theorem 10.22], the space $X$ admits an equivalent norm that is $(p, CK)$-smooth, where $K$ is the constant in (1.3) and $C$ depends only on $p$. In this sense, the linear dependence of (1.2) on $C_{\mathrm{sm}}$ is optimal.
The dependence of the bound (1.2) on $p$ does not seem natural, since it does not appear in the corresponding non-maximal bound (3.1). Also, the $p = 1$ bound clearly holds with constant $1$. Therefore, we find it reasonable to conjecture that $84p$ in (1.2) can be replaced by a constant that does not depend on $p$.
1.1. Non-martingale version. The proof of Theorem 1.1 in fact yields a more general statement, involving processes with a structure that was introduced in [vNV20, Theorem 3.1]. Let $(\Omega, (\mathcal{F}_n)_{n \in \mathbb{N}})$ be a filtered probability space and $(g_n)_{n \in \mathbb{N}}$, $(f_n)_{n \in \mathbb{N}}$, $(\tilde f_n)_{n \in \mathbb{N}}$ be adapted processes with values in a $(p, C_{\mathrm{sm}})$-smooth Banach space $X$. Assume that $f_0 = \tilde f_0 = 0$ and, for every $n \in \mathbb{N}_{>0}$, we have $f_n = \tilde f_{n-1} + (g_n - g_{n-1})$ and $|\tilde f_n| \le |f_n|$. Then the weighted maximal estimate (1.4) holds. As in (1.3), for $r \in [1, \infty)$, this implies the moment estimate (1.5).

The Rosenthal-type inequality in [vNV20, Theorem 3.1] states that, if $X$ is a $(2, C_{\mathrm{sm}})$-smooth space, then (1.6) holds, where $sg$ is the conditional square function. For $r \ge 2$, (1.6) implies (1.5), since $\|sg\|_{L^r} \le (r/2)^{1/2} \|S_2 g\|_{L^r}$ for $r \in [2, \infty)$, by Doob's maximal inequality and duality. On the other hand, the version of (1.6) for $r < 2$ in [vNV20, Corollary 3.6] is not obviously related to (1.5).

In Section 2, we review the characterization of uniform smoothness that will be used in the proofs of our main results.
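For orientation (this specialization is not spelled out in the text, but follows directly from the assumptions, writing $\tilde f$ for the dominated process): if $g$ is a martingale and we take $\tilde f_n = f_n = g_n$, the structural recursion holds trivially,

```latex
% Martingale specialization of the structure from [vNV20, Theorem 3.1]:
f_n = \tilde f_{n-1} + (g_n - g_{n-1})
    = g_{n-1} + (g_n - g_{n-1})
    = g_n ,
\qquad |\tilde f_n| = |f_n| ,
```

so the statement for such triples $(g, f, \tilde f)$ contains Theorem 1.1 as a special case.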
In Section 3, we prove the inequality (3.1), which is a non-maximal version of Theorem 1.1. The proof of that inequality uses a Bellman function that is adapted from [Osę17a]. Although that inequality will not be used in the proof of Theorem 1.1, the Bellman function estimate in Proposition 3.1 will be used again there.
In Section 4, we prove the full Theorem 1.1. This is accomplished using a Bellman function that combines features present in the articles [BS15] and [Osę17a].
In Appendix A, we give a sample application of the weighted bound (1.2).

General facts about uniformly smooth spaces
We will use the regularity properties of the norm on a uniformly smooth Banach space that can be found e.g. in [vNZ11, Lemma 2.1]. We take the opportunity to streamline the deduction of these properties from (1.1). The following lemma is a minor variant of [DGZ93, Lemma I.1.3] (there, the case $\phi(x) = |x|$ is considered).
Lemma 2.1. Let $(X, |\cdot|)$ be a Banach space, $\phi : X \to \mathbb{R}$ a convex function, and $x \in X$ such that the hypotheses (2.1) and (2.2) hold. Then $\phi$ is Fréchet differentiable at $x$, and its derivative satisfies $|\phi'(x)|_{X^*} \le L$.
Proof. Convexity implies that, for any $y \in X$, the function $t \mapsto \frac{\phi(x+ty) - \phi(x)}{t}$ is monotonically increasing in $t \in (0, \infty)$. Therefore, the one-sided directional derivatives
$$A(y) := \lim_{t \to 0^+} \frac{\phi(x+ty) - \phi(x)}{t}$$
exist, and $|A(y)| \le L|y|$ by (2.1). We will show that $A$ is the Fréchet derivative of $\phi$ at $x$. From (2.2), it follows that $A(y) + A(-y) = 0$ for all $y \in X$. Hence, again by (2.2), we obtain
$$0 \le \frac{\phi(x+ty) - \phi(x)}{t} - A(y) \le \frac{\phi(x+ty) + \phi(x-ty) - 2\phi(x)}{t} \to 0, \qquad t \to 0^+.$$
This shows that the difference quotients of $\phi$ converge to $A$ locally uniformly. It remains to show that $A$ is linear. To this end, we first observe that $A$ is convex, since it is the limit of the convex functions $y \mapsto (\phi(x+ty) - \phi(x))/t$. Then also $A(y) = -A(-y)$ is concave, so $A$ is both convex and concave, hence linear.

Let $(X, |\cdot|)$ be a $(p, C_{\mathrm{sm}})$-smooth Banach space with $p \in (1, 2]$ and let $\phi(x) := |x|^p$. The hypothesis (2.2) of Lemma 2.1 follows directly from the definition (1.1). It is also easy to see that, for any $x \in X$, the hypothesis (2.1) holds with $L = L(x) = p|x|^{p-1}$. Therefore, Lemma 2.1 implies that the function $\phi$ is Fréchet differentiable, and $|\phi'(x)|_{X^*} \le p|x|^{p-1}$.
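The verification that (2.1) holds for $\phi(x) = |x|^p$ with $L(x) = p|x|^{p-1}$ is a routine estimate; assuming (2.1) is a bound on the difference quotients near $x$, it follows from the mean value theorem for $t \mapsto t^p$ and the triangle inequality:

```latex
% Mean value theorem applied to t -> t^p on [0, \infty),
% combined with | |x+y| - |x| | <= |y|:
\big|\, |x+y|^p - |x|^p \,\big|
\le p \max\big(|x|, |x+y|\big)^{p-1} \big|\, |x+y| - |x| \,\big|
\le p \big( |x| + |y| \big)^{p-1} |y| ,
```

and $p(|x| + |y|)^{p-1} \to p|x|^{p-1}$ as $|y| \to 0$.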
Let $C_H \in [0, \infty]$ be the smallest constant such that, for any $x, y \in X$, the inequality (2.3) holds. Conversely, a converse estimate, valid for any $x, y \in X$, shows that $C_{\mathrm{sm}}^p \le 2^{p-2} C_H^p / p$, so the conditions (2.3) and (1.1) are equivalent. However, we find the condition (2.3) more convenient to use, so all subsequent results will be formulated in terms of $C_H$. We note that $C_H^p \ge p$, as can be seen by considering a one-dimensional subspace of $X$.

Bellman function for the martingale
In this section, we adapt the Bellman function from [Osę17a] to our setting. This will allow us to prove the inequality (3.1). Note that, unlike in (1.2), the constant on the right-hand side of (3.1) does not explicitly depend on $p$.
We denote the $x$- and $u$-derivatives of $U$ by $U_x$ and $U_u$, respectively. Note that $U$ is indeed Fréchet differentiable in $x$. The main feature of the function (3.2) is the following concavity property.

Proposition 3.1. Suppose that $C = 9$ and $\tilde C = 4\sqrt{2}$. Then, for any $x, d \in X$, $q, u, v \in \mathbb{R}_{\ge 0}$, and $e \in \mathbb{R}$ with $u \le v$ and $0 \le u + e$, we have the inequality (3.3).

Before turning to the verification of (3.3), let us quickly show why it is useful.
Proof of (3.1) assuming Proposition 3.1. Let $w_n := \mathbb{E}(w \mid \mathcal{F}_n)$ and $w_n^* := \max_{n' \le n} w_{n'}$. For each $n$, we apply Proposition 3.1 with the parameters (3.4). Taking the conditional expectation on both sides of the resulting inequality, we obtain
$$\mathbb{E} U(f_{n+1}, q_{n+1}, w_{n+1}, w_{n+1}^*) \le \mathbb{E} U(f_n, q_n, w_n, w_n^*).$$
Iterating this inequality yields (3.1).

Unlike in the scalar case in [Osę17a], it does not seem possible to directly use the Bellman function (3.2) to deduce the maximal estimate (1.2). However, Proposition 3.1 will be used in the proof of Proposition 4.1, which will in turn imply the maximal estimate.
We did not attempt to optimize the numerical values of $C, \tilde C$ in Proposition 3.1. Also, the conditions (3.16) and (3.17), according to which these values are chosen, can be improved by a more careful choice of numerical constants at various places in the proof. However, we should like to point out that the main loss compared to [Osę17a] is due to the use of the estimate (3.9) in several denominators.
Proof of Proposition 3.1. The inequality (3.3) is quite delicate for small values of $d$ and $e$, and quite sloppy for large values. This can be seen by looking at the asymptotic behavior of (3.3) as $d \to \infty$ or $e \to \infty$, which is dominated by the term $-C((u+e) \vee v)(q + |d|^p)^{1/p}$. Accordingly, we distinguish the following cases.

Bellman function for the maximal function
In this section, we combine the Bellman functions from [Osę17a] and [BS15]. For $x \in X$, $|x| \le m$, $q \ge 0$, and $0 \le u \le v$, let $U(x, m, q, u, v)$ be given by (4.1). Evidently, the function (4.1) is a modification of (3.2). The most obvious such modification would be to replace $|x|$ by $m$; the more sophisticated modification in (4.1) is chosen in such a way that the left-hand side of (4.2) becomes differentiable in $d$. The following concavity property is the main feature of the function (4.1).
Proof of Theorem 1.1 assuming Proposition 4.1. We apply Proposition 4.1 with the same parameters as in (3.4), and additionally $m = f_n^* := \max_{n' \le n} |f_{n'}|$.
Taking the conditional expectation on both sides of the resulting inequality and iterating, as in the proof of (3.1), we obtain an estimate which implies (1.2) in view of (2.4). A similar argument also shows (1.4).
Remark 4.2. Proposition 4.1 can also be used to recover a non-maximal bound similar to (3.1) (but with a larger absolute constant). This follows from the AM-GM inequality.

Proof of Proposition 4.1. Due to the additional maximum in (4.2), we have to distinguish a few more cases than in Section 3. The main distinction is according to the ordering of $|x + d|$ and $m$, since this ordering substantially affects the shape of the function (4.1). The cases are as follows.
Case 1b. We keep the assumptions $|x + d| \le m$ and $|d|^p \le q/2$. Now, we consider the case $u + e \ge v$; in particular, $e \ge v - u \ge 0$. For $|td|^p \le q/2$, we showed $I(t) \le 0$ in the previous steps. Hence, it suffices to show $I(t) \le 0$ in the remaining range.

The dominated convergence theorem holds in $X$-valued $L^r$ spaces [Zaa67, §72, Theorem 2]. Hence, we can approximate $f$ by simple functions in $L^r(\Omega, X)$, that is, by finite linear combinations of characteristic functions of product subsets of $\Omega \times S$.
Let $\varepsilon > 0$, and let $\tilde f$ be such a simple function with $\|f - \tilde f\|_{L^r(\Omega, X)} < \varepsilon$.
Remark A.2. The space $L^r(\Omega)$ in Proposition A.1 can be replaced by another Banach function space $Y$, provided that $Y'(X')$ is a norming subspace of the dual space of $Y(X)$, and, most importantly, that the martingale maximal operator is bounded on $Y'(X')$. One example is when $Y$ is a weighted $L^r$ space and $X = \mathbb{R}$; the appropriate maximal bounds in this case have been proved in [DP19].
Proof. By the monotone convergence theorem, we may consider a finite sequence of times $n \le N$, so that the left-hand side of (A.7) is finite if the right-hand side is. Let $f^*(\omega, s) := \max_{n \le N} |f_n(\omega, s)|$.
Remark A.4. In [VY19, Theorem 1.1], a converse inequality to (A.7) was also proved. That converse inequality does not follow from the main result of [Osę17b], due to the restriction in that result to weights that are almost surely continuous in time. In [WZ21], we extend the main result of [Osę17b] in such a way that it recovers the converse to (A.7).