Infinite Random Power Towers

We prove a probabilistic generalization of the classic result that infinite power towers, $c^{c^{\dots}}$, converge if and only if $c\in[e^{-e},e^{1/e}]$. Given an i.i.d. sequence $\{A_i\}_{i\in\mathbb N}$, we find that convergence of the power tower $A_1^{A_2^{\dots}}$ is determined by the bounds of $A_1$'s support, $a=\inf(\mathrm{supp}(A_1))$ and $b=\sup(\mathrm{supp}(A_1))$. When $b\in[e^{-e},e^{1/e}]$, $a<1<b$, or $a=0$, the power tower converges almost surely. When $b<e^{-e}$, we define a special function $B$ such that almost sure convergence is equivalent to $a<B(b)$. Only in the case when $a=1$ and $b>e^{1/e}$ are the values of $a$ and $b$ insufficient to determine convergence. We show a rather complicated necessary and sufficient condition for convergence when $a=1$ and $b$ is finite. We also briefly discuss the relationship between the distribution of $A_1$ and the corresponding power tower $T=A_1^{A_2^{\dots}}$. For example, when $T\sim\mathrm{Unif}[0,1]$, then the corresponding distribution of $A_1$ is given by $UV$ where $U,V\sim\mathrm{Unif}[0,1]$ are independent. We generalize this example by showing that for $U\sim\mathrm{Unif}[\alpha,\beta]$ and $r\in\mathbb R$, there exists an i.i.d. sequence $\{A_i\}_{i\in\mathbb N}$ such that $U^r \stackrel{d}{=} A_1^{A_2^{\dots}}$ if and only if $r\in[0, \frac1{1+\log \beta}]$.


Introduction and Main Result

Background
In this paper we investigate infinite random power towers, defined as the limit of the sequence of partial towers T_n = A_1^(A_2^(⋯^(A_n))). On first exposure to power towers, it may be surprising that it is possible for them to converge at all when c > 1. After all, exponentiation famously diverges to infinity rapidly, so one would expect the iterated sequence to diverge even more rapidly. However, for nonnegative c, the infinite power tower converges if and only if c ∈ [e^(-e), e^(1/e)] [11]. Even if the existence of this interval of convergence isn't a surprise, the endpoints seem far nicer than they have a right to be.
The sequence c, c^c, c^(c^c), … is naturally viewed as repeated iterates of the function x ↦ c^x starting at x = 1, which turns the problem into a discrete dynamical system. The theory of dynamical systems is complex and deep; in general the behavior of such systems is difficult or impossible to predict. However, this particular system is easy to get a handle on. For example, iterating x ↦ (√2)^x produces a sequence that increases to the first fixed point of this function, namely 2 (Figure 1).
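This behavior is easy to see numerically. The following throwaway sketch (not part of any proof; the iteration count is an arbitrary choice of ours) iterates x ↦ c^x directly:

```python
import math

def power_tower_iterates(c, n):
    """Return the iterates c, c**c, c**(c**c), ... of x -> c**x starting from x = 1."""
    xs, x = [], 1.0
    for _ in range(n):
        x = c ** x
        xs.append(x)
    return xs

# For c = sqrt(2), the iterates increase toward the first fixed point of x -> c**x,
# namely 2, since (sqrt(2))**2 = 2.
iterates = power_tower_iterates(math.sqrt(2), 200)
print(iterates[:3], iterates[-1])  # increasing sequence, final value close to 2
```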
In general, iterating an increasing function creates a sequence that either converges to a fixed point or diverges to infinity. Sequences arising from iterated decreasing functions either converge to a fixed point or oscillate in the limit. The convergence results about c^(c^⋱) become perfectly natural in light of these facts: the sequence c, c^c, c^(c^c), … converges in the limit when c ∈ [e^(-e), e^(1/e)], increases to infinity when c > e^(1/e), and oscillates in the limit when c < e^(-e). Power towers with varying bases have also been studied; Barrow defined a notation for them by analogy to summation and product notation, which we will follow here. Baker and Rippon [2] gave necessary and sufficient conditions for convergence of an alternating power tower with positive real coefficients, i.e. c_k alternating between a and b, with a, b ∈ (0, ∞). Later [3], they found conditions for convergence and divergence when c_k is periodic with other periods. Bachman (1995) [1] proves an essentially tight bound on convergence in the event that c_k converges to e^(1/e) from above, based on a note without proof found in Ramanujan's notebook.
In this paper, we investigate the probabilistic question: if {A_i}_{i∈N} is an i.i.d. sequence of random variables, when does the infinite power tower A_1^(A_2^(⋯)) converge? In our main result, Theorem 1.2, we prove necessary and sufficient conditions for almost sure convergence under the assumption that the support of A_1 is a bounded subset of [0, ∞). Surprisingly, almost sure convergence depends only on the upper and lower bounds of the support of A_1, except when the lower bound is 1 and the upper bound is larger than e^(1/e). Along the way, we derive a closed form for a function whose existence was proven by Baker and Rippon [2], which defines the boundary between convergence and divergence of alternating power towers a^(b^(a^(b^⋱))) for a, b ∈ (0, 1).
We end by asking which distributions arise as limits of random power towers: given a random variable X, can we find an i.i.d. sequence A_i such that X = A_1^(A_2^(⋯))? If so, we will say that X has the tower property, and the distribution of A_i is its inverse tower distribution. In Theorem 3.1 we prove that if U ∼ Unif[α, β] and r ∈ R, then U^r has the tower property if and only if 1 ∈ [α, β] and r ∈ [0, 1/(1 + log β)].
As in the deterministic case, it is natural to think of random power towers as composing random maps x ↦ (A_n)^x. Composition sequences of random maps have also been studied extensively; see, for example, the review by Diaconis and Freedman [8], who showed a probabilistic analogue of the Banach fixed point theorem that may be applied in many diverse instances of random iterated functions. Stated informally, if f_n is an i.i.d. sequence of random functions from some metric space S to itself, and f_n is "on average" a contraction, then the backward compositions f_1 ∘ f_2 ∘ ⋯ ∘ f_n(x) converge almost surely to a limit that does not depend on x. Note that the relevant expectation involves simply the log of the Lipschitz constant of the map x ↦ (A_1)^x on the interval [0, e]. The condition of the log Lipschitz constant having negative expectation was called "super-contracting" by Steinsaltz [14], who also studied infinite iterated function systems and gave convergence results with weaker assumptions than Diaconis and Freedman's theorem. If we allow the base to exceed e^(1/e), then their result is not much help, as x ↦ a^x is not a Lipschitz map of any subinterval of [0, ∞) to itself. For bases close to 0, we have the difficulty that log |log a| becomes very large, so one might expect the random power tower to have an oscillating limit if most of its weight is near 0, similar to the behavior of a^(a^⋱) when a < e^(-e). There are in fact many distributions for A_1 that give rise to an almost-surely-convergent power tower yet do not satisfy the "contracting on average" condition, and indeed do not satisfy any condition based on Lipschitz constants.

Notation
As mentioned above, we will be following Barrow's power tower notation. We will also find it convenient to add a few conventions of our own.
Similar to exponentiation, we define this notation to associate to the right. Also, by analogy to defining the empty product as 1, we define the empty power tower to equal 1. For the sake of concision, we write c ⋆ n for the power tower of height n in which every base equals c. This operation is commonly called "tetration" and is often denoted ⁿc or c ↑↑ n, but we find our notation easier to read and pronounce ("c star n"). We also define the iterated logarithm function log⋆(x), the number of times the logarithm must be applied to x before the result is at most 1. Just as e ⋆ n diverges to infinity extremely rapidly, log⋆(x) diverges to infinity extremely slowly. The iterated logarithm function is used in the study of algorithms, and this notation is fairly standard there [7].
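A minimal sketch of these two operations in code (`tetr` and `log_star` are our own throwaway names, using the convention that log⋆ counts applications of log until the value is at most 1):

```python
import math

def tetr(c, n):
    """c star n: the power tower of height n with every base equal to c."""
    result = 1  # the empty power tower is defined to be 1
    for _ in range(n):
        result = c ** result
    return result

def log_star(x):
    """Iterated logarithm: how many times log must be applied before the result is <= 1."""
    count = 0
    while x > 1:
        x = math.log(x)
        count += 1
    return count

print(tetr(2, 4))       # 2**(2**(2**2)) = 65536
print(log_star(65536))  # 3, since log(log(log(65536))) <= 1
```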
Also, let W(z) be the principal branch of the Lambert W function, defined as the inverse of z ↦ ze^z. As a real-valued function, W(z) has domain [−1/e, ∞) and range [−1, ∞). See [6] for an in-depth introduction to this function.
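The principal branch is available in scientific libraries, but a minimal Newton-iteration sketch (our own, for illustration only) suffices for values well inside the domain:

```python
import math

def lambert_w(z, iters=100):
    """Principal branch of the Lambert W function: the w >= -1 with w * exp(w) = z, z >= -1/e."""
    w = math.log(z) if z > math.e else 0.0  # rough starting point
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (w + 1))  # Newton step for f(w) = w*exp(w) - z
    return w

# W(e) = 1 since 1 * e**1 = e; W(1) is the omega constant 0.5671...
print(lambert_w(math.e), lambert_w(1.0))
```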
Throughout, {A_n}_{n∈N} will always denote an i.i.d. sequence of random variables on (0, ∞), T_n the partial power tower A_1^(A_2^(⋯^(A_n))), a = inf(supp(A_1)), and b = sup(supp(A_1)).
In this paper, any unqualified equations or inequalities involving random variables may be taken either absolutely or almost surely, depending on whether the involved random variables are surely between the endpoints of their support, or only almost surely in that range.

The main theorem
The following theorem gives necessary and sufficient conditions for almost-sure convergence of the random power tower T_n, provided that the support of A_1 is not an unbounded subset of [1, ∞):

Theorem 1.2. We have four different cases, depending on the values of a and b: when b ∈ [e^(-e), e^(1/e)], a < 1 < b, or a = 0, the power tower converges almost surely; when b < e^(-e), almost sure convergence is equivalent to a < B(b); when a = 1 and b ∈ (e^(1/e), ∞), the power tower converges almost surely if and only if E inf{n : A_n ≤ e^(1/(e⋆n))} < ∞; and in the remaining case, a > 1 and b > e^(1/e), the power tower diverges almost surely.

Surprisingly, unless a = 1, the particulars of the distribution of A_1 are completely irrelevant except for the bounds of its support. One can also see immediately that this is a much more powerful result than the "contracting on average" condition from the introduction; indeed, T_n converges almost surely when a = 0 and b < e^(-e), but x ↦ (A_n)^x is never a contraction.
Even when a = 1 and b ∈ (e^(1/e), ∞), the only fact about A_1 which matters is how much the distribution of A_1 is weighted on neighborhoods around 1. Interestingly, log⋆(1/(t−1)) goes to infinity extremely slowly as t approaches 1 from above. There is no reason to expect that any "nice" distribution for A_1 should correspond to a nice distribution of T, but we do have the following fortuitous example:

Example 1.5. If (U_n) and (V_n) are two independent i.i.d. sequences, uniformly distributed on (0, 1), then the power tower with bases U_iV_i converges almost surely, and its limit is uniform on (0, 1).
Almost sure convergence follows immediately from the first convergence condition of Corollary 1.3, even though this distribution fails to be "contracting on average".
The proof that its limit is uniform is slightly more involved: if U, V, and W are all independent and uniformly distributed on (0, 1), an elementary computation shows that (UV)^W is again uniformly distributed on (0, 1). As a result of this fact, the height-2n partial tower with bases U_iV_i, topped with the independent exponent U_{2n+1}, is uniformly distributed on (0, 1) for each n. We furthermore have that this topped tower lies between T_{2n+1} and T_{2n}, which follows from the fact that T_{2n+1} equals the topped tower with U_{2n+1} replaced by U_{2n+1}V_{2n+1} ≤ U_{2n+1}, and the topped tower is an increasing function of its top exponent. The leftmost and rightmost terms both converge to the same limit, namely the full infinite tower, hence the middle term also converges to the same thing. Since the middle term is uniformly distributed on (0, 1) for each n, it follows that its limit is as well.
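A seeded Monte Carlo sanity check of Example 1.5 (an illustration only; the truncation height, sample size, and tolerances are arbitrary choices of ours):

```python
import random

def uv_tower(rng, height=200):
    """Partial power tower with i.i.d. bases U*V, U, V uniform on (0, 1), evaluated inside-out."""
    t = 1.0  # the empty tower
    for _ in range(height):
        t = (rng.random() * rng.random()) ** t  # bases are i.i.d., so evaluation order is harmless
    return t

rng = random.Random(12345)
samples = [uv_tower(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
below_quarter = sum(s < 0.25 for s in samples) / len(samples)
print(round(mean, 3), round(below_quarter, 3))  # near 0.5 and 0.25 if the limit is uniform
```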

Proof of Theorem 1.2
As we have seen, we cannot use contraction-type results to prove Theorem 1.2. However, exponential functions do have a different useful property: they are always monotonic. The proof relies heavily on this fact in all cases, but it works differently when a ≥ 1 and when a < 1: in the former case, x ↦ (A_n)^x is always nondecreasing, whereas in the latter case it can be either nonincreasing or nondecreasing, depending on which side of 1 the base lies.

Cases 1 and 2 (a ≥ 1)
Importantly, a ≥ 1 implies that T_n is a nondecreasing sequence, because replacing the top exponent 1 of T_n by A_{n+1} ≥ 1 can only increase the tower. Our proof strategy for convergence in cases 1 and 2 involves putting an appropriate upper bound on T_n to ensure convergence. For case 1 this is trivial: T_n ≤ b ⋆ n, which converges because b ≤ e^(1/e).

Proof. The proof is in several steps. The basic idea is that any time a run A_i, A_{i+1}, …, A_j consists of terms far from 1, the partial power tower over indices i through j becomes very large; but if A_{i−1} is extremely close to 1, it will counteract the run of large terms, and the partial tower over indices i−1 through j may still be close to 1. Thus if A_n is close enough to 1 often enough, we may have convergence of T_n, but if not, we expect T_n to diverge to infinity.
The quantity E inf{n : A_n ≤ e^(1/(e⋆n))} measures how frequently A_n is sufficiently close to 1. Because e^(1/(e⋆n)) converges to 1 rapidly, P(A_n ≤ e^(1/(e⋆n))) goes to 0 as well (unless P(A_1 = 1) > 0), and finiteness of this expectation means that these probabilities do not go to 0 too rapidly. For our proof, we substitute this condition for another, slightly easier to compute one, then show equivalence of the two at the end: we start by showing that convergence of T_n is equivalent to summability of the series (2.1). For convenience, we define p_j = P(A_n > b^(1/(b⋆j))), which is well-defined because the A_n are identically distributed.

Summability of (2.1) implies T_n converges
Here, we use the convergence of (2.1) to construct a distributional upper bound for T_n.
For deterministic nondecreasing sequences, the existence of an upper bound proves convergence. We use the following lemma, a probabilistic analogue of this fact, to prove convergence of T_n.

Lemma 2.1. Let {X_n}_{n∈N} be an almost surely nondecreasing sequence of random variables. Suppose that there exists an identically distributed (but not necessarily independent) sequence of random variables {Y_n}_{n∈N} such that X_n ≤ Y_n almost surely for all n. Then lim_{n→∞} X_n converges with nonzero probability.
In the case EY_n < ∞, this result follows from Markov's inequality. The general version is a straightforward modification.
Notice that convergence of T_n is a tail event (Theorem 1 in [4]), and therefore has probability 1 or 0 by Kolmogorov's 0-1 law. Hence, it suffices to show that T_n converges with nonzero probability. By Lemma 2.1, it suffices to construct a sequence of random variables B_n, constant in distribution, such that T_n ≤ B_n. This construction will be fairly involved: begin by defining a triangular array of integer-valued random variables N_{ℓ,n} for 0 ≤ ℓ ≤ n. We define the N_{0,n} to be i.i.d. and independent from {A_i}_{i∈Z}, each with a distribution built from the probabilities p_j; this is a well-defined probability distribution by the summability assumption. Note that N_{ℓ,n} becomes 1 when A_{n−ℓ+1} is close to 1, and otherwise increments by 1. In this way, N_{ℓ,n} is (roughly speaking) a measure of the length of a run of "distant-from-1" values of A_i starting at index i = n − ℓ + 1. The requisite closeness to 1 is determined by how long the subsequent run of large values is.
We will ultimately use B_n = b ⋆ N_{n,n} as our bound on T_n. First, we show that N_{ℓ,n} has the same distribution for all ℓ ∈ {0, 1, …, n}, by induction on ℓ. The claim holds for ℓ = 0 by definition. For ℓ > 0, one checks the cases m = 1 and m > 1 separately, computing P(N_{ℓ,n} = m) from the recursive definition and the induction hypothesis; in each case the probabilities agree, which completes the induction step.
To apply Lemma 2.1, it then suffices to show that T_n ≤ b ⋆ N_{n,n}, because the N_{n,n} all have the same distribution and T_n is a nondecreasing sequence. We then conclude P(T_n converges) > 0, and Kolmogorov's 0-1 law implies almost sure convergence. Define T_n(ℓ) to be the partial tower built from the top ℓ terms; in particular T_n = T_n(n) is a nondecreasing sequence with b ⋆ N_{n,n} as an upper bound whose distribution is constant in n, and therefore Lemma 2.1 implies that T_n converges almost surely.

Convergence of T_n implies summability of (2.1)
This time we assume lim_{n→∞} T_n converges, and use its limiting distribution to construct a random variable U_n on N whose distribution's existence is tied to (2.1), similarly to N_{n,n} but from the other direction. The random variable U_n will also measure runs of large values of A_i, but in a different way. The construction hinges on the fact that b ⋆ n → ∞, which follows from the assumption that b > e^(1/e). We will also find it convenient to extend the i.i.d. sequence A_n to nonpositive integer indices. Using almost sure convergence of T_n and the fact that (A_n)_{n∈Z} is i.i.d., we obtain a sequence S_n that is almost surely finite, satisfies S_{n+1} = (A_{n+1})^(S_n), and is identically distributed. We also observe that the distribution of S_n has unbounded support. Define U_n = min{U ∈ N | (b ⋆ U)^3 > S_n}. This is well-defined because b ⋆ n is unbounded; note also that its distribution is constant in n. We will ultimately use the distribution of U_1 to put a finite upper bound on the relevant series, noting that the support of U_1's distribution is unbounded. In order to tie U_1's distribution to our desired sum, we define an auxiliary sequence of random integers (L_n)_{n≥0} recursively by L_0 = 0 and a one-step update driven by the A_i. We claim L_n ≤ U_n for all n, which can be proven by induction on n. The base case is trivial, and in each case of the inductive step the definitions give L_{n+1} ≤ U_{n+1}. Therefore we have by induction that L_n ≤ U_n for all n.
The distribution of L_n is somewhat similar to that of N_{ℓ,n} from the first part of the proof. Observe that for m ∈ N one can compute P(L_{n+1} = m) in terms of P(L_n = m − 1) and the p_j. By repeatedly applying this fact, we obtain a bound relating P(L_n = K) to the event that a run of large values occurs, which in turn is equivalent to U_n = K + 1. Therefore, using the fact that U_n is constant in distribution and U_n ≥ L_n, the relevant series is finite for all n, and its convergence is equivalent to convergence of the series from (2.1).

Equivalence of finiteness of (2.1) and E inf{n : A_n ≤ e^(1/(e⋆n))}

We start by observing an identity that holds for any c > 1; for c = b it is simply the summation in equation (2.1). Therefore it suffices to prove that E inf{n : A_n ≤ e^(1/(e⋆n))} is finite if and only if the corresponding sum is. Of course, this would follow if, for some fixed k ∈ N depending only on b, the two towers are comparable up to a shift of k in height for all j ∈ N. To show this, we apply the following lemma on the growth rate of power towers:

Lemma 2.2. Suppose s_n, t_n are two sequences on (e^(1/e), ∞) and u, c ∈ (e^(1/e), ∞) such that the partial towers of s_n and t_n grow comparably to those of u and c.

When b > e^(1/e), we can take s_n ≡ b, t_n ≡ e, and λ = 1/ln b, and conclude that there exists such a k, from which (2.2) follows immediately. This completes the proof of case 2 of Theorem 1.2.
Proof of Lemma 2.2. We first show the result for t_n = u and T_n = u ⋆ n: pick τ > 1.
Observe that the tower u ⋆ (j + 1) eventually dominates; therefore we can choose k_τ ∈ N large enough such that u ⋆ (j + 1) > s_1^τ for all j ≥ k_τ.
We will show inductively that (2.3) holds for all n > k_τ. We have already assumed the base case, n = k_τ + 1. Supposing that (2.3) holds for some n > k_τ, a lengthy computation establishes it for n + 1.
Thus, by induction, we have u ⋆ n > S^τ_{n−k_τ} for all n > k_τ. For the other inequality, pick β < min{ln c / ln u, 1}, then choose k_β ∈ N large enough that the base case below holds. We then show inductively that

u ⋆ n < βS_{n+k_β}   (2.4)

for all n. We have the base case (n = 0) already, and the second inequality above supplies the induction step; thus (2.4) holds for all n ≥ 0.
for all n ≥ k_λ. Therefore, if we take k = 1 + max{k_λ, k_τ, k_β}, we will have the claimed bound for any n ≥ k. Now to prove the general case: suppose s_n and t_n are two sequences satisfying the theorem's bounds. As we have shown, there exist k_1, k_2 ∈ N comparing each sequence to a constant-base tower; combining these, we obtain the desired result.

Cases 3 and 4 (a < 1)
Because a < 1, T_n is not a monotonic sequence, so a different approach is needed. Our proofs hinge on the following theorem.
Theorem 2.3. Let (f_n)_{n∈N} be an i.i.d. sequence of random nondecreasing functions from a closed interval [a, b] to itself, let F_n(x) = f_1 ∘ f_2 ∘ ⋯ ∘ f_n(x), and suppose the f_n satisfy a positive-probability crossing condition (verified in each application below). Then F_n(x) almost surely converges, and its limit is independent of x.
Unlike the bulk of previous work on iterated random functions, this theorem makes no continuity assumptions. Furthermore, unlike Lemma 2.1 and the "contracting on average" results of Diaconis and Freedman, this theorem has no obvious non-probabilistic analogue. It does bring to mind the fact that any nondecreasing function mapping a closed interval to itself must have a fixed point (a consequence of Tarski's lattice-theoretical fixed point theorem [15]), but Tarski's theorem has no implications about the dynamics of such a function. Indeed, the iterates of a nondecreasing function of a closed interval need not converge to a fixed point in general.
Proof. Without loss of generality, we suppose [a, b] is finite (otherwise, we can use a bounded increasing function to convert [a, b] to a finite interval). Observe that F_n(a) must form a nondecreasing sequence because f_n(a) ≥ a, and similarly F_n(b) forms a nonincreasing sequence; hence both converge almost surely. Call F_∞(a) and F_∞(b) their respective limits. Let f_ω and f_{ω+1} be i.i.d. with the same distribution as the f_i's but independent of that sequence, and define X_n and Y_n by appending these two maps to F_n at a and b respectively. We then have convergence in probability of X_n and Y_n, and a constant, nonzero lower bound on P(Y_n ≤ X_n). By convergence in probability of X_n and Y_n to F_∞(a) and F_∞(b), respectively, we can conclude the same inequality for their limits. It is clear from the definition that F_∞(b) ≥ F_∞(a) almost surely, so this implies that there is a nonzero probability that F_∞(b) = F_∞(a). By monotonicity of F_n for each n, F_∞(b) = F_∞(a) is equivalent to F_n converging uniformly to a constant function. Thus, we have a nonzero probability of F_n converging to a constant function. Since convergence of F_n to a constant function is a tail event with nonzero probability, the convergence is almost sure.
We will apply Theorem 2.3 to both cases 3 and 4 of Theorem 1.2.
Proof. We will show that, when a < B(b), the block maps x ↦ (A_{2kn+1})^((A_{2kn+2})^(⋱^((A_{2k(n+1)})^x))) satisfy the conditions of Theorem 2.3 for some k. When a ≥ B(b), we will show that there is almost surely a nonzero lower bound on the difference between T_{2n} and T_{2n+1}.
Before diving into the proof, we note that for c ∈ [0, 1] the function x ↦ c^x is a nonincreasing function from [0, 1] to itself, so we can restrict all our attention to this interval; T_n will always lie in it, and furthermore either T_n converges in the limit or it diverges by oscillation. We also have that T_{2n} is a nonincreasing sequence and T_{2n+1} is a nondecreasing sequence, which follows from the fact that composing two exponential maps with base less than 1 forms a nondecreasing function. We furthermore observe that, for deterministic bases c_k in (0, 1), the sequence of partial towers is likewise nondecreasing along odd indices and nonincreasing along even indices.
Now, define the alternating power tower function AT_n(x, y) recursively by AT_0(x, y) = 1 and AT_n(x, y) = x^(AT_{n−1}(y, x)), so called because it makes a power tower that alternates between x and y, e.g. AT_5(x, y) = x^(y^(x^(y^x))).
These have the monotonicity properties that, for each n, AT_n(x, y) is increasing in x and decreasing in y. Furthermore, for fixed x, y, AT_{2n−1}(x, y) is an increasing sequence and AT_{2n}(x, y) is a decreasing sequence, so we can also define the limiting functions AT_odd(x, y) and AT_even(x, y), which satisfy AT_odd ≤ AT_even; furthermore, both are fixed points of the map t ↦ x^(y^t). In fact, AT_even(x, y) is the largest such fixed point, and AT_odd(x, y) is the smallest. Baker and Rippon [2] have shown that there exists a function B(x) such that, for y ≤ x, AT_even(x, y) = AT_odd(x, y) if and only if 0 < y ≤ B(x). Here, we have found its closed form in terms of the Lambert W function. This case of Theorem 1.2 follows immediately from these two lemmas: we note that (f_n)_{n∈N} is a random i.i.d. sequence of increasing functions on a closed interval, so we can apply Theorem 2.3 and obtain the desired convergence. To complete the proof, we use the fact that both the even- and odd-indexed power towers converge. In the divergent case, we have a nonzero probability that lim_{n→∞} T_n does not exist, since there is a nonzero probability that the limit of T_{2k} is strictly greater than the limit of T_{2k−1}; since convergence is a tail event, Kolmogorov's 0-1 law then implies almost sure divergence by oscillation. Solving the fixed point equations gives us the desired form for a = B(b), and the requirement that a ≤ b implies that we must take the principal branch of the W function.
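The even/odd dichotomy is easy to observe numerically in the simplest alternating case, a = b = c, where the boundary sits at e^(-e) ≈ 0.0660 (the iteration count and thresholds below are our own arbitrary choices):

```python
def even_odd_limits(c, n_pairs=2000):
    """Iterate x -> c**x from x = 1 and return the limits along even and odd iterates."""
    x_even = 1.0
    for _ in range(n_pairs):
        x_odd = c ** x_even   # odd-indexed iterate
        x_even = c ** x_odd   # even-indexed iterate
    return x_even, x_odd

# c = 0.1 > e**(-e): the two subsequential limits agree, so the tower converges.
e1, o1 = even_odd_limits(0.1)
# c = 0.04 < e**(-e): the limits differ, so the tower oscillates in the limit.
e2, o2 = even_odd_limits(0.04)
print(abs(e1 - o1), abs(e2 - o2))
```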
We can apply a similar technique, using Theorem 2.3, to solve the case a < 1 < b.

Proof. We obviously have P(A_n < 1) > 0, therefore almost surely A_n < 1 infinitely often. Let (n_i)_{i≥0} enumerate, in increasing order, the indices n with A_n < 1 (note that n_i is an increasing sequence of random variables, and the gaps n_{i+1} − n_i are i.i.d.). Define the block maps f_i from consecutive pairs of such indices and apply Theorem 2.3. (The limit is well defined because a monotonic function on [0, ∞) either converges or diverges to infinity.) To show that the theorem is applicable, we must show that the f_i(t) are increasing in t, i.i.d., and satisfy the crossing condition with positive probability. Firstly, f_i is increasing because {A_{n_{2i}}, …, A_{n_{2i+2}−1}} contains exactly two elements less than one. Secondly, the f_i are i.i.d. because the three sequences (A_{n_i})_{i≥1}, {A_n | A_n ≥ 1}, and (n_i)_{i≥1} are independent and each individually is i.i.d. Thirdly, pick c ∈ (a, 1) such that both P(A_1 ∈ [c, 1)) and P(A_1 ∈ [a, c]) are positive; such c exists by the definition of a. Thus the event that A_1 ∈ [c, 1), A_2, A_3 ∈ [a, c], and A_4 > 1 has nonzero probability. In this event, we have n_i = i + 1 for i ≤ 2 and n_3 > 4, and the resulting inequalities, together with c^c ≥ c, give the required crossing. Therefore we can apply Theorem 2.3 and find that lim_{i→∞} F_i(x) converges almost surely to a limit that does not depend on x.
In particular, lim_{i→∞} F_i(x) exists. From the definition, we note that f_i(t) is always a number less than 1 raised to a nonnegative power, hence bounded above by 1, and therefore lim_{i→∞} F_i(x) is finite. To complete the proof, we note what this means for T_n: the upper and lower bounds sandwiching T_n almost surely converge to the same finite value, and hence T_n converges almost surely.

Proof of Corollaries
The corollaries follow trivially from Theorem 1.2 except in the cases where b > e^(1/e) and a ≥ 1, where the convergence criterion must be restated. Observe that for any nonincreasing function f : [1, ∞) → [0, ∞) and any random variable X supported on a subset of [1, ∞) with Ef(X) < ∞, a truncation argument applies, using the dominated convergence theorem in the last step; this is applicable because f(t)1_{[1,t]}(x) ≤ f(x) for all x, and f(x) is integrable with respect to dP_X. Therefore the relevant limit is 0 < 1, and we conclude that it suffices to prove Lemma 2.6.
In this form, it is clear that information on the asymptotic behavior of P(A_1 ≤ t) as t approaches 1 from above should give us information about the convergence of this sum. First, consider the case P(A_1 = 1) > 0, in which the relevant limit is infinite; moreover, P(A_1 = 1) > 0 implies that inf{n : A_n ≤ e^(1/(e⋆n))} is bounded by a geometric random variable and hence has finite expectation. Hence, we may proceed assuming that P(A_1 = 1) = 0.
Before proving either claim under this assumption, we replace log⋆(1/(A_1 − 1)) with a new variable X = log⋆(1/log A_1), as this is easier to work with. To do this, we start by noting an elementary-calculus comparison between 1/(t − 1) and 1/log t, and that for integers n the corresponding liminfs agree; similarly, the corresponding limsups are equal.
Suppose first that the liminf condition holds. Then there exist constants c > 1 and N ∈ N such that j ≥ N implies the corresponding tail bound; thus, for n > N, the summands are dominated by a convergent series. Similarly, if lim sup_{t→1+} P(A_1 ≤ t) log⋆(1/log t) < 1, the summands go to 0 more slowly than 1/n^c for some c ∈ (0, 1), hence the series diverges in that case.

The Inverse Question
It is natural to ask which distributions can be represented as power towers of an i.i.d. sequence. More precisely, given a random variable T, does there exist an i.i.d. sequence {A_i} such that T = A_1^(A_2^(⋯))? If the answer is yes, we will say T has a tower distribution, and the distribution of A_i we will call its inverse tower distribution.
We will not answer this question in full generality, but Theorem 3.1 gives the answer in the case T = U^r for a uniform random variable U and fixed r ∈ R \ {0}.
This result suggests that the inverse question may be quite difficult for arbitrary T, as it is not clear exactly what "goes wrong" when the conditions are not met. In the case α = 0 and β = 1, this F has a relatively simple form, and we get an interesting generalization of Example 1.5: U^r has a tower distribution if and only if r ∈ [0, 1]. When r ∈ (0, 1): let V_1, V_2, V_3 be independent and uniform on (0, 1).
has the inverse tower distribution of T .
In our proof of Theorem 3.1, we will use the following lemmas, each of which may be useful on its own for checking whether a distribution is a tower distribution:

Lemma 3.3. Suppose X and Y are independent, the support of X satisfies the second convergence condition of Corollary 1.3, and X^Y is distributed as Y. Then Y has a tower distribution, and its inverse tower distribution is X.
Proof. Let X_i be an i.i.d. sequence with terms distributed as X but independent of it. By the second condition of Corollary 1.3, the power tower T formed from the X_i converges almost surely. This allows us to apply Letac's principle [13], which says that if (f_i)_{i∈N} is an i.i.d. sequence of random continuous functions on some space E and f_1 ∘ ⋯ ∘ f_n almost surely converges to a constant function on E as n → ∞, then the distribution of its limit is the unique distribution stationary under iteration by an independent copy of f_i. In this case, we take E = [0, e] and f_i(x) = X_i^x; since X^Y is distributed as Y, the distribution of Y is stationary, hence Y is distributed as the tower's limit.

Proof (of Lemma 3.4). Start by assuming (a, b) ≠ (0, 1). Choose ϵ ∈ (0, b), and note that P(X ≤ a + ϵ) and P(X ≥ b − ϵ) are both positive. From this we obtain bounds on the support of Y; letting ϵ go to 0, we obtain the desired result for inf(supp(Y)). The proof for the supremum is similar.
If a = 0 and b = 1, we have inf(supp(Y)) = 0 by similar reasoning. We must have sup(supp(Y)) ≤ 1 because a tower whose bases are at most 1 is itself at most 1, but nothing else can be said about sup(supp(Y)). For example, if Y ∼ Unif(0, b) for b ∈ (0, 1) and we define X to be a power tower of i.i.d. copies of Y (which converges by Theorem 1.2), we have supp(X) = (0, 1).
An interesting consequence of Lemma 3.4 is that an inverse tower distribution can never have support extending above e^(1/e), which is the maximum of x^(1/x) for x ∈ R^+.

Proof of Theorem 3.1. Let T = U^r, with distribution function G. By Lemma 3.3, it suffices to show that if A is a random variable, independent of T, with distribution function F, then A^T is distributed as T. First, we show that if b > e, then T does not have the tower property: by Lemma 3.4, the infinite power tower formed by an i.i.d. sequence of copies of A converges, but the limit of this sequence is bounded above by e, so its distribution cannot be T. For the case b = ∞, in order for the power tower of the A_i's to converge to T, we would need sup(supp(A_i)) > e^(1/e); thus T cannot have a power-law tail, because such a distribution would have E(T^ϵ) < ∞ for sufficiently small ϵ.
If b ≤ e, we check whether T has a tower distribution. Let G(x) = P(T ≤ x) be the distribution function of T. We compute, for x ≠ 1, the distribution function of A^T by conditioning on A, making the substitution t = log x / log u in the final step; if a = 0, the lower limit of integration degenerates accordingly. As we have seen, it suffices to check whether A^T is distributed as T. Thus A has the inverse tower distribution of T if and only if its distribution function F(x) satisfies the resulting functional equation, so we begin by looking for solutions to this equation. It is not hard to show that there is no distribution function satisfying the equation for p = 0, which in turn implies that exp(U) never has a tower distribution for uniform U. However, this is not necessary for our theorem, and is left to the interested reader to check.
We do not necessarily have that (3.3) implies (3.2), because of the differentiation step. We do, however, have that (3.3) implies (3.2) up to an additive term involving a constant K. We will start by showing that if p > 0 and F(x) is a distribution function satisfying (3.3), then K = 0; hence (3.2) is satisfied by F, and we conclude that F is the distribution function of the inverse tower distribution of T. Observe that p > 0 controls lim_{x→∞} G(x). We first show this for a = 0, then deal with the case a > 0. We note that when a = 0 we must have p > 0, because otherwise x^(p−1) does not have a finite integral. Then (3.3) gives us F in closed form, which is the form claimed in the theorem, but is not necessarily a distribution function.
When b < 1, F takes values outside [0, 1], and therefore F cannot be a distribution function; so for a = 0 and b < 1, T does not have a tower distribution. If b ≥ 1, we do have F(x) ∈ [0, 1] for all x, so it suffices to check whether F is nondecreasing, by examining the sign of the derivative for x ∈ (0, b). Similarly to the case a = 0, when 0 < a ≤ 1 ≤ b ≤ e we must check that F(x) as defined by (3.3) is a distribution function. Since b ≥ 1 and a ≤ 1, Lemma 3.4 implies that F(x) = 0 for x < a. If p ≤ 0, then F has negative derivative somewhere, hence we may proceed under the assumption that p > 0. If p ≥ 1/(1 − log b), then F′(x) ≥ 0 for all x where the derivative exists. Conversely, if p < 1/(1 − log b), we can show that F′(x) < 0 for some x. Suppose b > 1.
Obviously, if M_n < 0 for any n, then F′(x) < 0 for some x ∈ (0, 1). We claim that p < 1 implies lim_{n→∞} M_n < 0. Assume that M_n ≥ 0 for all n. Then lim_{n→∞} M_n exists (since M_n is nonincreasing), and the limit is negative if p < 1 = 1/(1 − log b), a contradiction. Therefore M_n < 0 for some n, which in turn implies that there exists x ∈ (0, 1) such that F′(x) < 0. Hence F is not a distribution function if p < 1, as claimed.
Thus, it remains to see what happens at the points where F is not differentiable; matching the one-sided limits there forces f_N(x) to agree with −x^(a^p)/a^p + 1 + a x^(a^p) log(x)/a^p on an open set. However, this is a contradiction, because f_N(x) and −x^(a^p)/a^p + 1 + a x^(a^p) log(x)/a^p are distinct analytic functions on (0, ∞), so they cannot be equal on any open set. That they are distinct can be seen from the limiting behavior as x goes to 0 or to ∞: when p > 0, the limit as x goes to infinity of f_N(x) is −∞, while the limit of the other function is ∞; similarly, when p < 0, the limits as x goes to 0 of the two functions are unequal. Thus, when a > 1, T does not have the tower property. The proof for b < 1 and a > 0 is similar.

Open questions
There remain many open questions about random power towers. As far as convergence is concerned, Theorem 1.2 is quite broad: it only leaves out the case of $a = 1$ and $b = \infty$. While our condition for convergence when $a = 1$ and $b > e^{1/e}$ does not involve $b$ in any way, our proof required boundedness of $A_1$. However, there do exist unbounded distributions for $A_1$ such that the corresponding power tower converges, such as the one in Example 4.1.
To show convergence of $T_i$, it suffices to show that $\lim_{i\to\infty} E \log^\star(T_i) < \infty$, since this implies that the probability of $T_i$ going to infinity is 0. Assuming the claimed bound holds for $i$, we prove it for $i + 1$, making use of the fact that $\log^\star(x + y) \le 1 + \log^\star x + \log^\star y$.
This example, and other similar ones, suggest that in order to check convergence of $T_i$ for unbounded $A_1$, we need some comparison between the weight of the distribution of $A_1$ near 1 and near infinity. I conjecture that condition 2 from Theorem 1.2 may be relaxed by replacing $b < \infty$ with $E \log^\star A_1 < \infty$.
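The quantities in this discussion are easy to experiment with numerically. The sketch below (a Monte Carlo estimate, with the illustrative choice $A_i \sim \mathrm{Unif}[0.9, 1.4]$ — a distribution picked for this sketch, not one from the paper) evaluates finite towers from the top down and estimates $E\log^\star T_n$:

```python
import math
import random

def log_star(x):
    """Iterated logarithm: number of times log must be applied
    before the value drops to 1 or below (0 for x <= 1)."""
    n = 0
    while x > 1:
        x = math.log(x)
        n += 1
    return n

def tower(samples):
    """Evaluate a finite power tower a1^(a2^(...^ak)) from the top down."""
    t = 1.0
    for a in reversed(samples):
        t = a ** t
    return t

random.seed(0)
# Monte Carlo estimate of E[log* T_n] for A_i ~ Unif[0.9, 1.4],
# an illustrative case with a < 1 <= b, so T_n converges a.s.
n, trials = 50, 500
est = sum(log_star(tower([random.uniform(0.9, 1.4) for _ in range(n)]))
          for _ in range(trials)) / trials
print(est)  # stays bounded as n grows
```

Because $1.4 < e^{1/e}$, every finite tower here is bounded (by the fixed point of $x \mapsto 1.4^x$), so the top-down evaluation never overflows.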
We also did not touch on questions of convergence rate. For the "contracting on average" random power towers, Diaconis and Freedman's results also give an exponential convergence rate. Are there cases that do not have an exponential type of convergence? If so, what convergence rates are possible?
We only scratched the surface of the inverse question. If $T$ were a mixture of powers of uniform distributions, we should be able to apply techniques similar to those used in the proof of Theorem 3.1, though the complexity of checking monotonicity increases substantially. Can one tell in general whether $T$ has a tower distribution by approximating it with uniform distributions?

Figure 1: Iterating exponential functions, showing the four possible behaviors: a. increasing convergence to a fixed point when $c \in [1, e^{1/e}]$; b. increasing to infinity when $c > e^{1/e}$; c. alternating convergence to a fixed point when $c \in [e^{-e}, 1]$; d. oscillation in the limit when $c < e^{-e}$.
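The four regimes of Figure 1 (and the oscillation of Figure 2) are easy to reproduce. In this sketch the cutoffs $e^{1/e}$ and $e^{-e}$ come from the classical theorem, while the escape threshold and iteration count are arbitrary choices made for the demonstration:

```python
import math

E_POS = math.exp(1 / math.e)   # ~1.4447, upper convergence cutoff
E_NEG = math.exp(-math.e)      # ~0.0660, lower convergence cutoff

def iterate_exp(c, n=2000, x0=1.0):
    """Iterate x -> c**x and return the last two iterates
    (equal in the limit iff the tower converges)."""
    x, prev = x0, x0
    for _ in range(n):
        prev, x = x, c ** x
        if x > 1e3:                 # escaping to infinity
            return (float('inf'), float('inf'))
    return (prev, x)

# a. c in [1, e^(1/e)]: increasing convergence to a fixed point
# b. c > e^(1/e):       increasing divergence to infinity
# c. c in [e^-e, 1]:    alternating convergence to a fixed point
# d. c < e^-e:          oscillation between two distinct limits
for c in (1.3, 1.5, 0.5, 0.04):
    print(c, iterate_exp(c))
```

For $c = 0.04 < e^{-e}$ the two returned iterates stay far apart: the even and odd subsequences converge to distinct limits, exactly the oscillation in panel d.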

Figure 2: Limiting behavior of $c, c^c, c^{c^c}, \dots$ for $c$ on either side of $e^{-e} \approx 0.066$.

If $b \le 1$ and $a < B(b)$, then $T_n$ converges almost surely. Conversely, if $b < e^{-e}$ and $a \ge B(b)$, then $T_n$ diverges by oscillation, with the even and odd subsequences converging to distinct limits. If $a < 1 < b \le \infty$, then $T_n$ converges almost surely. For the purpose of checking, given a distribution for $A_1$, whether $T_n$ converges, the following corollaries are easier to use (especially since we can do without the rather technical condition of case 2):

Corollary 1.3 (Convergence conditions). If any of the following holds, then $T_n$ converges almost surely: 1. $a = 0$; 2. $b \in [e^{-e}, e^{1/e}]$; 3. $a < 1 \le b$; 4. $b < e^{-e}$ and $a < B(b)$; 5. $a = 1$, $b$ is finite, and $\liminf_{t\to 1^+} P(A_1 \le t)\,\log^\star\tfrac{1}{t}$ …

Corollary 1.4 (Divergence conditions). If any of the following holds, then $T_n$ diverges almost surely: 1. $a > 1$ and $b > e^{1/e}$.

Theorem 2.3. Let $f_i : [a, b] \to [a, b]$ be a sequence of i.i.d. random nondecreasing functions, where $[a, b]$ is a closed interval in the extended real line $\mathbb{R} \cup \{\pm\infty\}$. Furthermore, assume that $P$ …


Lemma 2.4. If $0 \le a < b \le 1$ and there exists $k \in \mathbb{N}$ such that $AT_{2k-1}(b, a) > AT_{2k}(a, b)$, then $T_n$ converges almost surely. Conversely, if $AT_{2k-1}(b, a) \le AT_{2k}(a, b)$ for all $k$, then $T_n$ almost surely diverges by oscillation.

Lemma 2.5. If $0 \le a < b \le 1$ …

… hence $T_n$ converges, as desired. Conversely, we suppose $AT_{2k-1}(b, a) \le AT_{2k}(a, b)$ for all $k$, which implies $AT_{\mathrm{odd}}(b, a) \le AT_{\mathrm{even}}(a, b)$. Let $c \in (a, b]$ be such that $P(A_1 \ge c) > 0$. Then we have the following with nonzero probability:
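Lemma 2.4's criterion is directly checkable numerically. In the sketch below, $AT_n(a, b)$ is taken to be the height-$n$ tower with bases alternating $a, b, a, b, \dots$, evaluated from the top down; this matches how the lemma uses the notation, but since the formal definition is given earlier in the paper, treat it as an assumption of the sketch. The test values are arbitrary:

```python
def AT(n, a, b):
    """Height-n alternating tower with bases a, b, a, b, ...,
    evaluated from the top down; e.g. AT(3, a, b) = a**(b**a)."""
    bases = [a if i % 2 == 0 else b for i in range(n)]
    t = 1.0
    for base in reversed(bases):
        t = base ** t
    return t

def converges(a, b, kmax=200):
    """Lemma 2.4 criterion for 0 <= a < b <= 1: T_n converges a.s. iff
    AT_{2k-1}(b, a) > AT_{2k}(a, b) for some k."""
    return any(AT(2 * k - 1, b, a) > AT(2 * k, a, b) for k in range(1, kmax + 1))

print(converges(0.01, 0.03))  # True: criterion fires after a few terms
print(converges(0.02, 0.03))  # False: diverges by oscillation
```

Here $b = 0.03 < e^{-e}$, and solving the system from the proof of Lemma 2.5 numerically puts the threshold near $B(0.03) \approx 0.015$, which separates the two test values of $a$.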

proving the lemma.

Proof of Lemma 2.5. Baker and Rippon [2] proved that $\lim_{n\to\infty} AT_n(a, b)$ converges if and only if $\varphi_{a,b}(x) = a^{b^x}$ has exactly one fixed point $c$ such that $|\varphi'_{a,b}(c)| \le 1$. Furthermore, they showed that for each $b$ there exists a constant $a_1$ such that this happens for all $a \le a_1$ and for no $a \in (a_1, b)$, and that $\varphi_{a_1,b}(x)$ has a fixed point $c$ such that $\varphi'_{a_1,b}(c) = 1$. If $AT_n(a, b)$ converges, then by monotonicity and the fact that $a \ne b$, we get $\lim_{n\to\infty} AT_n(b, a) > \lim_{n\to\infty} AT_n(a, b)$, and in particular $AT_{2n-1}(b, a) > AT_{2n}(a, b)$ for some $n$. Conversely, if $\lim_{n\to\infty} AT_n(a, b)$ does not converge, we must have $a \in (a_1, b)$, and by continuity of $a^{b^x}$ in all variables and its monotonicity properties, we must have that $AT_{\mathrm{even}}(a, b)$ is larger than the fixed point of $x \mapsto b^x$ for all $a \in (a_1, b)$. This implies that $AT_{\mathrm{even}}(a, b)^{1/AT_{\mathrm{even}}(a,b)} \ge b$, and thus $AT_{\mathrm{even}}(a, b) \ge b^{AT_{\mathrm{even}}(a,b)} = AT_{\mathrm{odd}}(b, a)$. Hence, for all $n$, we must have $AT_{2n}(a, b) \ge AT_{2n-1}(b, a)$. Therefore it suffices to show that the system $\varphi_{a,b}(c) = c$, $|\varphi'_{a,b}(c)| \le 1$ has exactly one solution if and only if $0 \le a < B(b)$. Baker and Rippon also proved that for each $b \ge e^{-e}$, this holds for all $a \in [0, b]$, and for each $b < e^{-e}$, there exists a constant $a_1 \in (0, b)$ such that this holds for $a < a_1$ and fails for $a \in [a_1, b]$. Furthermore, they proved that there is a fixed point $c$ of $\varphi_{a_1,b}$ such that $\varphi'_{a_1,b}(c) = (\log a_1)(\log b)\, b^c a_1^{b^c} = 1$. Therefore, it suffices to prove that $a = B(b)$ is the only number in $(0, b)$ that solves the system $a^{b^c} = c$, $(\log a)(\log b)\, b^c a^{b^c} = 1$. Solving this system is not obviously possible in any kind of closed form; using the Lambert $W$-function, it becomes straightforward. Start by rearranging the second equation, using $a^{b^c} = c$, as $\log(a^{b^c})\, a^{b^c} = c \log c = \frac{1}{\log b}$. Let $W$ be either of the two real branches of the $W$ function, so that $\log c = W\!\left(\frac{1}{\log b}\right)$. Using the fact that $\exp(W(x)) = \frac{x}{W(x)}$ for any branch of the $W$ function, this gives us $c = \frac{1}{(\log b)\, W(1/\log b)}$. Thus, the requirement $\varphi_{a,b}(c) = c$ becomes $a = c^{b^{-c}}$.
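Following the computation above, the threshold can be evaluated numerically: $c = \exp(W(1/\log b))$ and then $B(b) = c^{b^{-c}}$. The sketch below uses a small Newton solver for the principal branch of $W$; taking the principal branch (the choice that keeps $B(b) \in (0, b)$) is an assumption of this sketch, since the proof allows either real branch:

```python
import math

def lambert_w0(x, tol=1e-12):
    """Principal branch of the Lambert W function (w * e**w = x),
    via Newton's method; intended for x > -1/e."""
    w = 0.0 if x > -0.1 else -0.5
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1))
        w -= step
        if abs(step) < tol:
            break
    return w

def B(b):
    """Threshold from Lemma 2.5 for b < e**-e: solve a**(b**c) = c and
    (log a)(log b) * b**c * a**(b**c) = 1 via c = exp(W(1/log b)),
    then a = c**(b**-c)."""
    c = math.exp(lambert_w0(1 / math.log(b)))
    return c ** (b ** (-c))

print(B(math.exp(-math.e)))  # equals e**-e: the threshold is vacuous at the cutoff
print(B(0.03))               # ~0.0148
```

At $b = e^{-e}$ the two real branches of $W$ meet at $W(-1/e) = -1$, giving $c = 1/e$ and $B(e^{-e}) = (1/e)^{e} = e^{-e}$, consistent with the theorem's claim that no restriction on $a$ is needed for $b \ge e^{-e}$.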

and thus the distribution of $T$ is the unique one such that $X^T \stackrel{d}{=} T$; hence $T \stackrel{d}{=} Y$, so the distribution of $X$ is the inverse tower distribution of $Y$.

Lemma 3.4. Suppose that $X, Y$ are independent, nonnegative, bounded random variables such that $X \stackrel{d}{=} Y^X$. Let $a = \inf(\mathrm{supp}(X))$ and $b = \sup(\mathrm{supp}(X))$. If $b < \infty$ and $(a, b) \ne (0, 1)$, then $\inf(\mathrm{supp}(Y)) = \max(a^{1/a}, a^{1/b})$.

If $a = 0$ and $b = 1$, then $\inf(\mathrm{supp}(Y)) = 0$ and $\sup(\mathrm{supp}(Y)) \le 1$. If $a = 0$ and $b \ne 1$, we treat the quantities $a^{1/a}$ and $b^{1/a}$ as their limits as $a$ approaches 0 from above.

Note that $a^{1/a} \le b^{1/b}$ and $a^a \le b^b$ are necessary conditions for $X$ having a tower distribution, and they are not guaranteed by $a \le b$. For instance, if $X$ has support $[\frac{1}{5}, \frac{1}{2}]$, then there can be no tower distribution: here $a^a = (1/5)^{1/5} \approx 0.725 > 0.707 \approx (1/2)^{1/2} = b^b$.

Note that $P(U^r \le x) = \frac{x^{1/r} - \alpha}{\beta - \alpha}$ for $x \in (\alpha^r, \beta^r)$, and hence the probability density function of $T$ has the form $p_T(x) = C x^{p-1}$ for $x \in (a, b)$, with $p = \frac{1}{r}$, $a = \alpha^r$ and $b = \beta^r$ (or the other way around, if $r < 0$), and $C$ chosen such that the total integral is 1. Thus, it suffices to show that a random variable $T$ with a density function of the form given in (3.1) has a tower distribution if and only if $1 \in [a, b]$ and $\frac{1}{p} \in [0, \frac{1}{1 + p \log b}]$, which is equivalent to $b < e$ and $p \ge \frac{1}{1 - \log b}$. In that case, we claim
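The equivalence of the two forms of the condition is a short calculation ($p = 1/r$, $\log b = r \log \beta$), and easy to spot-check numerically; $\beta = 2$ below is an arbitrary test value:

```python
import math

def cond_r(beta, r):
    """Condition in the r-parameterization: r in [0, 1/(1 + log beta)]."""
    return 0 <= r <= 1 / (1 + math.log(beta))

def cond_p(beta, r):
    """Same condition via p = 1/r and b = beta**r: b < e and p >= 1/(1 - log b)."""
    if r == 0:
        return True          # r = 0 gives a constant tower; included by convention
    p, b = 1 / r, beta ** r
    return b < math.e and p >= 1 / (1 - math.log(b))

beta = 2.0
for r in (-0.5, 0.0, 0.1, 0.3, 0.58, 0.6, 1.0):
    assert cond_r(beta, r) == cond_p(beta, r)
print("parameterizations agree for beta = 2")
```

The boundary value $r = 1/(1 + \log \beta)$ itself is avoided in the loop only because exact equality of floating-point expressions computed two different ways is fragile; algebraically the two conditions coincide there too.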

If $c > e^{1/e}$ and $P(A_1 \ge c) > 0$, …

$\lim_{x\to\infty} G(x) = |C| \lim_{x\to\infty} \frac{b^p - a^p}{p (\log x)^p} = 0$. Hence, to prove the theorem it suffices to show that there exists a distribution function satisfying (3.3) if and only if $0 \le a \le b < e$ and $p \ge \frac{1}{1 - \log b}$.

Since the sign of $F'$ on $(0, b^{1/b})$ is governed by a decreasing function of $x$, it suffices to check at $x = b^{1/b}$: $F$ is a distribution function if and only if $p - 1 - p \log b \ge 0 \iff p \ge \frac{1}{1 - \log b}$, which is the claim of the theorem. Next, we deal with the case of $a > 0$ and $1 \in [a, b]$, after which we will conclude by showing that when $a > 0$ and $1 \notin [a, b]$, equation (3.3) has no solution, completing the proof of the theorem.

By Lemma 3.4, $F(x) = 0$ for $x < a^{1/b}$ and $F(x) = 1$ for $x > b^{1/b}$, so we have

for $n \in \mathbb{N}$. Define $F_\pm(x) = \lim_{h\to 0^\pm} F(x + h)$ to be the left- and right-limits of $F$. By induction, we can show that $F_+ \ge F_-$ at each of these points. This is trivial for the first point, because $F_+(a^{1/b}) = \frac{a^p}{b^p}(1 - \log a) - \frac{a^p}{b^p} = -\frac{a^p}{b^p} \log a \ge 0 = F_-(a^{1/b})$, and the recurrence relation for $F$ gives us the inductive step, so $F_+(b^{a^n/b^{n+1}}) \ge F_-(b^{a^n/b^{n+1}})$ for all $n$. Hence $F$ is non-decreasing in all the cases where $p \ge \frac{1}{1-\log b}$, as claimed. Finally, we check that (3.3) has no solutions when $1 \notin [a, b]$ and $a > 0$. To do this, we notice that (3.3) becomes $F(x) = \frac{x^{bp}}{b^p} - \frac{a^p}{b^p} - \frac{b x^{bp} \log(x)}{b^p}$ on the corresponding range of $x$, and $F(x) = -\frac{x^{ap}}{a^p} + 1 + \frac{a x^{ap} \log(x)}{a^p}$ when $x \in (a^{1/a}, b^{1/a})$.

(3.5) Suppose $a > 1$. Define $f_n$ recursively by $f_0(x) = 1$ and $f_n(x) = \frac{x^{bp}}{b^p} - \frac{a^p}{b^p} - \frac{b x^{bp} \log(x)}{b^p} + \frac{a^p}{b^p} f_{n-1}(x^{b/a})$. Note that, if $p > 0$, then $\lim_{x\to\infty} f_n(x) = -\infty$ for all $n$, and if $p < 0$, then $\lim_{x\to 0^+} f_n(x) = \infty$. (These may be shown inductively.) By Lemma 3.4, we have that $F(x) = 1$ for $x > b^{1/b}$. Therefore, on the first interval below $b^{1/b}$, $F(x) = \frac{x^{bp}}{b^p} - \frac{a^p}{b^p} - \frac{b x^{bp} \log(x)}{b^p} + \frac{a^p}{b^p} = f_1(x)$, and inductively $F(x) = f_n(x)$ on the $n$-th such interval. We observe that there exists $N \in \mathbb{N}$ such that the interval $I$ … By Lemma 3.4, $F(x) = 0$ for $x < a^{1/a}$

Because $F(x^{a/b}) = 0$ on the relevant range, the term $\frac{b^p}{a^p} F(x^{a/b})$ drops out, and so by equation (3.5), for all $x \in I$, $f_N(x) = F(x) = -\frac{x^{ap}}{a^p} + 1 + \frac{a x^{ap} \log(x)}{a^p}$.

Example 4.1. Let $\{A_i\}_{i\in\mathbb{N}}$ be an i.i.d. sequence with distribution given by $A_1 = e{\star}n$ w.p. $\frac{1}{2^{n+1}}$ for $n \in \mathbb{N}$ and $A_1 = e^{1/e{\star}(16n)}$ w.p. $\frac{1}{2^{n+1}}$ for $n \in \mathbb{N}$, and let $T_i = A_1^{A_2^{\cdots^{A_i}}}$ be the corresponding power tower. Then $T_i$ converges almost surely.
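Under the conventions that $\log^\star(e{\star}n) = n$ (with $e{\star}0 = 1$), that $\log^\star$ of the near-1 values $e^{1/e{\star}(16n)}$ is 1, and that $n$ ranges over $n \ge 1$ — all assumptions about the paper's notation — the expectation $E\log^\star(A_1)$ reduces to two quickly convergent series, which this sketch sums exactly:

```python
from fractions import Fraction

# log*(e*n) = n and log*(e**(1/e*(16n))) = 1 under the stated conventions,
# so E[log* A_1] splits into two geometric-type series over n >= 1.
def expected_log_star(N=60):
    total = Fraction(0)
    for n in range(1, N + 1):
        total += Fraction(n, 2 ** (n + 1))      # heavy branch: log* = n
        total += Fraction(1, 2 ** (n + 1))      # light branch: log* = 1
    return float(total)

print(expected_log_star())  # -> approaches 1.5
```

In particular $E\log^\star(A_1)$ is finite even though $A_1$ is unbounded, which is exactly the situation the conjecture above ($E\log^\star A_1 < \infty$ in place of $b < \infty$) is meant to cover.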

as claimed. This obviously satisfies (3.3), so it remains to show that it is a distribution function if and only if $p \ge \frac{1}{1 - \log b}$.

Note that $E \log^\star(A_1) = \dots$; we show inductively that $E \log^\star T_i < 8(1 + \dots)$.