The Chain Records

The chain record is a new type of multidimensional record. We discuss how often chain records occur when the background sampling is from the unit cube with the uniform distribution (or, more generally, from an arbitrary continuous product distribution in d dimensions). Extensions are given for sampling from more general spaces with a self-similarity property.


Introduction
Consider independent marks X_1, X_2, … sampled from the uniform distribution in Q^d = [0, 1]^d. We define a mark X_n to be a chain record if X_n breaks the last chain record in X_1, …, X_{n−1}. More precisely, record values and record indices are introduced recursively, by setting T_1 = 1, R_1 = X_1 and

T_{k+1} = min{n > T_k : X_n ≺ R_k},  R_{k+1} = X_{T_{k+1}},  k = 1, 2, … .

Here, ≺ denotes the standard strict partial order on R^d defined in terms of component-wise orders by x = (x^(1), …, x^(d)) ≺ y = (y^(1), …, y^(d)) iff x ≠ y and x^(i) ≤ y^(i) for i = 1, …, d.
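The recursion above is precisely the greedy construction of a chain in the random partial order. A minimal Python sketch of this construction (the two-dimensional point sequence is hypothetical illustrative data, not taken from the paper):

```python
from typing import List, Sequence, Tuple

def strictly_below(x: Tuple[float, ...], y: Tuple[float, ...]) -> bool:
    """x ≺ y: x differs from y and is componentwise <= y."""
    return x != y and all(a <= b for a, b in zip(x, y))

def chain_records(points: Sequence[Tuple[float, ...]]) -> List[int]:
    """Return the chain-record indices T_1, T_2, ... (1-based):
    T_1 = 1, and T_{k+1} is the first n > T_k with X_n ≺ R_k."""
    indices: List[int] = []
    record = None
    for n, x in enumerate(points, start=1):
        if record is None or strictly_below(x, record):
            indices.append(n)
            record = x
    return indices

pts = [(0.9, 0.9), (0.5, 0.95), (0.4, 0.6), (0.7, 0.2), (0.3, 0.5)]
print(chain_records(pts))  # -> [1, 3, 5]
```

Note that a chain record is recognized online: only the current record value R_k needs to be stored, not the whole history.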
It is easy to see that, in any dimension d, the terms of (T_k) are indeed well defined for all k; that is, chain records occur infinitely often.
The term chain record is chosen to stress the most straightforward 'greedy' method of constructing a chain in a random partial order. This notion of multidimensional record has not been explored so far, although the definition is an obvious restatement of the classical recursive definition of a strict lower record (1; 37). The point advocated here is that the chain records are closer relatives to the classical univariate records than various other types of multidimensional records studied in the literature (4; 5; 13; 19; 20; 21; 26; 30; 31; 32; 34; 35). Apart from the theory of extreme values, the chain records are of interest in some other contexts, like search trees (14; 17), partially exchangeable partitions (38) and fragmentation processes (7). We shall exploit these connections to analyse the frequency of chain records and related properties of their values.
The chain records are intermediate between two other types of multidimensional records. We say that a strong record occurs at index n if either n = 1, or n > 1 and X n ≺ X j for j = 1, . . . , n − 1.
In the terminology of partially ordered sets, a strong record X n is the least element in the point set {X 1 , . . . , X n }. Since repetitions in each component have probability zero, X n is a strong record if and only if there is a strict lower record in each of d components, that is there are d marginal records simultaneously. We say that a weak record occurs at index n if either n = 1, or n > 1 and X j ≺ X n for j = 1, . . . , n − 1.
A weak record X_n is a minimal element in the set {X_1, …, X_n}. Obviously, each strong record is a chain record. Also, each chain record is a weak record, as follows easily by induction from the transitivity of the relation ≺. Thus, denoting by N̄_n, Ñ_n and N_n the counts of strong, weak and chain records among the first n marks, respectively, we have N̄_n ≤ N_n ≤ Ñ_n.
To illustrate, for the two-dimensional configuration of points in Figure 1 the weak records occur at times 1, 2, 3, 5, 6, 7, 8, 9, the strong records occur at times 1, 8, the marginal records occur at times 1, 2, 3, 5, 6, and the chain records occur at times 1, 5, 8. Clearly, all three types of records coincide in dimension one, but for d > 1 they are very different. For instance, unlike the two other types, the chain records are sensitive to the arrangement of marks in the sequence: a permutation of X_1, …, X_{n−1} may destroy or create a chain record at index n.
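The three notions can be compared on any concrete configuration. The following sketch classifies a hypothetical two-dimensional sequence (not the data of Figure 1); the inequality between the three counts is visible:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def below(x: Point, y: Point) -> bool:
    """x ≺ y in the componentwise strict partial order."""
    return x != y and all(a <= b for a, b in zip(x, y))

def record_times(points: Sequence[Point]):
    """Return the 1-based index lists of weak, strong and chain records."""
    weak: List[int] = []
    strong: List[int] = []
    chain: List[int] = []
    last_chain = None
    for n, x in enumerate(points, start=1):
        prev = points[:n - 1]
        if not any(below(p, x) for p in prev):   # x is a minimal point so far
            weak.append(n)
        if all(below(x, p) for p in prev):       # x is the least point so far
            strong.append(n)
        if last_chain is None or below(x, last_chain):
            chain.append(n)
            last_chain = x
    return weak, strong, chain

pts = [(0.9, 0.9), (0.5, 0.95), (0.4, 0.6), (0.7, 0.2), (0.3, 0.5)]
w, s, c = record_times(pts)
print(s, c, w)  # strong count <= chain count <= weak count: 2 <= 3 <= 5
```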
To show quantitative differences we shall compare how often the records of different kinds may occur.
Recall that in dimension d = 1 the question about the frequency of records is settled by the Dwass–Rényi lemma (8, Lemma 5.1), which states that the record indicators I_n = 1(N_n > N_{n−1}) are independent, with probability p_n = P(I_n = 1) = 1/n of a record at index n. This basic fact can be deduced by combinatorial arguments from exchangeability and the uniqueness of the minimum among n sample points. It follows then in a standard way that N_n is asymptotically Gaussian with both mean and variance asymptotic to log n.
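The Dwass–Rényi lemma can be verified exactly for small n by enumerating all orderings. The sketch below checks both the marginal probabilities 1/n and the factorization of the joint law of the indicators, using exact rational arithmetic:

```python
import math
from fractions import Fraction
from itertools import permutations

def record_indicators(seq):
    """I_n = 1 iff seq[n-1] is a strict lower record."""
    return tuple(int(all(seq[n] < seq[j] for j in range(n)))
                 for n in range(len(seq)))

n = 4
counts = {}
for perm in permutations(range(n)):   # all n! equally likely rankings
    ind = record_indicators(perm)
    counts[ind] = counts.get(ind, 0) + 1
fact = math.factorial(n)

# marginal law: P(record at index k) = 1/k
for k in range(1, n + 1):
    pk = Fraction(sum(c for ind, c in counts.items() if ind[k - 1]), fact)
    assert pk == Fraction(1, k)

# independence: the joint law factorizes over the indices
for ind, c in counts.items():
    expected = Fraction(1)
    for k in range(1, n + 1):
        expected *= Fraction(1, k) if ind[k - 1] else Fraction(k - 1, k)
    assert Fraction(c, fact) == expected

print("Dwass–Rényi lemma verified exactly for n =", n)
```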
Properties of the strong-record counts for sampling from Q^d are also rather simple. By the independence of the marginal rankings we have a representation N̄_n = I_1 + … + I_n with independent Bernoulli indicators and p̄_n := P(I_n = 1) = n^{−d}. Thus E[N̄_n] = Σ_{j=1}^n j^{−d}. For d > 1 the series Σ_n p̄_n converges, hence the total number of strong records is finite with probability one. This implies that N̄_n converges almost surely to some random variable which is not Gaussian.
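The product formula p̄_n = n^{−d} can be checked exactly for d = 2 by brute force over the two independent coordinate rankings:

```python
from fractions import Fraction
from itertools import permutations

def strong_record_prob(n: int) -> Fraction:
    """Exact P(X_n is a strong record) for a continuous product
    distribution in d = 2 dimensions: the two coordinate rankings are
    independent uniform permutations, and X_n is a strong record iff
    it carries the minimal rank in both coordinates."""
    hits = total = 0
    for p in permutations(range(n)):
        for q in permutations(range(n)):
            total += 1
            hits += int(p[n - 1] == 0 and q[n - 1] == 0)
    return Fraction(hits, total)

for n in (1, 2, 3, 4):
    assert strong_record_prob(n) == Fraction(1, n ** 2)
print("strong-record probability n^{-2} confirmed for n <= 4")
```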
Counting the weak records is a more delicate matter, since the indicators Ĩ_n = 1(Ñ_n > Ñ_{n−1}) are not independent. However, there is a simple device reducing the counting of weak records to known results about minimal points. To this end, let ξ_1 < … < ξ_n be n uniform order statistics, independent of the X_j's. Observe that the sequence (X_1, ξ_1), …, (X_n, ξ_n) is distributed like a uniform sample of n points from Q^{d+1}, arranged by increase of the last coordinate. A minute's thought shows that X_j (j ≤ n) is a weak record if and only if (X_j, ξ_j) is a minimal point among the n points in d + 1 dimensions. Using this correspondence and (4, Equation (3.39)) we have E[Ñ_n] ∼ (log n)^d / d!. Similarly, from the results in (5) we can conclude that the variance Var[Ñ_n] is of the same order (log n)^d, and that Ñ_n is asymptotically Gaussian. Asymptotic expansions of the variance of the number of minimal points are given in (3).
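The correspondence with minimal points can be illustrated deterministically by lifting the marks with a strictly increasing last coordinate (here simply the index, playing the role of the order statistics ξ_j; the point data are hypothetical):

```python
def dominates(x, y):
    """x ≺ y componentwise (strict partial order)."""
    return x != y and all(a <= b for a, b in zip(x, y))

def weak_record_times(points):
    """Indices n such that no earlier mark lies strictly below X_n."""
    return [n for n in range(1, len(points) + 1)
            if not any(dominates(p, points[n - 1]) for p in points[:n - 1])]

def minimal_points(points):
    """1-based indices of the minimal (Pareto-optimal) points."""
    return [i + 1 for i, x in enumerate(points)
            if not any(dominates(y, x) for y in points)]

pts = [(0.9, 0.9), (0.5, 0.95), (0.4, 0.6), (0.7, 0.2), (0.3, 0.5)]
lifted = [p + (i,) for i, p in enumerate(pts)]  # index as the (d+1)st coordinate
assert weak_record_times(pts) == minimal_points(lifted)
print(weak_record_times(pts))  # -> [1, 2, 3, 4, 5] for this configuration
```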
Thus the strong records are much more rare, and the weak records much more frequent, than the classical records. In this paper we show that, as far as the frequency is concerned, the chain records in any dimension d are more in line with the classical records:

Proposition 1. For sampling from Q^d with the uniform distribution the number of chain records N_n is approximately Gaussian, with moments

E[N_n] ∼ (log n)/d,  Var[N_n] ∼ (log n)/d².

The CLT will be proved in Section 3. Beyond that, we will derive exact and asymptotic formulas for the probability of a chain record p_n = P(N_n > N_{n−1}), and will discuss some point-process scaling limits generalizing those in the one-dimensional case.
The counting results remain the same if sampling from the uniform distribution in Q^d is replaced by sampling from any continuous product distribution in R^d. The situation changes radically for sampling from more general probability measures µ on R^d, since the properties of the record counts are no longer 'distribution free'. For instance, p̄_n ∼ c n^{−2/(1+ρ)} (log n)^{−ρ/(ρ+1)} for the bivariate correlated Gaussian distribution (26, p. 13) (in which case the total number of strong records is finite), while p̄_n ∼ c/n for the bivariate Cauchy distribution (20, p. 515) (in which case the total number of strong records is infinite). The distribution of N̄_n for various classes of measures µ was studied e.g. in (5; 6; 20; 33).
More generally, one can define records for sampling from some probability space (X , F, µ) endowed with a strict partial order ≺. In these general terms it is easy to give a criterion for there to be infinitely many chain records. Also, Proposition 1 and scaling limit results generalize for a class of sampling spaces which possess a self-similarity property, versions of which were exploited in a number of related contexts (12; 14; 24).

The heights at chain records and stick-breaking
For x ∈ Q^d the quadrant L_x := {y ∈ Q^d : y ≺ x} is the lower section of the partial order at x. The height h(x) is the measure of L_x, which in the case of the uniform distribution under focus is equal to the product of the coordinates of x. The height is a key quantity to look at, because the heights at chain records determine the sojourns T_{k+1} − T_k (which may also be called inter-record times). Let H_k = h(R_k) be the height at the kth chain record. Because a new chain record R_{k+1} is established as soon as L_{R_k} is hit by some mark, we have, exactly as in the classical case (8, Theorem 4.1):

Lemma 2. Given the sequence (H_k), the sojourns T_{k+1} − T_k are conditionally independent, geometrically distributed with parameters H_k, k = 1, 2, …
Note that we can express the record counts and indicators through (T_k) as

N_n = max{k : T_k ≤ n},  I_n = 1(n ∈ {T_1, T_2, …}),   (1)

which implies that the occurrences of the chain records are completely determined by the law of (H_k). Note also that N_n ≥ k is equivalent to T_k ≤ n. The lemma has the following elementary but important consequence.
Corollary 3. Given (H k ), the conditional law of (I n ) is the same as for the classical records from Q 1 with the uniform distribution.
The heights at chain records undergo a multiplicative renewal process, sometimes called stick-breaking. Define random variables W_1, W_2, W_3, … by setting

H_k = W_1 W_2 ⋯ W_k,  k = 1, 2, …,   (2)

so that the W_k's are i.i.d. copies of a random factor W distributed like H_1. Explicitly, the density of W is

f(w) = (−log w)^{d−1}/(d−1)!,  0 < w < 1,   (3)

and the Mellin transform is E[W^λ] = (λ + 1)^{−d}, λ > −1, as follows by noting that H_1 is the product of d independent uniform variables.
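The Mellin transform of the product of d independent uniforms can be confirmed numerically. The sketch integrates the density (−log w)^{d−1}/(d−1)! after the substitution w = e^{−t}, which turns E[W^λ] into a Gamma integral:

```python
import math

def mellin_numeric(lam: float, d: int, steps: int = 50000, tmax: float = 40.0) -> float:
    """E[W^λ] for W with density (−log w)^{d−1}/(d−1)! on (0,1).
    Substituting w = e^{−t} gives the Gamma integral
    ∫_0^∞ e^{−(λ+1)t} t^{d−1}/(d−1)! dt, evaluated by the midpoint rule."""
    dt = tmax / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += math.exp(-(lam + 1.0) * t) * t ** (d - 1) * dt
    return total / math.factorial(d - 1)

for d in (1, 2, 3):
    for lam in (0.0, 1.0, 2.5):
        assert abs(mellin_numeric(lam, d) - (lam + 1.0) ** (-d)) < 1e-4
print("E[W^λ] = (λ+1)^{-d} confirmed numerically")
```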
It is an easy exercise to check for small sample sizes that the I_n's are not independent for d > 1. The next proposition characterizes the stick-breaking laws with independent indicators.

Proposition 5. The record indicators (I_n) are independent if and only if the law of the factor W is beta(θ, 1) for some θ > 0. In this case P(I_n = 1) = θ/(θ + n − 1).
Proof. Let v n = P(I n = 0), u n = 1 − v n . The 'if' part follows from the well-known sequential description of the Ewens partition (39).
To prove the 'only if' part, start by observing that, by Lemma 2 and the assumption of independence of the indicators, the probability of no record at the indices 2, …, n can be written in two ways. By the independence of W_1 = H_1 and W_2 = H_2/H_1, with W_1 =_d W_2, the middle expression in (6) can be factored, then expanded and evaluated by (5). Upon simplification, (6) yields an identity in which, for u_2 strictly between 0 and 1, the coefficient of u_n is positive; thus (u_n, n ≥ 2) is uniquely determined by u_2. Setting u_2 = θ/(θ + 1), the recursion is solved by u_n = θ/(θ + n − 1). □

Random partitions and the CLT
The cube Q^d is decomposed into disjoint layers Q^d \ L_{R_1}, L_{R_1} \ L_{R_2}, …. We define an ordered partition Π of the set N into disjoint nonempty blocks by assigning two integers i, j to the same block of the partition if and only if X_i, X_j fall in the same layer. The blocks are ordered by increase of their minimal elements, which are T_1, T_2, …. For instance, for the configuration of points in Figure 1 the partition restricted to the set of 10 integers has three blocks {1, 2, 3, 4, 6, 10}, {5, 7, 9}, {8}. The partition Π is partially exchangeable, as introduced in (38), which means that for each n the probability of each particular value of Π restricted to {1, …, n} depends only on the sizes of the blocks in the order of their appearance.

The partition Π admits the following alternative construction. Let (U_j) be a sequence of independent uniform [0, 1] variables, independent of (H_k), and let (U_j) | (H_k) denote the sequence obtained from (U_j) by replacing its successive running minima with H_1, H_2, … . Eventually all H_k's will enter the resulting sequence. It is easy to see that given (H_k) the distribution of (U_j) | (H_k) is the same as the conditional distribution of (U_j) given the subsequence of its record values (H_k). Now, the interval ]0, 1] is broken into subintervals ]H_{k+1}, H_k] (with H_0 := 1), and the partition Π is defined by assigning integers a, b to the same block if and only if the ath and the bth member of the sequence (U_j) | (H_k) belong to the same subinterval.
In the classical case, (H_k) is the stick-breaking sequence with uniform factor W, and we have (U_j) | (H_k) =_d (U_j), so the insertion does not alter the law of the uniform sequence. In this case Π is the exchangeable partition whose law corresponds to the Ewens sampling formula with parameter θ = 1. If we take stick-breaking with the beta factor W as in Proposition 5, we obtain the general Ewens partition, which can be characterized by the property of independent indicators (36) (up to two trivial cases obtained by sending θ to 0 or ∞, respectively).
The construction of (U j ) | (H k ) and Π does not impose any constraints on the law of the sequence (H k ), which can be an arbitrary nonincreasing sequence (the induced Π is then the most general partially exchangeable partition (38)). With this in mind, we shall take a more abstract approach and digress from the geometric description of records. Suppose (H k ) is defined by stick-breaking (2), with a generic random factor W assuming values in ]0, 1[ . Let (T k ) be as in Lemma 2 and define N n by (1). As above, T k 's and N n can be interpreted in terms of a random partition Π.
Proposition 6. Suppose W has finite logarithmic moments

m := E[−log W] < ∞,  σ² := Var(−log W) < ∞.

Then for n → ∞ the variable N_n is asymptotically Gaussian with moments

E[N_n] ∼ m^{−1} log n,  Var[N_n] ∼ σ² m^{−3} log n.

We also have the strong law N_n / log n → m^{−1} almost surely, for which only m < ∞ is required.
Proof. Our strategy is to show that N n is close to K n := max{k : H k > 1/n}. By the renewal theorem (16) K n is asymptotically Gaussian with the mean m −1 log n and the variance σ 2 m −3 log n because K n is just the number of epochs on [0, log n] of the renewal process with steps − log W j .
By the construction of (U_j) | (H_k), we have a dichotomy: U_n ∈ ]H_k, H_{k−1}] implies that either U_n will enter the transformed sequence or will get replaced by some H_i ≥ H_k. Let U_{n1} < … < U_{nn} be the order statistics of U_1, …, U_n, and let ξ_n be the number of these order statistics smaller than 1/n. By definition, H_{K_n+1} < 1/n < H_{K_n}, hence K_n and ξ_n are independent and ξ_n is binomial(n, 1/n). By the dichotomy, every inserted height smaller than H_{K_n} consumes an order statistic below 1/n, whence N_n ≤ K_n + ξ_n, where ξ_n is approximately Poisson(1); this yields the desired upper bound.
Now consider the threshold s_n = (log n)²/n and let J_n := max{k : H_k > s_n}. If the number of order statistics smaller than s_n is at least J_n, then all of H_1, …, H_{J_n} enter within the first n terms, so N_n ≥ J_n. Because log(1/s_n) = log n − 2 log log n ∼ log n, the index J_n is still asymptotically Gaussian with the same moments as K_n.
On the other hand, the number of order statistics smaller than s_n is asymptotically Gaussian with both moments about (log n)². Hence elementary large-deviation bounds imply that N_n ≥ J_n with probability very close to one. This yields a suitable lower bound, hence the CLT. Along the same lines, the strong law of large numbers follows from N_n ∼ K_n.
Similar limit theorems have been proved by other methods for the number of blocks of exchangeable partitions (25), and for the length of a size-biased path in search trees (14; 18).
Proposition 1 follows as an instance of Proposition 6 by computing the logarithmic moments of the density (3): since −log W is the sum of d independent standard exponential variables, m = d and σ² = d.

Poisson-paced records
The probability p_n of a chain record at index n is equal to the mean height of the last chain record before n. In terms of the quad-tree, p_n is the probability that X_n belongs to the path in the direction of the negative quadrant. To compute p_n we shall exploit the same kind of continuous-time process as in (24), which may be interpreted as the process of a tagged particle in a fragmentation with self-similarity index 1 (7). See (14; 15; 17; 18) for other approaches to functionals of quad-trees.
Let (τ_n) be the increasing sequence of points of a homogeneous Poisson point process (PPP) on R_+, independent of the marks (X_n). The sequence ((X_n, τ_n), n = 1, 2, …) is then the sequence of points of a homogeneous PPP in Q^d × R_+ in the order of increase of the time component, which now assumes values in the continuous range R_+. Let N_t be the number of chain records among the marks arriving on [0, t], and let B_t := H_{N_t} be the height of the last such record (with H_0 := 1). Clearly, ∫_0^t B_s ds is the predictable compensator for (N_t); in particular E[N_t] = ∫_0^t E[B_s] ds.

The effect of poissonization on the process of records amounts to just replacing the geometric sojourns in Lemma 2 by exponentials. We again digress from the detailed geometric description, and construct a process (B_t) by first letting (H_k) be a sequence of distinct visited states and then requiring that the sojourns in these states be conditionally independent, exponential(H_k). Assuming further that (H_k) is derived by stick-breaking (2), the process (B_t) is Markov and time-homogeneous, with a very simple type of behaviour. Given B_t = b, the process remains in state b for a rate-b exponential time and then jumps to a new state bW, with W a stereotypical random factor with values in ]0, 1[.

Immediate from this description is the following self-similarity property: the law of (B_t) with initial state B_0 = b is the same as the law of the process (b B_{bt}) with B_0 = 1. In this form the process is well defined for an arbitrary initial state b > 0 and arbitrary W with values in the open interval. See (23) for features of this process related to the classical records and (7) for more general self-similar (also called semi-stable) processes derived from subordinators. Proposition 6 translates literally as a CLT for the number of jumps of (B_t) within a large time interval (in the language of (7), this is the number of dislocations of a tagged particle in a self-similar fragmentation process with finite dislocation measure).
By the self-similarity of (B_t) the moments m_k(t) := E[B_t^k] satisfy a renewal-type equation. The series solution to this equation with the initial value m_k(0) = 1 is given by the series (7), with g(λ) = E[W^λ], as one can check by direct substitution (see (10)).
By (9, Theorem 1) the random variables tB_t converge, as t → ∞, in distribution and with all moments to a random variable Y whose moments E[Y^k], k = 1, 2, …, are given by (8). The law of Y is determined uniquely by the moments (8). This variable has a 'perpetuity series' representation in terms of independent exponential variables E_k and the factors W_j, where the W_j's for j > 0 are as before and W_0 has the density (9) (which is the stationary density for the stick-breaking process with factor W). Equivalently, we may write Y in the form of an exponential functional (11), for (S_t) a compound Poisson process with initial state S_0 = −log W_0 and jumps distributed like −log W.
Connecting the discrete- and continuous-time models, we have the poissonization identity saying that m_1(t) is the probability that the first arrival after t is a chain record. Equating coefficients of the series then gives the exact formula (10) for p_n, involving the products ∏_j (1 − g(j + 1)).
Recall that (10) with general g(λ) = E[W^λ] yields the probability that n is the minimal element in some block of Π, for Π the partition of N introduced in Section 3. Specializing to the chain records from Q^d, we substitute g(λ) = (λ + 1)^{−d} in the above formulas. Factoring, we see that the series (7) is a generalized hypergeometric function of the type _dF_d, and formula (8) simplifies accordingly. The law of Y may be considered as a kind of extreme-value distribution, because it is the limit distribution of the height at the last chain record before time n. In the case d = 1 we recover the well-known Y =_d E with E standard exponential, and for d = 2 we get Y =_d EU with E and U independent exponential and uniform random variables.
For d = 1 we obtain from (10) the familiar p_n = 1/n, and for d = 2 we obtain the surprisingly simple expression p_n = 1/(2n) (for n > 1). For d > 2 the formulas for p_n do not simplify. The depoissonization of the k = 1 instance implies, as one would expect, p_n ∼ E[Y]/n. Formula (10) can be compared with the analogous formulas for the occurrences of strong and weak records. The last two follow by elementary combinatorics, but we do not have a direct combinatorial argument for (10).
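The value p_n = 1/(2n) for d = 2 is easy to probe by simulation. The following Monte Carlo sketch (with a fixed seed) tracks whether the nth mark breaks the current chain record:

```python
import random

def chain_record_at(n: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of p_n = P(a chain record occurs at index n)
    for uniform sampling from the unit square (d = 2)."""
    hits = 0
    for _ in range(trials):
        record = (rng.random(), rng.random())   # X_1 is always a record
        broke_at_n = (n == 1)
        for i in range(2, n + 1):
            x = (rng.random(), rng.random())
            if x[0] <= record[0] and x[1] <= record[1]:  # x ≺ current record
                record = x
                broke_at_n = (i == n)
        hits += broke_at_n
    return hits / trials

rng = random.Random(1)
for n in (2, 3, 4):
    est = chain_record_at(n, 100000, rng)
    assert abs(est - 1.0 / (2 * n)) < 0.01   # theory: p_n = 1/(2n), n > 1
print("p_n close to 1/(2n) for d = 2, as predicted")
```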

Scaling limits
Let b > 0 be a scaling parameter which we will send to ∞. A more explicit construction of R is the following. Let H be the unique scale-invariant (that is, satisfying bH =_d H for all b > 0) point process on R_+ whose restriction to [0, 1] has the same distribution as {W_0, W_0W_1, W_0W_1W_2, …}, where the W_j's are independent, W_k =_d W and W_0 has the stationary density (9). Let {ξ_k, k ∈ Z} be the points of H, which may be labelled so that ξ_0 = W_0 is the maximum point of H ∩ [0, 1] and ξ_{−1} > 1. Assign to each ξ_k an arrival time σ_k := Σ_{i=−∞}^{k} E_i/ξ_i, where the E_i's are independent standard exponential variables, also independent of H. Then let R := {(ξ_k, σ_k), k ∈ Z}. The hyperbolic invariance of R is obvious from the construction and the scale invariance of H.
Next is a counterpart of Proposition 5.
Proposition 8. If one of the coordinate projections of R is a Poisson process, then the law of W is beta(θ, 1) for some θ > 0, in which case both projections are Poisson processes with intensity θ ds/s, s > 0.
Proof. If the vertical coordinate projection of R is Poisson, then the intensity is θ ds/s for some θ > 0, in consequence of scale invariance; but then W is beta(θ, 1) and the horizontal projection is the same Poisson process, as is known in connection with the Ewens partition. The second part of the claim follows by observing that the law of the horizontal projection uniquely determines the law of W. Indeed, let ξ_1 < ξ_2 be the two leftmost points of the horizontal projection on [1, ∞[. By the construction of R we have ξ_1/ξ_2 =_d (E_1/E_2)W, with E_1, E_2 exponential and E_1, E_2, W independent. Hence the law of W can be recovered from that of R.

As noted by Charles Goldie, the component-wise logarithmic transform sends the chain records in Q^d to the sequence of sites visited by a d-dimensional random walk whose components are independent one-dimensional random walks with exponentially distributed increments. Equivalently, one can consider the upper chain records from the product exponential distribution in d dimensions. In this regime, R_k/k converges almost surely to the vector with unit coordinates and, subject to a suitable normalization, the process of record values can be approximated by a d-dimensional Brownian motion.

Chain records from general random orders
Let (X, F, µ, ≺) be a probability space endowed with a measurable strict partial order ≺. The measurability condition means that the graph of ≺ belongs to the product sigma-algebra F ⊗ F, and it implies that the lower sections L_x = {y ∈ X : y ≺ x} are all measurable and that the height h(x) = µ(L_x) is a measurable function from X to [0, 1]. To avoid measure-theoretic pathologies like nonmeasurability of the diagonal, we assume in the sequel that the space is sufficiently regular; one can think, e.g., of a Borel subset of some Euclidean space. In this general setting, the chain records in a sample X_1, X_2, … from (X, F, µ, ≺) and the associated variables H_k, R_k, T_k, I_n are defined in full analogy with those in Q^d. We allow measures µ with atoms, hence some marks in the sample may be repeated.
Each measurable subset A ⊂ X of positive measure inherits the structure of a partially ordered probability space, endowed with the conditional measure µ_A := µ(· ∩ A)/µ(A) and the corresponding height function h_A. Note that by transitivity each L_x is a lower set, and every union of lower sets is a lower set.
The next lemma states that the height function restricted to the sample is strictly monotone.

Lemma 9. Almost surely, X_1 ≺ X_2 implies h(X_1) < h(X_2).
Proof. By transitivity, x ≺ y implies L_x ⊂ L_y and hence h(x) ≤ h(y) for almost all (x, y). Thus we only need to show that the event X_1 ≺ X_2 excludes h(X_1) = h(X_2). Suppose not; then P((X_1 ≺ X_2) ∧ (h(X_1) = h(X_2))) > 0. Then by Fubini there is a set A of positive measure such that for x ∈ A we have P((X_1 ≺ x) ∧ (h(X_1) = h(x))) > 0, and since on this event L_{X_1} ⊂ L_x, we have P((X_1 ≺ x) ∧ (L_{X_1} = L_x)) > 0. By exchangeability the latter is also true with X_2 substituted for X_1, hence by independence P((X_1 ≺ x, X_2 ≺ x) ∧ (L_{X_1} = L_{X_2} = L_x)) > 0 for almost all x ∈ A. This implies P((X_1 ∈ L_{X_2}) ∧ (X_2 ∈ L_{X_1})) = P((X_1 ≺ X_2) ∧ (X_2 ≺ X_1)) > 0, which contradicts the asymmetry of ≺.
It follows that (H_k) is strictly decreasing almost surely. Lemma 2 is still valid in the general framework, but the law of (H_k) typically depends in a very complex way on the sampling space. As a first instance, we discuss features of H_1 and relate them to the properties of the chain records. Introduce the distribution function

D(s) := P(h(X_1) ≤ s) = µ{x ∈ X : h(x) ≤ s},  s ∈ [0, 1].

In a standard way, D is right-continuous with left limits, nondecreasing and satisfies D(1) = 1 and D(0) ≥ 0.

Proposition 10. The sequence of chain records is almost surely infinite if and only if D(0) = 0.

Proof. If D(0) = 0, the process (X_n) never enters the set {x ∈ X : h(x) = 0}, hence each chain record is eventually broken, and the sequence of chain records is infinite. If D(0) > 0, then with positive probability h(X_1) = 0 and the record process terminates at T_1 = 1.
Example 11. The distribution D may have jumps even when µ has no atoms. An example of this situation is R² with the standard order and with the measure supported by the union of the unit square and the segment connecting the points (1, 0) and (2, −1), such that half of the mass is spread uniformly over the square and the other half is spread uniformly over the segment. Observe that D(0) = 1/2; with probability 1/2 the total number of chain records is infinite, and with probability 1/2 it is finite, according to whether X_1 hits the square or the segment.
Lemma 12. The distribution function D satisfies D(s) ≥ s for all s ∈ [0, 1].

Proof. First we check that D has no atom at 1. If D(1) > D(1−), the set A := {x ∈ X : h(x) = 1} has positive measure, and for two independent marks sampled from A each falls in the lower section of the other almost surely. This readily implies P((X_1 ≺ X_2) ∧ (X_2 ≺ X_1)) > 0. This contradicts the asymmetry of the partial order, thus D(1) = D(1−).
Suppose now that s is a growth point of D. Then for every ε > 0 there is a set A ⊂ X of positive measure with h(x) ∈ ]s − ε, s] for x ∈ A. For x ∈ A and almost every y ∈ L_x we have h(y) ≤ h(x) ≤ s, whence D(s) ≥ µ(L_x) = h(x) > s − ε, and letting ε → 0 we obtain D(s) ≥ s.
It remains to exclude the possibility that D has a flat which ends with crossing the diagonal by a jump. Suppose s − D(s−) > 0 for some s. Then B = {x ∈ X : h(x) ≤ s} is a lower set of positive measure whose height function h_B has a distribution with an atom at 1. This is impossible by the first part of the argument. □

Call the space uniformized if D(s) = s for s ∈ [0, 1]. Partially ordered probability spaces (X, F, ≺, µ) for which D is uniformized have the characteristic property that the event {X_1 ≺ X_2} coincides with {h(X_1) < h(X_2)}. In this sense the height h plays the role of a 'utility function' representing the relation ≺ restricted to the sample.

There is a canonical minimal order ≺ on ([0, 1], B, dx) (see the previous example) related to (R, B, µ, <). For x, y ∈ [0, 1] let x ≺ y iff D(x) < D(y). This makes two distinct points x and y incomparable if they belong to the same flat of D. Each flat of D corresponds to some atom of µ, and has length equal to the size of this atom.

Comparing the records of distinct types
In the setting of Section 6 one can also define weak and strong records, as in the Introduction for Q^d. Let p̃_n, p̄_n, p_n be the probabilities that a weak, a strong or a chain record occurs at time n, and let Ñ_n, N̄_n, N_n be the numbers of records of these types among the first n marks, respectively. The expected total number of records of each kind in the infinite sequence X_1, X_2, … is given by the corresponding infinite series. For weak and strong records, the divergence of the series is a necessary and sufficient condition for infinitely many records to occur almost surely; see (30, Theorem 2.1). The probabilities p̃_n, p̄_n and p_n are nonincreasing in n, as follows by observing that a record occurs when X_n hits a certain subset of X determined by X_1, …, X_{n−1}, and these subsets are nonincreasing.
By the definitions and exchangeability, N̄_n ≤ N_n ≤ Ñ_n and p̄_n ≤ p_n ≤ p̃_n (strong ≤ chain ≤ weak), and also p̄_n ≤ 1/n ≤ p̃_n.
Complementing these relations, we will show the inequality

p_n ≤ 1/n,   (14)

which makes precise the statement that the chain records are more rare than the standard records from Q^1. As a nice exercise, the reader is invited to prove (14) for strict lower records in a sample from some discrete distribution on R. For the probability of a weak record we have the integral representation

p̃_n = ∫_0^1 (1 − s)^{n−1} D(ds),

which follows by conditioning on h(X_n) = s. A similar representation exists for p̄_n in terms of the distribution of the upper section of ≺ at a random mark X_1, but there is no comparably simple general formula for p_n.
Introduce the ratios

W_k := H_k/H_{k−1},  k = 1, 2, …,

with the convention that H_0 = 1 and H_k = W_k = 0 if the number of chain records is less than k.
Thus H_k = W_1 ⋯ W_k. Let (U_k) be a sequence of independent uniform [0, 1] random variables.

Lemma 15. It is possible to define (H_k) and (U_k) on the same probability space so that H_k ≤ U_1 ⋯ U_k for all k, almost surely.

Proof.
Let D_x be the distribution function of the height for the lower set L_x with the conditional measure µ(·)/µ(L_x). If µ(L_x) = 0 we let D_x ≡ 1. Let D^← and D^←_x be the generalized inverses of D and D_x, respectively. By Lemma 12 we have D_x(s) ≥ s, thus D^←_x(u) ≤ u for u ∈ [0, 1]. Let (X_n) and (U_k) be independent. Define W'_1 := D^←(U_1) and W'_{k+1} := D^←_{R_k}(U_{k+1}) for k ≥ 1. By the properties of the quantile transform we have the distributional identity (W'_k) =_d (W_k), and the same is true with conditioning on R_1, …, R_k, since (R_k) is Markovian. Since W'_k ≤ U_k for every k, the products satisfy W'_1 ⋯ W'_k ≤ U_1 ⋯ U_k, and the lemma follows. □

Proposition 16. For N*_n the number of classical records from Q^1 and N_n the number of chain records from a general space X, we have P(N_n ≥ k) ≤ P(N*_n ≥ k).
Proof. Let H*_k = U_1 ⋯ U_k, H*_0 = 1. Suppose that (T*_k) is such that, given (H*_k), the differences T*_{k+1} − T*_k are independent geometric(H*_k), and T*_1 = 1. Define N*_n := max{k : T*_k ≤ n}. Then (H*_k), (T*_k) and N*_n may be identified with the record values, record times and record counts for the standard records from Q^1. In the setup of Lemma 15 we have H_k ≤ H*_k, hence T*_k is stochastically smaller than T_k, hence each N*_n is stochastically larger than N_n.
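The stochastic dominance can be observed numerically: for d = 2 the simulated mean number of chain records stays below the harmonic number H_n = E[N*_n] of the classical case. A seeded Monte Carlo sketch:

```python
import random

def chain_count(n: int, rng: random.Random, d: int = 2) -> int:
    """Number of chain records among n uniform marks in [0,1]^d."""
    count = 0
    record = None
    for _ in range(n):
        x = tuple(rng.random() for _ in range(d))
        if record is None or all(a <= b for a, b in zip(x, record)):
            record = x
            count += 1
    return count

rng = random.Random(7)
n, trials = 200, 5000
mean_chain = sum(chain_count(n, rng) for _ in range(trials)) / trials
harmonic = sum(1.0 / k for k in range(1, n + 1))   # E[N*_n], classical records
assert mean_chain < harmonic
print(f"mean chain count {mean_chain:.2f} < harmonic number {harmonic:.2f}")
```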

Self-similar spaces
The special feature which makes the occurrences of chain records from Q^d amenable to study is the stick-breaking representation (2) with i.i.d. factors, which in turn follows from a self-similarity property of the cube. This property holds for the wider class of spaces introduced below.
We define a partially ordered probability space (X , F, µ, ≺) to be self-similar if for almost all x ∈ X the conditioned space (L x , F| Lx , µ Lx , ≺ | Lx ) is isomorphic to the whole space (X , F, µ, ≺). The isomorphism is understood as a measure-preserving bijection φ defined almost everywhere and such that the events {X 1 ≺ X 2 } and {φ(X 1 ) ≺ φ(X 2 )} coincide up to a zero event. For a self-similar space the factors in (2) are i.i.d. copies of W whose distribution is D.
Bollobás and Brightwell (12) introduced box-spaces, with the property that all intervals [y, x] := {z ∈ X : y ≺ z ≺ x} (for y ≺ x) are isomorphic to the whole space. Let o be the unique least element of X (if X has no such element we attach o to the space); then L_x = [o, x]. Thus our definition of a self-similar space requires isomorphism only for a smaller class of intervals, hence every box-space is self-similar.
We demonstrate next various constructions of self-similar spaces.
Example 17. (R^d as a product space) The cardinal example is, of course, R^d with the coordinate-wise order and a continuous product distribution µ = ⊗_{j=1}^d µ^{(j)}. The transformation applying to each coordinate its marginal distribution function maps this space isomorphically onto Q^d with the uniform distribution. The family of self-similar spaces is closed under the operation of cartesian product, with the product order defined analogously to the case Q^d = [0, 1] × ⋯ × [0, 1].
Example 18. (discrete measures) For (z_k) a decreasing sequence and 0 < p < 1, q = 1 − p, consider the measure

µ := Σ_{k≥1} p q^{k−1} δ_{z_k},

where δ_x is the Dirac mass at x. Clearly, h(z_k) = q^k. The probability transform yields a measure of the same type, but with the supporting sequence z'_k = q^k. The space (R, <) with any such measure is self-similar, as follows from the elementary properties of the geometric distribution. Computing the moments m and σ² is easy from the Mellin transform g(λ) = p q^λ/(1 − q^{λ+1}). From the moments formula it follows that Y is a product of independent factors: an exponential(1) variable and a variable with density ds/(ms), s ∈ [q, 1]. For q → 1 the second factor degenerates and the distribution of Y converges to exponential(1), as is intuitively clear since µ approaches the uniform distribution on [0, 1], hence the records must be similar to those from Q^1.
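The stated Mellin transform is a geometric series. The sketch below sums it directly, under the indexing convention µ{z_k} = p q^{k−1} and h(z_k) = q^k for k ≥ 1 (one reading consistent with the closed form):

```python
def mellin_geometric(lam: float, p: float, terms: int = 400) -> float:
    """E[W^λ] for the geometric height distribution of Example 18,
    assuming atoms of mass p q^{k-1} at heights q^k, k = 1, 2, ..."""
    q = 1.0 - p
    return sum(p * q ** (k - 1) * (q ** k) ** lam for k in range(1, terms + 1))

p, q = 0.3, 0.7
for lam in (0.5, 1.0, 2.0):
    closed = p * q ** lam / (1.0 - q ** (lam + 1))
    assert abs(mellin_geometric(lam, p) - closed) < 1e-12
print("g(λ) = p q^λ / (1 − q^{λ+1}) verified")
```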
If (R, B, µ, <) is self-similar then µ is either continuous or geometric as in Example 18. Indeed, applying the probability transform we reduce to the case µ[0, s] = D(s) of a uniformized measure µ. From the fact that H_2/H_1 and H_1 are i.i.d., and from (27, Corollary 6.5), it follows that the support of µ is a multiplicatively regenerative set; but any such nonrandom set is either [0, 1] or a geometric sequence, as is obvious from (27, Section 6).
Bunge and Goldie (8, p. 297) asked if the Ewens sampling formula with θ = 1 may be given some interpretation in terms of records. In (28) an interpretation was given for arbitrary θ > 0 in terms of the classical records and an 'F_α-model' for independent (but not i.i.d.) marks. The next example gives an interpretation in terms of suitable chain records, but only for 0 < θ ≤ 1. By Lemma 12 the parameter values θ > 1 cannot appear in this way, since then the beta(θ, 1) distribution function is subdiagonal and cannot coincide with a height distribution.
Analogous examples with the independence property of indicators can be also constructed in higher dimensions.
Another partial order is the 'interval order' defined by ]a, b[ ≺ ]a', b'[ ⟺ b ≤ a'. With this order and the above density the space of intervals is a box-space (12, p. 63). Both constructions of interval spaces generalize to arbitrary box-spaces.
The last formula specializes to (16) for β = α − 2, d = 2. This is no coincidence: an inspection of the interval space shows that it fits in the present example.
Example 22. (simplexes) This example is suggested by a construction of simplex trees in (14). Let X be a d-dimensional simplex with µ the uniform distribution on X. Fix a (d − 1)-dimensional face F of X, and for x ∈ X let L_x be the interior of the convex hull of {x} ∪ F. Define a partial order by y ≺ x ⟺ y ∈ L_x. The law of W is given by D(s) = 1 − (1 − s)^d. This distribution is beta(1, d) (d = 1, 2, …), but the parameter stands here on the wrong side, hence the record indicators are independent only for d = 1.
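For d = 2 the height in this example reduces to a coordinate. Taking the triangle with vertices (0,0), (1,0), (0,1) and the base as the fixed face F (a hypothetical concrete instance), the shoelace formula shows h(x) = x_2, whence D(s) = 1 − (1 − s)², as stated:

```python
def tri_area(a, b, c):
    """Area of triangle abc via the shoelace formula."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

# triangle with vertices (0,0), (1,0), (0,1); the fixed face F is the base
F = ((0.0, 0.0), (1.0, 0.0))
total = tri_area((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))   # = 1/2

for x in [(0.2, 0.3), (0.1, 0.7), (0.5, 0.25)]:
    h = tri_area(F[0], F[1], x) / total   # height of x in the simplex order
    assert abs(h - x[1]) < 1e-12          # h(x) equals the distance from F

# hence D(s) = P(h(X) <= s) = P(X_2 <= s) = 1 - (1 - s)^2 for uniform X
print("h(x) = x_2, so D(s) = 1 - (1 - s)^2 when d = 2")
```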
Bollobás and Brightwell (12) also defined a larger class of spaces which have all intervals isomorphic to one another. By analogy, we can introduce a larger class of spaces which have (almost) all lower sections isomorphic to one another, but not necessarily isomorphic to the whole space.
Examples are easy to provide: one can take, for instance, any lower set in a self-similar space. For this larger class of spaces, (2) still holds with independent factors, but only W_2, W_3, … need be i.i.d. The CLT for the number of chain records readily extends to this more general situation.