Reconstruction of Line-Embeddings of Graphons

Consider a random graph process with $n$ vertices corresponding to points $v_{i} \sim \mathrm{Unif}[0,1]$ embedded randomly in the interval, where edges are inserted between $v_{i}, v_{j}$ independently with probability given by the graphon $w(v_{i},v_{j}) \in [0,1]$. Following Chuangpishit et al. (2015), we call a graphon $w$ diagonally increasing if, for each $x$, $w(x,y)$ decreases as $y$ moves away from $x$. We call a permutation $\sigma \in S_{n}$ an ordering of these vertices if $v_{\sigma(i)}<v_{\sigma(j)}$ for all $i<j$, and ask: how accurately can we estimate $\sigma$ from an observed graph? We present a randomized algorithm with output $\hat{\sigma}$ that, for a large class of graphons, achieves error $\max_{1 \leq i \leq n} | \sigma(i) - \hat{\sigma}(i)| = O^{*}(\sqrt{n})$ with high probability; we also show that this is the best-possible convergence rate for a large class of algorithms and proof strategies. Under an additional assumption satisfied by some popular graphon models, we break this "barrier" at $\sqrt{n}$ and obtain the much better rate $O^{*}(n^{\epsilon})$ for any $\epsilon>0$. These improved seriation bounds can be combined with previous work to give more efficient and accurate algorithms for related tasks, including estimating diagonally increasing graphons and testing whether a graphon is diagonally increasing.


Introduction
In this paper, we propose and analyze new algorithms for estimating latent vertex labellings given an observed random graph. We first discuss a simpler motivating problem, often called the seriation problem (this dates back at least to the 1899 paper [34]; see also e.g. [24], [23]). The basic seriation problem considers a Robinsonian similarity matrix, i.e. a symmetric n-by-n matrix A = [a_{i,j}]_{1≤i,j≤n} with the property that there exists a permutation σ ∈ S_n so that every row of the permuted matrix A_σ = [a_{σ(i),σ(j)}]_{1≤i,j≤n} is unimodal with the maximum occurring at the diagonal. It then asks: how can we find such a permutation σ? This problem turns out to have an elegant and computationally tractable solution. For any embedding φ : {1, 2, . . . , n} → [0, 1] of the indices of A into the line segment [0, 1], we define the induced permutation σ = σ(·, φ) ∈ S_n by the condition φ(σ(1)) ≤ φ(σ(2)) ≤ · · · ≤ φ(σ(n)), (1.1) breaking ties arbitrarily. It turns out that, under mild conditions, the correct permutation σ is of the form σ(·, φ̂), where φ̂ is an eigenvector of a matrix related to A [1]. This seriation problem occurs in a number of contexts where we have some collection of objects {1, 2, . . . , n} which we would like to order (e.g. ordering types of artifacts by their ages) and some measure A = [a_{ij}] of whether a pair of objects are similar (e.g. their co-occurrence in tombs). A special case is graph seriation, where the aim is to determine whether a permutation of the vertices exists so that the corresponding adjacency matrix has the Robinsonian property, and to find such a permutation.
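For intuition, the spectral route can be sketched in a few lines. This is a toy illustration of the approach of [1]; the similarity kernel, size, and seed are our own hypothetical choices rather than anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
xs = np.sort(rng.uniform(0, 1, n))              # latent positions, in true order
A = np.exp(-np.abs(xs[:, None] - xs[None, :]))  # a Robinsonian similarity matrix
perm = rng.permutation(n)                       # hide the true order
B = A[np.ix_(perm, perm)]

# Spectral seriation: sort the indices by the Fiedler vector (the eigenvector
# of the second-smallest eigenvalue) of the graph Laplacian of B.
L = np.diag(B.sum(axis=1)) - B
eigvals, eigvecs = np.linalg.eigh(L)
order = np.argsort(eigvecs[:, 1])               # induced permutation as in (1.1)

# The recovered order matches the true order up to reversal (cf. Remark 1.1).
recovered = perm[order]
```

Here `recovered` comes out as 0, 1, . . . , n−1 or its reverse; which of the two depends on the arbitrary sign of the computed eigenvector.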
Of course, real data is noisy. Even if the expected value of the similarity matrix is Robinsonian, the observed similarity matrix may not be, and a "perfect" ordering may not exist. Our aim is to solve a very generic "noisy" version of the graph seriation problem. Following [11], we introduce noise by studying a natural generalization of the above example based on some models from Bayesian nonparametrics. Recall that a graphon is just a symmetric measurable function w : [0, 1]^2 → [0, 1] (these were introduced in [28]; see [27] for a broader survey). Under suitable measurability conditions, a graphon defines an algorithm for sampling a random graph of any size n: draw U_1, . . . , U_n ∼ Unif[0, 1] independently, then include each edge {i, j} independently with probability w(U_i, U_j). (1.2) We write G ∼ w if G is a random graph obtained from graphon w in this way.
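The sampling rule (1.2) is straightforward to implement. The sketch below is our own minimal version; `band_graphon` is a hypothetical instance of the running-example family, included purely for illustration:

```python
import numpy as np

def sample_graph(w, n, rng):
    """Sample G ~ w as in (1.2): draw latent U_i ~ Unif[0,1] i.i.d., then
    include each edge {i, j} independently with probability w(U_i, U_j)."""
    U = rng.uniform(0.0, 1.0, n)
    P = w(U[:, None], U[None, :])                  # matrix of link probabilities
    coins = rng.uniform(0.0, 1.0, (n, n))
    A = np.triu((coins < P).astype(int), k=1)      # decide each pair once
    return U, A + A.T                              # symmetric adjacency, no loops

# An illustrative diagonally increasing graphon: link probability p inside a
# band of width d around the diagonal, q outside.
def band_graphon(x, y, p=0.8, q=0.2, d=0.3):
    return np.where(np.abs(x - y) < d, p, q)

U, A = sample_graph(band_graphon, 500, np.random.default_rng(1))
```

In the sampled graph, vertices whose latent values U_i, U_j are close are more likely to be adjacent, which is exactly the line-embedded structure discussed below.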
In [11], the authors say that a graphon w is diagonally increasing if it satisfies w(x, y) ≤ min{w(x, z), w(z, y)} (1.3) for all x < z < y. Note that this implies that w(x, ·) is unimodal with the maximum occurring at x, so this definition matches that of the Robinsonian property for matrices.
We think of the following simple and well-studied family of graphons as our prototypical examples, and will use this family as a running example in the text: w(x, y) = p if |x − y| < d, and w(x, y) = q otherwise. (1.4) These graphons satisfy (1.3) as long as 0 ≤ q ≤ p ≤ 1 (though reconstructing an ordering is clearly impossible if p = q or d ∈ {0, 1}). Very similar random graphs have been extensively studied (see e.g. [2,5,14,36]). Special cases of this noisy seriation problem (with points sometimes embedded on a circle instead of an interval) appear in many areas. In some instances, an approximate ordering with an error rate similar to that presented in this paper has been obtained. However, such results generally apply only to a narrow class of random graphs, and the restriction of the problem to this specific class allows for the use of highly specialized methods. A few examples beyond archaeology include genomics [2,35], ranking [17], data visualization [32], and ecology and sociology [24].
Graphs sampled from a diagonally increasing graphon have a natural line-embedded structure, defined by the latent variables (U_1, . . . , U_n). Namely, vertex i will have a higher probability of linking to vertex j when U_j is closer to U_i. Consequently, the permutation that reveals the "almost" Robinsonian property of the adjacency matrix will be the permutation σ(·, (U_1, . . . , U_n)) induced by the values U_i. We call this permutation the line-embedded permutation. This raises the main question of our paper: how accurately can we recover the line-embedded permutation based on an observed graph G? Remark 1.1. The embedding φ = (U_1, . . . , U_n) and its "reverse" 1 − φ = (1 − U_1, . . . , 1 − U_n) are both equally valid line-embeddings for a sampled graph G, and it is impossible to distinguish between the line-embedded permutation σ(·, φ) and its reverse σ_rev = σ(·, 1 − φ) based on an observed graph. In informal discussion we will often implicitly mod out by this reverse operation and talk about "the unique" line-embedded permutation without further comment.
A natural idea is to apply a spectral method as used in [1] for the noiseless matrix seriation problem. A related approach was studied in the recent work [36] in the special case that w is of the form (1.4) with d = 0.5 and q = 0. Theorem 3 of [36] says that, with high probability, there exists a set I ⊂ {1, 2, . . . , n} of size |I| = n(1 − o(1)) so that max_{i∈I} |σ(i) − σ(i, (U_1, . . . , U_n))| = Õ(n^{0.5}). (1.5) However, the authors indicate that their methods cannot be easily extended to other graphons or, indeed, other parameters of (1.4). It was also not clear whether this bound could be improved by other methods. Our paper introduces and analyzes two new algorithms for the noisy graph seriation problem, which approximate the line-embedded permutation for graphs sampled from a large general class of diagonally increasing graphons. The paper has two main results, which bound the error in the approximate permutation returned by our algorithms. In Theorem 1, we extend the result from (1.5) to a very general class of graphons, and obtain an error term of max_{1≤i≤n} |σ(i) − σ(i, (U_1, . . . , U_n))| = Õ(n^{0.5}).
In Theorem 2, we greatly improve this bound for a slightly smaller class of graphons, showing max_{1≤i≤n} |σ(i) − σ(i, (U_1, . . . , U_n))| = Õ(n^ε) (1.6) for any fixed ε > 0. This smaller class still contains the most popular statistical models of graphons, including those of the form (1.4) in the special case q = 0.

Remark 1.2 (Errors in Positions and Orderings). We pause to explain why this result may be surprising. In practice, many latent-position models are analyzed by algorithms that first estimate all latent positions and then plug this estimator into a formula for a quantity of interest (in seriation this quantity of interest is the ordering, but see also [26] for applications of the same approach to other problems). This approach seems sensible under the condition that the first step is not much less accurate than the second step. However, this condition turns out to fail in the context of seriation. In a fairly strong sense that we make precise in Section 6, it is not possible to reconstruct the latent positions U_1, . . . , U_n themselves with an error comparable (after rescaling) to the error on the ordering given in (1.6). We believe that this discrepancy between the achievable error in reconstructing the ordering and the achievable error in reconstructing the latent positions is important for practical algorithm development in the area, as it suggests that beginning with an embedding step could result in a statistically inefficient algorithm.
See Section 6 for further discussion and a more precise version of this heuristic.

High-level algorithm descriptions
We sketch our main algorithms and give heuristics for why they work. Our first algorithm (Algorithm 1) proceeds as follows: 1. We begin by computing a new graph G^(2)_α from the observed graph G, as follows: (a) We square the adjacency matrix; its entries represent the number of common neighbours of pairs of vertices. Since the numbers of common neighbours are sums of many independent Bernoulli trials, the entries of this squared matrix concentrate around their expected values.
(b) We convert the squared matrix into a binary matrix, and thus a graph, by thresholding at some level roughly αn. We show the thresholded matrix is "almost" Robinsonian, in the sense that any violations occur in a very narrow bad region where the expected value of the squared matrix is very close to the threshold value αn.
2. We then take many small random subgraphs of G^(2)_α and attempt to order them using a deterministic algorithm. Since G^(2)_α is "almost" Robinsonian, most of these subgraphs will be exactly Robinsonian. 3. We then align our orderings of these small subgraphs. (Recall from Remark 1.1 that individual graphs cannot distinguish between the line-embedded permutation σ_true and its reverse σ_rev, so our subgraph orderings will generally not all be aligned to the same one.) This turns out to be straightforward as long as there are enough small subgraphs to guarantee substantial overlap. 4. Finally, we merge all of our subgraph orderings into a large ordering, essentially by a voting procedure. All of the steps up to this point preserve quite a bit of randomness, and so we again obtain concentration bounds for the result.
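Step 1 can be made concrete in a few lines. This is a sketch only; the exact threshold and parameter choices used by Algorithm 1 are those given by (3.2):

```python
import numpy as np

def threshold_square(A, alpha):
    """Form the threshold-square graph G^(2)_alpha of step 1: entry (i, j) of
    A @ A counts common neighbours of i and j, and the pair becomes an edge
    when this count is at least alpha * n."""
    n = A.shape[0]
    common = A @ A                        # common-neighbour counts
    B = (common >= alpha * n).astype(int)
    np.fill_diagonal(B, 0)                # keep it a simple graph
    return B

# Tiny example: in the path 0 - 1 - 2, vertices 0 and 2 share one common
# neighbour, so they are joined in G^(2)_alpha whenever alpha * 3 <= 1.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
B = threshold_square(A, 1 / 3)
```

In the tiny example, the endpoints of the path become adjacent in the threshold-square graph while the pair (0, 1), which has no common neighbour, does not.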
Sections 3.3 and 3.4 give a more detailed sketch of our analysis of Algorithm 1, giving precise error bounds in place of informal phrases such as "almost." We believe it is possible for the reader to have a nearly-complete understanding of the algorithm by reading these two short sections; the remainder of Section 3 is concerned with filling in unsurprising details or ruling out failure modes that "obviously" should not occur.
The second algorithm (Algorithm 6), which achieves much smaller error bounds, proceeds in a sequence of stages.
1. The process starts with a coarse ordering achieved by running Algorithm 1 on a small random subgraph. 2. In each subsequent stage, we use the approximate ordering on a small graph to obtain an ordering on a slightly larger graph with a slightly smaller error. These iterative steps use the graph itself (not its square). The additional condition of Theorem 2 guarantees that for every pair of vertices i, j, there is an interval I_{i,j} ⊂ [0, 1] so that any vertex k with U_k ∈ I_{i,j} may link to i, but cannot link to j. Vertices of the previous iteration with U_k-values estimated to be in this interval provide a signal of the true ordering of i and j. 3. In the final stage we return the permutation based on an ordering of the full graph.
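To illustrate the signal used in step 2, consider the graphon (1.4) with p = 1 and q = 0: edges appear exactly within latent distance d, so neighbours in a window far to the left of one vertex but not the other reveal their order. The toy comparator below is our own illustration (not the paper's Algorithm 7), and it uses the true positions as stand-ins for the previous stage's estimates:

```python
import numpy as np

# Deterministic instance of (1.4) with p = 1, q = 0: edge iff |U_i - U_j| < d.
U = np.linspace(0.0, 1.0, 101)
d = 0.15
A = (np.abs(U[:, None] - U[None, :]) < d).astype(int)
np.fill_diagonal(A, 0)

def left_of(A, pos_est, i, j, cut):
    """Vote-based comparison: i is judged left of j if i has more neighbours
    among vertices whose estimated position lies left of `cut`."""
    window = pos_est < cut
    return A[i, window].sum() > A[j, window].sum()

# Vertex 40 (U = 0.40) collects votes from vertices with U in (0.25, 0.305);
# vertex 50 (U = 0.50) collects none there, so 40 is judged left of 50.
```

The real refinement algorithm chooses the comparison window for each pair from the previous stage's estimated ordering, then feeds the resulting comparisons into the next stage.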
Say that a permutation σ_correct is a correct ordering if it agrees with the line-embedded permutation, i.e. σ_correct(i) < σ_correct(j) exactly when U_i < U_j for all i, j, or agrees with its reverse, i.e. σ_correct(i) < σ_correct(j) exactly when U_i > U_j for all i, j. Say that a permutation σ has error less than D if there exists a correct ordering σ_correct so that, for all 1 ≤ i, j ≤ n satisfying σ_correct(i) > σ_correct(j) + D, we also have σ(i) > σ(j).

A simple assumption
In this section, we give a simple-to-state sufficient condition for our first main result, Theorem 1. This sufficient condition is satisfied by many graphons (including those of the form (1.4)), but is far from tight; we relax these conditions in the statement of Theorem 3 in Section 3.1. We begin by restricting our attention to the simple class of uniformly embedded graphons: those of the form w(x, y) = f(|x − y|) for some measurable function f : [0, 1] → [0, 1]. In this context, f is called the link probability function of w.
We then make the following technical assumption on the link function.

Assumption 1.5. The link probability function f is decreasing, and there exist constants c, d, together with the threshold value α, satisfying inequality (1.7), where d̄ = min{0.5, 2d}.

[Figure 1: a sketch of w(0, ·) and w^(2)(0, ·) for a graphon of type (1.4) that satisfies Assumption 1.5. Note that w^(2) violates the diagonally increasing condition, but w^(2)_α (shown in red) does not.]

Remark 1.6. Denote by w^(2) the usual "square" of a graphon (see Definition 2.1). We note that the left-hand side of Equation (1.7) equals inf_{s ∈ [0, d̄]} w^(2)(0, s), and the right-hand side equals w^(2)((1 − d)/2, (1 + d)/2). Also, since w(x, y) ≥ c for all x, y, the lower bound on α implies that α > c².
Condition (1.7) is needed to guarantee that the thresholded squared graph used in our algorithm is close to diagonally increasing. It is not generally true that the square of a diagonally increasing graphon is diagonally increasing; see Figure 1 for an illustration of a simple counterexample of type (1.4). However, even if the squared graphon itself is not diagonally increasing, we can often ensure that the thresholded squared graphon is diagonally increasing for some well-chosen α. See the appendix for details on the example in Figure 1, including the choice of α.
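The phenomenon in Figure 1 is easy to verify numerically. For a type-(1.4) graphon, w^(2) has a closed form in terms of interval overlaps. In the sketch below, the parameters p = 0.8, q = 0.2, d = 0.3 and the threshold α = 0.2 are our own choices for illustration (we do not claim they satisfy Assumption 1.5): w^(2) exhibits a diagonal-increase violation near the boundary, while its thresholded version passes a Robinson check on a grid.

```python
import numpy as np

P, Q, D = 0.8, 0.2, 0.3   # type-(1.4) parameters (illustrative choice)

def w2(x, y):
    """Exact w^(2)(x, y) = integral of w(x,z) w(z,y) dz for w of type (1.4):
    split [0,1] by whether z falls in neither, exactly one, or both bands."""
    Lx = min(x + D, 1.0) - max(x - D, 0.0)          # measure of x's band
    Ly = min(y + D, 1.0) - max(y - D, 0.0)
    O = max(0.0, min(min(x, y) + D, 1.0) - max(max(x, y) - D, 0.0))  # overlap
    return P * P * O + P * Q * (Lx + Ly - 2 * O) + Q * Q * (1 - Lx - Ly + O)

# w^(2) itself is NOT diagonally increasing: moving away from x = 0 the value
# first rises (a boundary effect), violating (1.3) at x = 0 < z = 0.1 < y = 0.2.
assert w2(0.0, 0.2) > w2(0.0, 0.1)

# Thresholding at alpha = 0.2 restores the Robinson property on a grid.
alpha = 0.2
grid = np.linspace(0.0, 1.0, 61)
M = np.array([[1 if w2(x, y) >= alpha else 0 for y in grid] for x in grid])
violations = sum(
    1
    for i in range(61) for k in range(i + 1, 61) for j in range(k + 1, 61)
    if M[i, j] > min(M[i, k], M[k, j])
)
```

With these parameters the triple check finds no violations in the thresholded matrix, matching the qualitative picture in Figure 1.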
We note that this obstacle is real (not merely an artifact of our proof technique), and other approaches to seriation face similar obstacles. For example, spectral seriation requires the Fiedler eigenvector of a matrix related to the graph to be monotone. However, the Fiedler eigenfunction associated with a graphon satisfying (1.3) is not always monotone, so some additional conditions really are necessary. Among the example types listed in Remark 1.7: (iv) uniformly embedded graphons where f is concave from 0 to d < 0.5 and constant from d to 1, subject to an additional condition. We also mention one simple graphon type for which the more general Assumption 3.5 holds but which is not uniformly embedded: (v) graphons defined via differentiable boundary functions ℓ(x) and r(x) that are each other's inverse, have slope bounded away from zero on the appropriate domain, and satisfy an additional compatibility condition. Note that graphs sampled from graphons of type (v) can be interpreted as graphs where points are sampled from [0, 1] according to a non-uniform distribution, while links are formed according to a simple graphon of type (1.4). If this non-uniform distribution has invertible CDF F, then the associated graphon may be written as w(x, y) = w_unif(F(x), F(y)), where w_unif is a graphon of type (1.4).
We include these simple examples primarily as illustration. We suspect that most real-world data will be much more complicated and will not resemble graphs generated by any graphon with a simple formula -that is, most datasets will look quite weird and irregular in some way. We also note that most papers on "noisy" or "statistical" seriation have studied small parametric families of random graphs, but we allow a much larger (nonparametric) class of graphons. See Section 1.6.2 for further discussion.

Main results
In the next section, we will present our main new algorithm (Algorithm 1). The first of our main results is the following bound on its error.

Theorem 1 (Reconstruction for General Graphons). Fix a graphon w and constant α that satisfy Assumption 1.5. Let G_n ∼ w be a graph of size n ∈ N. Then, when Algorithm 1 is executed with parameters as in Equation (3.2) on input graph G_n and value α, the output is a permutation σ on {1, 2, . . . , n} with error D = Õ(√n) w.e.p.

This matches (up to logarithmic terms) the rate found in [36] for the special case of graphons of the form (1.4). However, it turns out that a slightly different algorithm can give much better rates for graphons of that form, and indeed for any graphons satisfying the following additional assumption.

Assumption 1.8 (Sharp Boundaries). Say that a graphon w has sharp boundaries if: 1. There exists some δ > 0 so that a first condition holds for all x, y ∈ [0, 1]. 2. There exists some B > 0 so that a second condition holds for all x, y ∈ [0, 1].

Note that this additional assumption holds for some examples of each type given in Remark 1.7. In particular, it holds for type (ii) when q = 0 and for type (iii) when c = 0. We do not know whether this condition is close to optimal; see the discussion following Algorithm 6 for an explanation of where it is used, and the discussion in Section 1.6.4 for an explanation of a similar condition that turns out to be required for a closely related problem.
We have:

Theorem 2 (Reconstruction with Sharp Boundaries). Fix a graphon w and constant α that satisfy Assumptions 1.5 and 1.8, let G_n ∼ w be a graph of size n, and fix ε > 0. When Algorithm 6 is executed with suitable parameters on input G_n and value α, the output is a permutation σ on {1, 2, . . . , n} with error D = Õ(n^ε) w.e.p.

Algorithm 6 consists of two parts: first a coarse ordering of a small subset of the vertices is obtained using the method from Theorem 1, then a refinement algorithm (Algorithm 7) is used to obtain an ordering of all vertices with the desired error level. These parts are independent, and the refinement algorithm could be used to improve and extend any coarse ordering of a subset of the vertices.
To be more precise, we will use the following definition. Definition 1.9. Fix a function F which takes a finite graph as input and gives an ordering of the vertices as output. Fix a diagonally increasing graphon w, and let G_1, G_2, . . . ∼ w be a sequence of graphs on 1, 2, . . . vertices. Say that the function F is efficient for w if F(G_n) has error less than √n log(n)^5 w.e.p.
For shorthand, we write "F-Algorithm 6" for Algorithm 6 with step 2 replaced by the assignment of F's output as the coarse ordering. Corollary 1.10. Fix a graphon w that satisfies Assumption 1.8, let F be efficient for w in the sense of Definition 1.9, and let G_n ∼ w be a graph of size n. Fix ε > 0. When F-Algorithm 6 is executed with parameters given by (4.21) on input G_n and value α, the output is a permutation σ on {1, 2, . . . , n} with error D = Õ(n^ε) w.e.p. We can think of two situations in which Corollary 1.10 might be useful: 1. Naturally, other mathematical work may provide algorithms that are efficient in the sense of Definition 1.9 (see e.g. [36,33]). 2. In some situations we may use expert knowledge. For example, one might send a small random sample to experts who can order it by hand (likely based on features not captured by the graph). The large remainder of the data could then be ordered automatically as in Algorithm 6 after step 2. It is difficult to model expert knowledge precisely, but we know that in practice it is often very useful to be able to incorporate strong (but expensive) evidence about a (small) subsample.

Remark 1.11 (Optimistic Conjecture on Optimal Reconstruction). We conjecture that the best possible reconstruction error is O(√n log(n)^C) in the situation of Theorem 1 and O(log(n)^{C'}) in the situation of Theorem 2. However, the arguments in the present paper seem to have no hope of finding the optimal constants C, C'.
It is natural to ask when it is possible to attain a "super-small" error that is much less than O(√n log(n)^C). We conjecture that this can occur in many situations where Assumption 1.8 fails, but also that recovering such a super-small error would require modifications to our main refinement algorithm. We would be interested in knowing whether it is possible to come close to covering all situations in which a "super-small" error occurs by using a single algorithm, but make no conjectures on this question. Remark 1.12 (Dependence on Parameters). We note that Theorems 1 and 2 have assumptions with constants c, d, δ, B that do not appear explicitly in the theorem statements. Recall that both theorems say that a certain inequality occurs w.e.p., that is, it occurs for all n larger than some random integer N. The distribution of this random integer N depends on the constants c, d, δ, B. In general, N may become very large if these constants are close to the edge of their ranges (e.g. if δ is very close to 0). This dependence is essentially unavoidable. For example, when c is very close to f(0), the graphon is very close to the constant graphon w_const(x, y) ≡ c. It is obviously not possible to reconstruct the ordering of vertices given observations from a constant graphon, and it is straightforward to check that one requires at least n ≥ (f(0) − c)^{−0.5} to reliably distinguish between a graphon w and the constant graphon w_const(x, y) ≡ c.

Running time
It is straightforward to check that both algorithms have running times that are polynomial in n with parameters as stated. We have not made a serious effort to optimize the parameters for running time, and in several places we make choices in order to make our (already long) proofs slightly shorter. With that in mind: 1. Algorithm 1 has running time O(n^4 polylog(n)), dominated by Algorithm 4. See Remark 2.2 for a quick discussion of how this may be improved to O(n^3 polylog(n)). 2. For the parameters appearing in Theorem 2, Algorithm 6 has running time O(n^2 polylog(n)). Furthermore, the running time is dominated by steps 3-6, which can be substantially parallelized. More generally, this O(n^2 polylog(n)) running time holds as long as the parameter p_1 appearing in Algorithm 6 is less than n^{-0.5}.

Main contributions and related work
We highlight what we view as the main contributions of this paper, in the context of the enormous literature on latent position models: 1. Many of the most-similar previous papers studied parametric families of graphons, often with few parameters (see e.g. [5], [36]). Often, graphs drawn from the family can be seen as slightly perturbed versions of graphs with a highly symmetric structure. The current paper studies a large nonparametric family, and importantly the algorithm does not require the user to identify a specific subfamily. This is important from a practical point of view, since simple-to-describe parametric families such as (1.4) are typically not realistic models and it is often difficult to identify a reasonable model. One consequence of this generality is that the conditions for our main result are longer and more complicated. This may cause our conditions to appear more technical or restrictive than previous work, even though the opposite is true. This is largely a consequence of the fact that it takes more space to write down a collection of inequalities for generic graphons than it does to write down a collection of inequalities for a small number of real-valued parameters. 2. Many previous works on latent position models proceed by first estimating latent positions, then plugging this estimate into the definition of a discrete object of interest (a permutation in the case of seriation, a community structure in the case of e.g. spectral clustering [26]). It is clear that estimating the full embedding is not any easier than estimating the discrete structure, since the discrete structure is a deterministic function of the true embedding. In this paper, we show that there is in fact a large gap in the difficulty of these two problems -estimating the latent positions is sometimes much harder than estimating the discrete structure. We hope that this observation is useful for the design of future algorithms.
In the rest of the section, we give a more detailed look at some specific parts of the literature on latent position models.

Consequences for efficiency of downstream tasks
We note that the problem considered in this paper has the following form: there is some "true" latent position U_1, . . . , U_n for the vertices, and we wish to estimate some explicit and reasonably nice function of these latent positions (in this case, the induced permutation as in Equation (1.1)). It is natural and very common to attempt this sort of task by the following two-step procedure: first find an estimate Û_1, . . . , Û_n of the true latent positions U_1, . . . , U_n, and then plug this estimate into the function (typical algorithms for the closely related problem of spectral clustering are of this form; see e.g. the famous overview [38]).
The algorithms studied in this paper are not of this two-step form, and this is not an accident. In Section 6, we prove that the best estimate Û_1, . . . , Û_n of U_1, . . . , U_n will have error Ω(n^{-0.5} log(n)^{-C}) for a broad class of models. This implies that there is no way to recover the error bound in Theorem 2 by estimating the latent positions Û_1, . . . , Û_n and then propagating this error bound to an estimate σ̂ = σ̂(Û_1, . . . , Û_n) of σ_true that depends only on these estimated latent positions. This fact was somewhat surprising to us, as it is common to analyze the performance of algorithms on "downstream tasks" such as clustering or ordering using this sort of two-step error bound (see e.g. the recent paper [26], which includes such a "downstream analysis" for a closely related embedding problem). Our general observation that two-step procedures may not be optimal (even where they are common in practice) is not new; see e.g. [31] and the references therein.

Seriation
A graph whose adjacency matrix is a Robinsonian similarity matrix is known as a unit interval graph. The graph seriation problem for unit interval graphs can be solved efficiently using a graph-theoretical algorithm. In particular, in [13], the author gives a linear-time algorithm that returns the line-embedded permutation. The algorithm uses three sweeps of LexBFS, a special breadth-first search algorithm first proposed in [37]. Corneil's 3-sweep LexBFS algorithm is used as an important subroutine in our algorithm. The matrix seriation problem can also be solved efficiently, either with a combinatorial algorithm (see [23] for an O(n log n) algorithm) or with spectral methods, as proposed in [1].
If the matrix is not exactly Robinsonian, the seriation problem becomes intractable. Previous work on seriation-with-error considers an optimization problem: find a Robinson matrix and a permutation so that the ℓ_p distance between the permuted matrix and the Robinson approximation is minimized. This problem is NP-hard in general (see [3] for the case p < ∞ and [9] for the case p = ∞). In [10] the authors propose a (deterministic) approximation algorithm for the ℓ_∞ case. If the matrix exhibits certain additional nice properties, then the seriation problem can be solved in polynomial time as a quadratic assignment problem; see [22,6]. In [18,25,15], noisy seriation is presented as a convex optimization problem over a polytope related to the set of permutations, and an approximate solution is reached using a relaxation of this optimization problem. Finally, [39,19] study related problems in the special case of {0, 1}-valued matrices.
The article [16], like our work, considers a statistical problem where we must recover the ordering associated with some generating process. In [16], the given matrix is assumed to be a permuted version of a Robinson matrix, plus an additive error matrix of Gaussian noise. This approach falls in the general categories of shape constrained regression and latent permutation learning. In [17], the authors use spectral seriation to reconstruct a ranking from a set of pairwise comparisons.
The articles [2,5,14,36] are most similar to the present work. All of them study essentially the same problem as the current paper: reconstructing an ordering based on an observed graph. The major difference is that they all consider graphons that lie in low-dimensional parametric families, all quite similar to our example (1.4). In most cases, it is not clear how their algorithms or analyses could be extended to the more general setting considered in this paper. This matters in practice, since we do not expect many real-life datasets to be generated by any graphon with a simple formula such as (1.4): real graphs are often quite messy!
The article [33] appeared after this paper and studies essentially the same problem. Like this paper, it also considers a nonparametric family of graphons. Its main result is that the popular spectral clustering algorithm gives a consistent estimate of the ordering under certain conditions. The main difference between our papers is that the approach in [33] works for a different class of graphons and achieves a slower rate of convergence.

Testing graphons
Our initial motivation for this problem was finding efficient tests of whether a random graph was simulated from a graphon that can be embedded (either in a one-dimensional space, as here, or more generally). Naive approaches fall into the trap discussed in Section 1.6.1.
In [11] and [21], the authors construct a test of line-embeddability. Precisely, they define a function Γ * on the space of graphs so that, for a sequence of random graphs G n ∼ w sampled from a graphon w, we have lim n→∞ Γ * (G n ) = 0 if and only if w is a.e. diagonally increasing. Furthermore, Γ * is obtained from a function Γ that is a testable parameter in the sense of [27], and thus it can be efficiently estimated from small samples. Unfortunately, a problem remained: computing Γ * exactly requires the computation of a vertex ordering that is "optimal," in some sense made precise in [11]. The algorithms from this paper provide an approximate solution to a very similar seriation problem, which seems to be useful in providing a rigorous and efficient method for estimating Γ * ; the details will be contained in future work.

Estimating graphons
Consider a graph sampled from a graphon w as in Equation (1.2) with driving randomness {U_i}, {U_{i,j}}. In the present paper, we focus on estimating the ordering σ for which U_{σ(1)} < U_{σ(2)} < · · · < U_{σ(n)}. There is a larger literature on estimating the graphon itself, i.e. the full n-by-n matrix {w(U_i, U_j)}_{i,j=1}^n. The papers [20,7,29,30,21] all define and analyze estimators that can be used to estimate graphons (some, especially [7], also apply to much larger classes of random matrices). We summarize some main points on the relationship between these results and ours.
It is clear that good estimates of a graphon give some information about its ordering -if you had a perfect estimate of e.g. a graphon of the form (1.4), it would be trivial to recover a perfect ordering. However, this relationship is not very robust, and breaks down quickly in the presence of even small error. Due to either the weak norm [20,7,29,30] or the slow convergence rate [21], existing graphon estimates can't be used to obtain good order estimates, even with unlimited computational resources.
Although our results are not equivalent, we suspect that our estimates of the ordering may lead to improved graphon estimates in some situations. The two papers with the best bounds on the convergence rates of their estimators, [20,21], both use estimators that seem to be computationally intractable. Furthermore, both are intractable for essentially the same reason: they require the computation of an ordering of the vertices that is "optimal" in a sense that is very similar to our problem. Our improved bounds on the ordering problem may lead to computationally tractable estimators that come close to matching the efficiency of [20,21].
Although there is no general equivalence between estimating graphons and orderings, it seems likely to us that sufficiently good graphon estimates could be used to obtain consistent order estimates, even when this "translation" is not statistically efficient. Even inefficient estimates could be used as initial steps in Algorithm 6, as discussed in and immediately before Corollary 1.10.
As one final remark, these papers indicate that the rate at which graphon estimation procedures can converge depends on the "smoothness" of the underlying graphon. This suggests that Assumption 1.8 may not be entirely an artifact of our proof technique.

Paper guide
In Section 2, we state the main algorithms studied in this paper. In Section 3, we prove a version of Theorem 1 with weaker conditions; we defer the proof that Theorem 1 actually follows from this result to Appendix A. In Section 4 we discuss why it is possible to beat the Õ(√n) reconstruction barrier and prove Theorem 2.
In the remainder of the paper, we discuss some auxiliary results that are helpful in applying and understanding our main results. In Section 5, we give simple guidelines for finding a "good" thresholding value α. In Section 6, we show that it is generally not possible to beat the Õ(√n) reconstruction barrier by using an "induced" permutation of the form (1.1).

Algorithms for general graphons
In this section, we state the main algorithms studied in this paper, interspersed with comments on what is "typically" happening at various important stages.

Some further notation
Much of our work will be related to taking powers of graphs and graphons, then thresholding the result.

Definition 2.1 (Powers of Graphs and Graphons). For a graph G, denote by A = A(G) its adjacency matrix. We will use A^{(2)} = A · A to denote the square of the adjacency matrix. Entry A^{(2)}[i, j] denotes the number of common neighbours of vertices i and j, i.e. the number of vertices adjacent to both i and j.
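As a concrete illustration of Definition 2.1 (a sketch; the small graph is our own example, not one from the paper), the matrix A^{(2)} is just the square of the adjacency matrix:

```python
import numpy as np

# Hypothetical small graph (our example): path 0-1-2-3 plus the edge {0, 2}.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    A[i, j] = A[j, i] = 1

# A2[i, j] counts length-2 walks from i to j; off the diagonal this is the
# number of common neighbours of i and j, and A2[i, i] is the degree of i.
A2 = A @ A
```

For instance, here A2[0, 3] = 1, because vertex 2 is the only common neighbour of 0 and 3.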
Similarly, define the "product" $w_{1} w_{2}$ of two graphons $w_{1}, w_{2}$ by
$$(w_{1} w_{2})(x,y) = \int_{0}^{1} w_{1}(x,z)\, w_{2}(z,y)\, dz,$$
and define $w^{(2)} = w\, w$.
For any graphon $w$ and $\alpha \in [0,1]$, define the thresholded graphon $w_{\alpha}$ by $w_{\alpha}(x,y) = \mathbb{1}[w(x,y) \geq \alpha]$. For a parameter $0 < \alpha < 1$ and a graph $G = (V,E)$, define the "threshold-square" graph $G^{(2)}_{\alpha}$ to be the graph on $V$ in which $i$ and $j$ are adjacent exactly when $A^{(2)}[i,j] \geq \alpha n$. Similarly, define the thresholded graphon $w^{(2)}_{\alpha} = (w^{(2)})_{\alpha}$.

Although our goal is to estimate a permutation $\sigma$, most of our calculations will be for pairwise comparison functions $F : V^{2} \to \{-1,0,1\}$. The idea is that a comparison $F$ corresponds closely to a permutation $\gamma$ if $F$ is close to the following function $F_{\gamma}$:
$$F_{\gamma}(i,j) = \mathrm{sign}(\gamma(j) - \gamma(i)).$$
To go from a comparison function to a permutation, define $\sigma_{F}$ to be the permutation obtained by sorting the scores $\gamma_{F}(i) = |\{j : F(j,i) = 1\}|$ in increasing order when the values of $\gamma_{F}$ are distinct; when there are ties, break them arbitrarily and use the above formula. We note that, for any permutation $\sigma$, we have $\sigma_{F_{\sigma}} = \sigma$. We can use these formulae to move between permutations and comparison functions, and we extend notation in the obvious way. To make one frequently-used example explicit: we say that a comparison function $F : V^{2} \to \{-1,0,1\}$ has error less than $D$ if there exists a correct permutation $\sigma_{true}$ so that $F(i,j) = F_{\sigma_{true}}(i,j)$ for all $i, j$ with $|\sigma_{true}(i) - \sigma_{true}(j)| \geq D$.

We will also often define permutations on some subset $S \subset V$. We say that a permutation $\sigma$ on $S$ agrees with a permutation $\eta$ on $V$ if $F_{\sigma}(i,j) = F_{\eta}(i,j)$ for all $i, j \in S$. Abusing notation slightly, we will also say that $\sigma$ agrees with $\eta$ on some $D \subset S^{2}$ if $F_{\sigma}(i,j) = F_{\eta}(i,j)$ for all $(i,j) \in D$ (typically, $D$ will be pairs that are sufficiently far apart in the order $\eta$).
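To make the passage between permutations and comparison functions concrete, here is a minimal sketch. The sign convention F_γ(i, j) = sign(γ(j) − γ(i)) and the tie-breaking rule are our assumptions (only the surrounding prose survives in this form); the round-trip identity σ_{F_σ} = σ is what the sketch checks.

```python
def F_of(gamma):
    """Comparison function of a ranking: F[i][j] = sign(gamma[j] - gamma[i]).
    (Sign convention is our assumption, chosen for illustration.)"""
    n = len(gamma)
    return [[(gamma[j] > gamma[i]) - (gamma[j] < gamma[i]) for j in range(n)]
            for i in range(n)]

def sigma_of(F):
    """Permutation induced by a comparison function: rank each index by how
    many indices are deemed smaller than it, breaking ties by index."""
    n = len(F)
    score = [sum(1 for j in range(n) if F[j][i] == 1) for i in range(n)]
    order = sorted(range(n), key=lambda i: (score[i], i))
    sigma = [0] * n
    for rank, i in enumerate(order):
        sigma[i] = rank
    return sigma
```

Under this convention, feeding a permutation through both maps returns it unchanged, e.g. `sigma_of(F_of([2, 0, 3, 1]))` recovers `[2, 0, 3, 1]`.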
Finally, for a vertex set $V \subset \mathbb{N}$, denote by $\tilde{V} = \{(i,j) : i, j \in V,\ i < j\}$ the list of ordered pairs of vertices. We note that a function $\tilde{F} : \tilde{V} \to \{-1,0,1\}$ can be extended to a unique antisymmetric function $F : V^{2} \to \{-1,0,1\}$; we use this extension without comment.

Algorithms for Theorem 1
Our main algorithm, Alg. 1, uses Alg. 2 to find an initial coarse comparison F based on G (2) α , and then "fills in" the remaining comparisons by taking larger subsamples and merging them by a voting procedure (Alg. 5). This will give a comparison F that agrees with the line-embedded permutation σ true or its reversal σ eurt for all pairs sufficiently far apart, which implies that σ F is close to either σ true or σ eurt .

Algorithm 1: Main Estimation Algorithm
parameters: Sample size m ∈ N, running time t, threshold ζ, and truncation level α.
Most of the work is done in the function SparseSketch (Alg. 2). The most important step of Algorithm 2 is the repeated call to OrderedSubsample (Alg. 3), which samples small subgraphs of G (2) α and then estimates orders on these small subgraphs.
The motivation for repeated subsampling is that, with high probability, the vast majority of these subgraphs have a special property that G^{(2)}_α does not have: they will be unit interval graphs (see Section 3.3.1). This special property allows us to use an existing efficient algorithm for finding the line-embedded permutation, referred to here as LexBFS, that only applies to unit interval graphs.
Having used this efficient algorithm for our subgraphs, it is then necessary to stitch together their estimated orderings. The function GlobalOrder (Alg. 4) is needed because we expect some of the small samples to be ordered according to the true line-embedded permutation, and others according to its total reversal; they must be aligned to either all agree with the true line-embedded permutation, or all agree with its total reversal.
The number of iterations is chosen such that each pair of vertices appears together in many of the samples. After aligning all the samples, we then simply count how often each pair is ordered 'up' or 'down', and decide the final ordering by majority vote.
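The vote-counting step can be sketched as follows; the function name and the input format (a list of already-aligned sample orderings) are ours:

```python
from collections import defaultdict

def majority_vote(samples):
    """Merge aligned per-sample orderings into one comparison function.
    `samples` is a list of vertex lists, each assumed already aligned with
    the same global direction; F(i, j) is decided by majority vote over all
    samples containing both i and j (0 on a tie or if they never co-occur)."""
    votes = defaultdict(int)
    for order in samples:
        pos = {v: r for r, v in enumerate(order)}
        for a in pos:
            for b in pos:
                if a < b:
                    votes[(a, b)] += 1 if pos[a] < pos[b] else -1

    def F(i, j):
        if i == j:
            return 0
        s = votes[(min(i, j), max(i, j))]
        sign = (s > 0) - (s < 0)
        return sign if i < j else -sign

    return F
```

For example, with samples `[[0, 1, 2], [1, 2, 3], [0, 2, 1]]`, the pair (1, 2) is ordered 'up' twice and 'down' once, so the merged comparison puts 1 before 2.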
OrderedSubsample (Alg. 3) uses a graph algorithm to order the vertices of a small random subgraph of G.
The LexBFS algorithm, which quickly recognizes and orders unit interval graphs, was first described in [13]. The version of the algorithm in that paper doesn't detect if a graph is disconnected, but (since the algorithm is based on BFS) it can be easily modified to do so. More information about unit interval graphs and this algorithm is given below in Section 3.3.1.
GlobalOrder (Alg. 4) is used to globally align the estimated orderings for individual subgraphs.
The sets L^{(j)} and R^{(j)} capture the vertices in sample j with lowest and highest U-values, respectively. The sample size is chosen so that there is sufficient overlap between these sets across samples. If L^{(i)} overlaps with L^{(j)}, then this is an indication that the samples i and j are aligned, and H(i, j) is set to 1, whereas if L^{(i)} overlaps with R^{(j)}, then this indicates that the samples have opposite alignment, and H(i, j) is set to −1.
When it exists, the function a can be found very quickly by a greedy search (step 9). In practical implementations, it may be better to choose an algorithm that is more robust to small errors (e.g. solve the usual convex relaxation of the problem, then round the solution to the closest value in {−1, 1}).
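A minimal sketch of such a greedy search, assuming (as the algorithm does) that a consistent assignment exists and that the non-zero pattern of H is connected; the names and the BFS-style propagation are our choices, not the paper's implementation:

```python
from collections import deque

def align(H, t):
    """Greedy search for signs a[j] in {-1, +1} with a[j] * a[k] == H[j][k]
    whenever H[j][k] != 0. Assumes a consistent assignment exists and that
    the non-zero pattern of H is connected; a sketch of step 9 of
    GlobalOrder, not the paper's code."""
    a = [0] * t
    a[0] = 1
    queue = deque([0])
    while queue:
        j = queue.popleft()
        for k in range(t):
            if H[j][k] != 0 and a[k] == 0:
                a[k] = a[j] * H[j][k]   # propagate the sign constraint
                queue.append(k)
    return a
```

A solver of this kind touches each constraint once, so its cost is dominated by reading the t-by-t matrix H, matching Remark 2.2.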

Remark 2.2. The cost of Algorithm 4 is dominated by the cost t² of constructing the t by t matrix H. We note that the choice of the number of samples t and their sizes m is driven by two concerns: the total number of samples mt must be at least n² polylog(n) to ensure that each pair of vertices appears together in many samples, and the size m of each sample must be small enough that the subsampled graphs have certain good properties with high probability (primarily, they must be Robinsonian).

Algorithm 4: GlobalOrder
{We will show that w.e.p. such a function always exists.} 10: Return a.
We expect that, in fact, subgraphs up to size m = √n polylog(n) would have these good properties. If true, this would allow us to take t = O(n^{1.5} polylog(n)), reducing the computational cost of the algorithm by a factor of n.
Finally, we describe LocalRefinement (Alg. 5), which refines the initial sketch provided by Algorithm 2. Algorithm 5 is based on the following observations. Recall that Algorithm 5 takes as input a "rough" ordering from some comparison F. Now consider i, j ∈ V so that U_i < U_j. Then any vertex k with U_k > U_j is closer to U_j than to U_i, and so is more likely to be a neighbour of j than of i. Thus, we expect that j will have more neighbours than i that are higher up in the ordering than j. Similarly, we expect that j will have fewer neighbours than i that are lower down in the ordering than i. Therefore, we expect to be able to order i, j by counting the number of elements of the neighbourhoods N(i) \ N(j) and N(j) \ N(i) that are higher than i, j according to the rough ordering F.
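The counting rule just described can be sketched as follows (the function name, inputs, and tie handling are our choices):

```python
def comes_before(i, j, N, higher):
    """Infer whether i precedes j in the latent order. N maps each vertex to
    its neighbour set; `higher` is the set of vertices that the rough
    comparison F places above both i and j. If j has strictly more exclusive
    neighbours than i among those high vertices, j should sit closer to
    them, i.e. i comes first."""
    only_j = len((N[j] - N[i]) & higher)   # neighbours of j but not i, up high
    only_i = len((N[i] - N[j]) & higher)   # neighbours of i but not j, up high
    return only_j > only_i
```

On a toy line model where vertices 0..5 are adjacent when within distance 2, the rule correctly places 1 before 2 using only the vertices above both of them.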

Algorithms for Theorem 2
Algorithm 1 described in the previous subsection gives an ordering with error approximately √ n. Here we describe the iterative refinement algorithm, which sequentially reduces this error.
Algorithm 6 creates a coupled sequence of random graphs G_1 ⊂ G_2 ⊂ . . . ⊂ G_k = G. Algorithm 1 is used with input G_1 to obtain an initial ordering of the vertices in G_1. Then, in each successive step, Algorithm 7 is called to replace the order F_i on G_i by the more-accurate order F_{i+1} on the larger graph G_{i+1}. The iterative step is based on G itself, not G^{(2)}_α. All of the work in Algorithm 6 is done in Algorithm 7, the refinement algorithm that is called in the loop. It makes use of the following consequence of Assumption 1.8: if U_i < U_j, then there is a region R ⊂ [0, 1] so that any vertex with U-value in R can only be a neighbour of one of i, j. If U_i and U_j are sufficiently far apart, then R is large and will contain many vertices; these will provide the signal that indicates the true ordering of i and j. Of course, we don't know which vertices have U-values in R. But we know that the vertices that provide the signal must be at the extremes of the ordering in the neighbourhood of i or j, so we can use the ordering from the previous iteration to find these vertices. The key idea is that, in each loop, we use our current estimate of σ to improve our estimate of R, then use our improved estimate of R to further improve our estimate of σ.

Algorithm 6: Iterative-Improvement Estimate
To make this precise, we introduce some definitions. Given an ordering σ of a set V, a set S ⊂ V, and a parameter c < |S|, we define the sets of the c elements of S with the highest and lowest rank according to σ. Using these sets and an estimated ordering to approximate the sets R, L, we can then estimate a new ordering of i, j by counting the neighbours N(i), N(j) that fall in these approximations and comparing the results. We note that this comparison, occurring in steps 4-25 of Algorithm 7, is more complicated than one might expect from this description. The additional steps allow us to "prune out" and then ignore comparisons that we are not sufficiently sure of, preventing this uncertainty from "infecting" other calculations.
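These "highest and lowest rank" sets can be sketched directly (names and signatures are ours):

```python
def top_ranked(sigma, S, c):
    """The c elements of S with the highest rank under sigma (vertex -> rank)."""
    return set(sorted(S, key=lambda v: sigma[v])[-c:])

def bottom_ranked(sigma, S, c):
    """The c elements of S with the lowest rank under sigma."""
    return set(sorted(S, key=lambda v: sigma[v])[:c])
```

With an estimated ordering in hand, these sets applied to the neighbourhoods of i and j serve as the approximations of the extreme regions discussed above.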

Proof of strengthened version of Theorem 1
In this section, we will prove a slightly stronger version of Theorem 1. Since this section is fairly long and the proof does not follow the algorithm line-by-line, we give a quick guide to its subsections, in order. Sections 3.3 and 3.4 are particularly important, as they give the "big-picture" explanation of why the algorithm works and what needs to be checked:

3.1 We state a version of Theorem 1 that holds under weaker conditions, and state those (highly technical) conditions.
3.2 We describe the probability space that will be used for the remainder of the proof.
3.3 We give a detailed description of several key properties of "typical" realizations of the random graph G, as well as key properties of several derived objects that appear in Algorithm 1, and explain why those properties ensure that Algorithm 1 works. This subsection also includes technical lemmas showing that these key properties occur with high probability.
3.4 We give a sketch of the most important parts of our proof, and how it depends on the typical properties described in the previous subsection.
3.5 With the preliminaries out of the way, we analyze the output of Algorithm 3, the sub-algorithm that orders small subgraphs of G.
3.6 We analyze the output of Algorithm 4, the sub-algorithm that aligns the orderings of the small subgraphs returned by Algorithm 3 (this alignment is necessary because even "good" outputs of Algorithm 3 will be randomly aligned with either σ_true or σ_eurt).
3.7 We analyze the output of Algorithm 2, the sub-algorithm that calls Algorithms 3 and 4 to obtain an estimated order of the full graph G, and give bounds on the error that hold whp (though these bounds may still be quite large).
3.8 We complete the analysis of the remaining sub-algorithm of Algorithm 1, which refines the estimate obtained in Algorithm 2. This analysis shows that whp the refined ordering has error bounded as stated in Theorem 3, completing the proof of Theorem 3.

A stronger result
Rather than proving Theorem 1 directly, we prove a related result with weaker (but more complicated) assumptions. This theorem applies to graphons in general, not just graphons with a uniform embedding. We verify in Appendix A that the stronger assumptions of Theorem 1 imply the more general assumptions presented in this section. While weakening the assumptions in this way provides a stronger final result, our main motivation was to replace the assumptions of Theorem 1 (which are easy to read but hard to work with directly) with the weaker assumptions of Theorem 3 (which are harder to read but easier to work with in our proof). Before we discuss the weaker assumptions, we give an alternative characterization of {0, 1}-valued graphons. It is not hard to verify (see also the discussion following Definition 2.1 of [12]) that a {0, 1}-valued diagonally increasing graphon w is determined almost everywhere by boundary functions ℓ(x) ≤ x ≤ r(x), with w(x, y) = 1 when ℓ(x) < y < r(x) and w(x, y) = 0 when y < ℓ(x) or y > r(x). Thus, the boundaries define w up to the values exactly on points of the form (ℓ(x), x) and (x, r(x)); of course this collection of points has measure 0, and thus does not influence the distribution of samples from w.
In this section, which leads up to the proof of Theorem 3, we will apply this characterization in particular to w^{(2)}_α, which is almost everywhere defined by boundaries r_α and ℓ_α defined analogously. We now state the definitions required for our weaker assumptions. For a set S ⊂ ℝ^d, denote by Vol(S) the d-dimensional volume of S. In other words: if w is uniformly (A, δ)-good at α, then the values of w(x, ·) do not concentrate around α. We will not apply this assumption to w directly; we apply it to w^{(2)}. Next, we use a simple notion of "connectedness" for graphons: for a graphon w, to be (ε, α)-connected means that the region where w achieves values greater than α contains a strip of width ε around the diagonal. We will see later that this implies that, asymptotically almost surely, large graphs sampled from the α-thresholded version of such a graphon are connected.
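For intuition, the square w^{(2)} and its threshold can be approximated numerically. The tent-shaped graphon below is our own illustrative choice, not one from the paper:

```python
import numpy as np

def w(x, y):
    """Illustrative diagonally increasing graphon (our choice)."""
    return np.maximum(0.0, 1.0 - 2.0 * np.abs(x - y))

# Approximate w2(x, y) = \int_0^1 w(x, z) w(z, y) dz on a grid.
grid = np.linspace(0.0, 1.0, 201)
dz = grid[1] - grid[0]
W = w(grid[:, None], grid[None, :])
W2 = (W @ W) * dz                     # Riemann-sum approximation of the square

alpha = 0.1
W2_alpha = (W2 >= alpha).astype(int)  # thresholded square graphon on the grid
```

For this w, the region where W2 exceeds α = 0.1 contains the whole diagonal, illustrating the (ε, α)-connectedness property discussed above.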
The following "separation" property implies that the neighbourhoods of far-apart vertices are easy to distinguish. Diagonally increasing graphs may have many indistinguishable rows; as an extreme example, the complete graph is Robinsonian! The following assumption rules out this sort of near-complete indistinguishability by requiring very different points to have disjoint neighbourhoods:

Definition 3.4 (Splitting graphons). Fix 0 < ε < 1. We say that a {0, 1}-valued graphon w has an ε-split if, for all x, y with |x − y| ≥ ε, the supports of w(x, ·) and w(y, ·) are disjoint. In graphs sampled from a graphon with an ε-split, vertices with U_i-values that are far apart cannot have any common neighbours. This guarantees that a certain "line-embeddedness" is preserved in the graph and its samples, and a correct permutation can be extracted from the samples.
Note that most of these assumptions are properties of w^{(2)}_α, not w^{(2)}. In particular, we do not need that w^{(2)} is diagonally increasing, only that the thresholded {0, 1}-valued graphon w^{(2)}_α is. This is an important distinction: it is not true, in general, that the square of a diagonally increasing graphon is itself diagonally increasing (see Appendix A, Remark A.8 for a simple example). Even if w^{(2)} is not diagonally increasing, there may still be a value of α for which w^{(2)}_α is diagonally increasing. Finally, we define the set of parameters for our main algorithm that gives the desired result for Theorem 1.
We are now ready to state the stronger theorem.
Theorem 3 (Ordering on Large Sample). Fix a graphon w and a constant α that satisfy Assumption 3.5. Let G_n ∼ w be a graph of size n ∈ ℕ. Then, when Algorithm 1 is executed with parameters as in (3.2) on input graph G_n and value α, the output is an ordering σ̂ on {1, 2, . . . , n} whose error w.e.p. satisfies:

Construction of probability spaces
We set some notation that will be used throughout the remainder of Section 3. Fix notation as in the statement of Theorem 3.
We will use these two sequences to couple all of the graphs described in this section as follows: for any graph of size n ∈ ℕ sampled from a graphon in this section, we will always assume that the graph is sampled by using the elements of this fixed pair of sequences with indices 1 ≤ i, j ≤ n, in the representation (1.2). In particular, if G, G′ ∼ w are graphs of sizes n < n′, then this coupling gives a natural embedding G ⊂ G′ of the graphs. For fixed n, we denote by σ_true the line-embedded permutation associated with the sequence {U_i}_{i=1}^{n}, and denote by F_true the associated comparison function. The permutation σ_eurt is the total reversal of σ_true.
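The coupling can be made concrete as follows. The tent-shaped graphon w is our own illustrative choice; the point is that the latent positions and edge coins are drawn once and shared by every graph size, so that the embedding G ⊂ G′ holds automatically:

```python
import numpy as np

rng = np.random.default_rng(0)
N_MAX = 200

# Draw the two sequences once; every graph size reuses them.
U = rng.uniform(size=N_MAX)              # latent positions U_i
C = rng.uniform(size=(N_MAX, N_MAX))     # edge coins U_{i,j}
C = np.triu(C, 1) + np.triu(C, 1).T      # symmetric coins, zero diagonal

def w(x, y):
    """Illustrative diagonally increasing graphon (our choice)."""
    return np.maximum(0.0, 1.0 - 3.0 * np.abs(x - y))

def sample(n):
    """Adjacency matrix of G_n under the shared coupling."""
    P = w(U[:n, None], U[None, :n])      # edge probabilities
    A = (C[:n, :n] < P).astype(int)      # insert edge when coin < probability
    np.fill_diagonal(A, 0)               # no self-loops
    return A

A50, A100 = sample(50), sample(100)
# A50 is exactly the top-left corner of A100: the coupling nests the graphs.
```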
Denote by H_α ∼ w^{(2)}_α a "correct" thresholded graph, drawn from w^{(2)}_α (the thresholded square graphon as in Definition 2.1) using the same random variables. Recall also the definition of the threshold-square graph G^{(2)}_α from Definition 2.1. Finally, we set F to be the σ-algebra generated by all of these random variables.

Proof sketch and main estimates for bad sets
In this section, we will show that the graphs G, G (2) α resulting from the random draw {U i }, {U i,j } will have certain good properties with extreme probability, and sketch out how these properties are used to guarantee that the algorithm succeeds.

Unit interval graphs and the LexBFS Algorithm
The heart of Algorithm 3 is the 3-sweep LexBFS unit interval graph recognition algorithm of [13]. We recall the basic properties of unit interval graphs and of this algorithm that will be used. A graph G is a unit interval graph if and only if there exists a total order of the vertices so that the adjacency matrix A of G with respect to that order is diagonally increasing (where we set the diagonal elements A_{i,i} = 1). We will refer to such orders as interval orders. If the graph is connected, then the interval order is unique, up to its total reversal and to permutation of any duplicated neighbourhoods. More precisely, for a total order of the vertices represented by a permutation σ, let A_σ be the matrix defined by A_σ[i, j] = A[σ(i), σ(j)] for i ≠ j, with A_σ[i, i] = 1. Then σ is an interval order if and only if A_σ is diagonally increasing. If G is connected, then A is irreducible. In that case, if σ and τ are both interval orders, then either A_τ = A_σ (which allows for the possibility that some rows are identical), or A_τ = A_{σ_trev}, where σ_trev is the total reversal of σ.
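The characterization of interval orders can be checked mechanically. The sketch below tests the diagonally increasing property of a symmetric 0/1 adjacency matrix by checking that each row of A_σ is 1 on a contiguous block containing the diagonal (an equivalent condition for symmetric 0/1 matrices); it is a definition check, not the LexBFS algorithm:

```python
def is_interval_order(A, sigma):
    """Check whether the vertex order sigma (sigma[r] = vertex at rank r)
    makes the symmetric 0/1 adjacency matrix A diagonally increasing, using
    the convention that diagonal entries are 1: every row of the reordered
    matrix must be 1 exactly on a contiguous block (which then necessarily
    contains the diagonal, since B[p][p] = 1)."""
    n = len(A)
    B = [[1 if p == q else A[sigma[p]][sigma[q]] for q in range(n)]
         for p in range(n)]
    for p in range(n):
        ones = [q for q in range(n) if B[p][q] == 1]
        if len(ones) != max(ones) - min(ones) + 1:   # row not contiguous
            return False
    return True
```

On the path graph 0-1-2-3, the identity order and its total reversal both pass, while the order swapping 1 and 2 fails, matching the uniqueness statement above.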

Properties of latent variables
Algorithm 1 is based on the assumption that the square thresholded graph G^{(2)}_α has similar properties to H_α, and that the vertices in V are fairly uniformly spread over the interval [0, 1]. This, together with our assumptions on w^{(2)} and w^{(2)}_α, will then guarantee that small samples from G^{(2)}_α have similar good properties. This will imply that, with high probability, the "sketch" returned by Algorithm 2 has no incorrect comparisons. In particular, we want to show that the following properties hold w.e.p.:

1. G^{(2)}_α agrees with H_α except possibly on pairs from a small "bad set." That is, on a coarse scale, G^{(2)}_α looks like it was sampled from w^{(2)}_α.
2. Any random uniform subsample S ⊂ V that is independent of F and has |S| ≥ log(n)^5 will induce a connected graph G^{(2)}_α(S).
3. For any pair of vertices i, j ∈ V(G_n) whose latent values are sufficiently far apart, there are many vertices that are connected to i but not j in G^{(2)}_α.
4. The latent values {U_i} are spread fairly evenly over [0, 1].

Note that these are all purely properties of {U_i}, {U_{i,j}} (even property (2), which is about typical draws of S after observing the σ-algebra F). When these conditions hold, we know that when i, j appear in the same subsample S of size |S| = log(n)^5, the following will all occur with probability quite a bit larger than 1/2:

(i) The neighbourhoods of i, j in G^{(2)}_α(S) will agree with their "correct" neighbourhoods in H_α(S).
(ii) The subgraph G^{(2)}_α(S) will be connected.
(iii) If U_i and U_j are significantly far apart, the neighbourhoods of i, j can be distinguished in G^{(2)}_α(S), and so in particular it is possible to compare them locally.
When these three events all occur for a given i, j ∈ S, we will have that G^{(2)}_α(S) is a connected unit interval graph whose interval orderings agree with one of the correct line-embedded permutations of w^{(2)}_α(S) for all vertices that are sufficiently far apart. The interval ordering can be retrieved with a standard, efficient graph-theoretic algorithm as discussed in Section 3.3.1; moreover, the algorithm will detect when G^{(2)}_α(S) is not a connected unit interval graph. Thus, when (i)-(iii) occur, we will be able to reconstruct the correct ordering of sufficiently distant pairs i, j ∈ S.
A precise statement of Conditions (1)-(4) above will be given in Definition 3.6, and Conditions (i)-(iii) will be given in Definition 3.8 in the following subsection. First we introduce the necessary notation. For fixed n and α, define the "bad set" B(n, α), the subset of [0, 1]² where w^{(2)} takes values very close to α.
We define the related "bad witnesses" B(n, i, α) of a vertex 1 ≤ i ≤ n to be the vertices j for which (U_i, U_j) ∈ B(n, α); we will see that w.e.p. every vertex has few bad witnesses. We also define the collection of "good witnesses" for a subset S ⊂ {1, 2, . . . , n} and pair of vertices i, j ∈ S; when S = {1, 2, . . . , n}, we will drop the subscript. These are the vertices that will allow us to distinguish i, j.
Finally, we define a cover of [0, 1] which will be useful in showing that subgraphs of G^{(2)}_α are connected: the intervals {I_k}. Note that these intervals form a cover of [0, 1]. Moreover, if U_i ∈ I_k and U_j ∈ I_k ∪ I_{k+1}, then H_α(i, j) = 1.

We now give formal definitions of the collection of bad events.

Definition 3.6 (Bad Events). Let graphon w, value α, random variables {U_i}, {U_{i,j}}, and graphs G^{(2)}_α and H_α be as defined above. Moreover, let R, {I_k}_{k=0}^{4R−3} be as in (3.5).

The event A^c_1 ∩ A^c_2 implies property (1) above, while A^c_3, A^c_4 and A^c_5 imply properties (2), (3) and (4), respectively. The bad events mostly involve random variables that are sums of the independent variables {U_i}, {U_{i,j}}, and by a straightforward application of standard concentration inequalities we can show that w.e.p. none of these bad events occur. The lemmas and proofs establishing this fact can be found in the Appendix. Here, we only state their corollary.

Corollary 3.7. W.e.p., the event A^c = A^c_1 ∩ · · · ∩ A^c_5 holds.

Proof. This follows from Lemmas B.3, B.4, B.5, B.6, and B.7.
We can therefore assume that our graph G and the derived graph G^{(2)}_α have the properties we need for the algorithm to succeed.

Properties of subsampled graphs
In Algorithm 2, small subgraphs are sampled repeatedly from the square thresholded graph G^{(2)}_α. In this section, we show that, if A^c holds and thus G^{(2)}_α has the expected properties, then with high probability these samples will have similar good properties. To this end, we define bad events for the sample, and will then proceed to bound the probability that they occur.
Let S be a uniformly chosen subset of V, of size m = log(n)^5. We denote by S′ ⊂ S the collection of elements with no "bad" comparisons. Choosing indices to roughly match the "global" bad events A_1, . . . , A_5, we define:

Definition 3.8 (Bad Events for Samples). Let graphon w, value α, random variables {U_i}, {U_{i,j}}, and graphs G^{(2)}_α and H_α be as defined above. Moreover, let R, {I_k}_{k=0}^{4R−3} be as in (3.5). Finally, we let S ⊂ V and m = |S|.
We think of these events as functions of S, viewing the variables {U_i} and {U_{i,j}} that determine the graph G as "fixed." We have seen in Corollary 3.7 that A^c holds w.e.p. We will see here that the events A_i are also all unlikely. The proofs are again simple, and are deferred to Appendix B.2. We need slightly more detailed results (e.g. that these events remain rare even conditional on the occurrence of certain vertices), so we keep the lemma statements here.

Lemma 3.9. Let S be a uniformly chosen subset of V, of size m = log(n)^5. Then, on the F-measurable event A^c, the events above are rare; furthermore, for any fixed p, q ∈ V(G) satisfying (U_p, U_q) ∈ B(n, α), they remain rare conditionally, where again both probabilities are on A^c.
and also |S \ S′| ≤ log(n)^3 on A^c. Moreover, for any fixed p, q ∈ V(G), where again both probabilities are on the event A^c.
We see that a number of other bad events are "almost" contained in A_3.
The next lemma shows that w.e.p., for all pairs of vertices i, j that are sufficiently far apart, we can distinguish i and j, since they have distinct neighbourhoods in G^{(2)}_α.

Lemma 3.12. Let S be a uniformly chosen subset of V, of size m = log(n)^5.

Similarly, it follows directly from A^c_5 that the number of vertices between i, j in a uniformly chosen subset S is roughly proportional to |U_i − U_j|.

Proof sketch
In the next sections, we will prove the correctness of all the algorithms that are used to prove Theorem 3. The proofs involve four major ideas, which we outline here. We will assume that A^c holds, so that G and G^{(2)}_α have good properties.
• Most of the small samples can be correctly ordered. With probability 1 − o(1), the random subgraphs of size m = log(n)^5 sampled in Algorithm 2 have certain good properties (e.g. they are connected, Robinsonian, and have many vertices with distinct neighbourhoods); when this occurs, Algorithm 3 will return an ordering that agrees with σ_true or σ_eurt for all vertex pairs sufficiently far apart.

• The ordered small samples can be aligned. At this point, we have many subsamples that each individually agree with either σ_true or σ_eurt. The next step is to "align" these subsamples by reversing some of them, so that either the entire collection agrees with σ_true, or the entire collection agrees with σ_eurt. The main observation is that it is easy to align a particular pair of samples if they share vertices with latent position close to either 0 or 1; it turns out to be possible to align all samples by greedily aligning pairs in this way.

• Aligned sketches give a coarse ordering. Once the small samples are aligned, say with σ_true, a majority vote is used in Algorithm 2 to determine a sketch, or coarse ordering, of all the vertices, represented by a comparison F. The number of small samples taken is such that w.e.p. each pair of vertices is sampled many times, and in the majority of the samples containing this pair, the order agrees with σ_true. Thus, pairs of vertices that are sufficiently far apart will be ordered according to σ_true in the vast majority of the samples in which they occur.

• The coarse ordering is refined using good witnesses. Finally, in Algorithm 5, the comparison F returned by Alg. 2 is refined. This refinement is based on the good witnesses of the vertices. Since w^{(2)}_α is diagonally increasing, and G^{(2)}_α is very similar to a sample from w^{(2)}_α, we can use these good witnesses to infer the ordering of two vertices s, t. Namely, if σ_true(s) < σ_true(t), i.e. if s comes before t in the true ordering, then t will have more neighbours than s higher up in the ordering, while s will have more neighbours lower in the true ordering. Thus, by considering the vertices that are neighbours of s but not of t, or vice versa, and using the coarse ordering as a proxy for the true ordering, we can infer the correct ordering of s and t.
We note that the organization of this description does not line up exactly with the organization of Algorithm 1. This is somewhat unavoidable: one step of the proof may apply to many different parts of the algorithm, and the length of a part of the algorithm does not correspond closely to its importance.

Analysis of Algorithm 3
In Algorithm 3, a relatively small subgraph S of size m = log(n)^5 is sampled from G^{(2)}_α. Algorithm 3 applies a graph algorithm to determine whether the sample is a unit interval graph. If the algorithm succeeds, it returns an interval ordering. We prove in this section that, if no bad events occur, then with high probability the algorithm will succeed and give an ordering which agrees with one of the two correct line-embedded permutations of the vertices.
Throughout this section, we will assume that A^c holds, and study the properties of the subset S on this event. We will not condition on (A_3 ∪ A_4 ∪ A_5)^c, but we recall here that, by the lemmas in Section 3.3.3, this event holds w.e.p. Since this holds w.e.p., we will be able to use the following properties of S when required in our proofs, incurring only a negligible penalty in our probability estimates each time. Note that the second and third properties above follow from the event (A_3)^c, together with Lemma 3.11. The fourth property follows from our assumption that we are on A^c: if A^c_1 holds, then discrepancies between G^{(2)}_α(S) and H_α(S) can only occur between vertices i, j with i ∈ B(n, j, α), and, by definition, S′ does not include any such pairs.

We do not assume that we are on (A_1)^c, and do not use this event freely in the following calculations. We do this because Lemma 3.9 only implies that (A_1)^c occurs with probability going to 1, but does not show that it holds w.e.p. Algorithm 3 is called many times in Algorithm 2, and it may indeed happen that A_1 occurs for a small number of sketches.
We will show that the ordering returned by Algorithm 3 agrees with σ_true or σ_eurt for all pairs of vertices that are sufficiently far apart, and hence we define:

Definition 3.14. Given S ⊂ V(G), a permutation σ of S, and a permutation τ of V, we say that σ agrees with τ to precision level d if F_σ(i, j) = F_τ(i, j) for all i, j ∈ S with |U_i − U_j| ≥ d.

Lemma 3.15. W.e.p., Algorithm 3 run on G^{(2)}_α with parameter m succeeds, and furthermore returns a total order σ of the sampled set S which agrees at precision level log(n)^{−1} with σ_true or with σ_eurt.

Proof. Algorithm 3 starts by sampling S ∼ Unif({T ⊂ V : |T| = m}). By (3.8), we have that G^{(2)}_α(S) = H_α(S) is connected w.e.p., and thus G^{(2)}_α(S) is a connected unit interval graph. On this event, Algorithm 3 does not fail, and returns an interval ordering σ of G^{(2)}_α(S) = H_α(S). Since H_α(S) is a subgraph of the interval graph H_α, the interval ordering of H_α(S) agrees with σ_true or σ_eurt; assume wlog that it agrees with σ_true. This means that σ agrees with σ_true, except possibly for pairs of vertices with identical neighbourhoods in G^{(2)}_α(S). Consider a pair i, j ∈ S with |U_i − U_j| ≥ log(n)^{−1}. By Lemma 3.12, i and j have distinct neighbourhoods in G^{(2)}_α(S) and thus also in G^{(2)}_α (see the discussion in Section 3.3.1). Consequently, such pairs are ordered according to σ_true. In other words, σ agrees with σ_true at precision level log(n)^{−1}, completing the proof.
Algorithm 3 is called many times in Algorithm 2, to produce a collection of sets and orderings (σ^{(j)}, S^{(j)})_{j=1}^{t}. Since B_1 may not occur w.e.p., some of the set-ordering pairs in this collection may not agree with σ_true or σ_eurt at the required level of precision. We will need a minimal amount of agreement to prove that Algorithm 4, the alignment algorithm, works correctly based on all samples. Therefore, we now define a weaker version of agreement between orderings, and show that this weaker property holds w.e.p.

Definition 3.16. Define L(S, σ) = {s ∈ S : σ(s) ≤ ζ} and R(S, σ) = {s ∈ S : σ(s) > |S| − ζ}, as well as L = {i : U_i ≤ r_α(ε/2)} and R = {i : U_i ≥ ℓ_α(1 − ε/2)}. We say that σ roughly agrees with σ_true if L(S, σ) ⊂ L and R(S, σ) ⊂ R, and roughly agrees with σ_eurt if L(S, σ) ⊂ R and R(S, σ) ⊂ L.

Note that, by part (4) of Assumption 3.5, we have that r_α(ε/2) < ℓ_α(1 − ε/2), and thus L ∩ R = ∅. This implies that any permutation σ can roughly agree with at most one correct permutation.

Proof (of Lemma 3.17). Consider a run of Algorithm 3 that succeeds; let S be the returned subset, and let σ be the returned permutation (note that σ represents an interval order of G^{(2)}_α(S)). Let j ∈ S be so that σ(j) = 1. By (3.8), w.e.p. the neighbour set of j has size |{q ∈ S : q ∼ j}| ≥ ζ.
Since σ is an interval ordering of G^{(2)}_α(S), all neighbours of j must form an interval I in σ, and L(S, σ) ⊂ I. Since σ is an interval ordering, this also implies that all vertices in I (and thus in L(S, σ)) are mutually adjacent in G^{(2)}_α. We have shown that w.e.p. L(S, σ) is a clique, but have not yet shown that even a single element s ∈ L(S, σ) has a small associated latent variable U_s. Before starting the argument that this is true, we give some basic facts about S′. By (3.8), w.e.p. we have both that G^{(2)}_α(S′) = H_α(S′), and that H_α(S′) is connected. Since H_α is a unit interval graph, so is H_α(S′). Since σ is an interval order of G^{(2)}_α(S), σ|_{S′} is an interval order of G^{(2)}_α(S′) = H_α(S′). Therefore, σ|_{S′} must agree with σ_true or σ_eurt on S′. We assume wlog that it agrees with σ_true. The next step is to check that S′ has some intersection with the collection

L̃ = {s ∈ S : U_s ≤ 1/log(n)}     (3.9)

of vertices with "small" latent variables. From (3.8), we have w.e.p. the required inequalities, so |L̃ ∩ S′| ≥ |L̃| − |S \ S′| > 0 for large enough n.
Having shown that $\tilde{L} \cap S' \neq \emptyset$, we consider $s \in \tilde{L} \cap S'$ and show that it lies in $L(S, \sigma)$. For any $k \in S$ with $\sigma(k) < \sigma(s)$, one of the following must hold:

1. $\sigma_{\text{true}}(k) < \sigma_{\text{true}}(s)$. In this case $U_k < U_s$, so $k \in \tilde{L}$. From (3.10) above, we have that $|\tilde{L}| \le \log(n)^4 + 2\log(n)^{3.5}$.

2. $\sigma_{\text{true}}(k) > \sigma_{\text{true}}(s)$. Since $\sigma|_{S'}$ agrees with $\sigma_{\text{true}}$, we must be in one of the following subcases: (a) $k \in S \setminus S'$. As argued above, $|S \setminus S'| \le \log(n)^3$.

Combining these three cases, we see that there are at most $(\log(n)^4 + 2\log(n)^{3.5}) + \log(n)^3 + 2\log(n)^4$ vertices $k$ with $\sigma(k) < \sigma(s)$. For large $n$, this quantity is smaller than $4\log(n)^4 = \zeta$, and so $\sigma(s) \le \zeta$. This implies $s \in L(S, \sigma)$, and so we have shown that $(\tilde{L} \cap S') \subseteq L(S, \sigma)$. We are now ready to show that $L(\sigma, S) \subseteq L$. For $n$ large enough, $\log(n)^{-1} < r_\alpha(\epsilon/2)$, and thus, by definition, $\tilde{L} \subset L$. To complete the argument, let $s \in (\tilde{L} \cap S') \subset L(\sigma, S)$, and consider any other vertex $t \in L(\sigma, S)$. As argued above, $L(\sigma, S)$ is a clique in $G^{(2)}_\alpha$, so $s$ and $t$ are connected in that graph. Since $s \in S'$ and $t \in S$, we have that $t \in B(n, s, \alpha)$. Therefore, on $A^c$ we obtain a bound on $U_t$ in which the last inequality holds for all $n$ sufficiently large; hence $t \in L$. Collecting our assumptions throughout the proof, we have shown that w.e.p. $L(\sigma, S) \subset L$ on $A^c$. An analogous argument shows that, under the same assumptions, $R(\sigma, S) \subset R$. Thus, the result follows from Corollary 3.7.
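The interval-ordering property used repeatedly in this proof (every vertex's closed neighbourhood occupies a contiguous block of positions) can be checked mechanically. A minimal sketch with hypothetical names and a plain adjacency-set representation, not the paper's actual data structures:

```python
def is_interval_order(adj, sigma):
    """Check that, under the ordering sigma (a list of vertices), the
    closed neighbourhood of every vertex occupies a contiguous interval
    of positions -- the property asserted for interval orders above."""
    pos = {v: i for i, v in enumerate(sigma)}
    for v in sigma:
        idxs = sorted(pos[u] for u in adj[v] | {v})
        # A contiguous block of k positions spans exactly k slots.
        if idxs[-1] - idxs[0] + 1 != len(idxs):
            return False
    return True
```

On a path graph, for instance, the natural order passes this check while any transposed order fails it.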

Analysis of Algorithm 4
In the previous section we showed that w.e.p. the output $(\sigma, S)$ of Algorithm 3 roughly agrees with either $\sigma_{\text{true}}$ or its total reversal, $\sigma_{\text{eurt}}$. In this section we show that the function $a$ output by Algorithm 4 correctly identifies whether the alignment is with $\sigma_{\text{true}}$ or with $\sigma_{\text{eurt}}$.

Lemma 3.18. Suppose Algorithm 4 is called with input $(\sigma^{(j)}, S^{(j)})_{j=1}^{t}$ and parameter $\zeta = 4\log(n)^4$. Then on the $\mathcal{F}$-measurable event $A^c$, (3.13) holds.

Proof. Recall that, for $1 \le j \le t$, the sets $L^{(j)}$ and $R^{(j)}$ computed in step 4 of Algorithm 4 correspond to the sets $L(S^{(j)}, \sigma^{(j)})$ and $R(S^{(j)}, \sigma^{(j)})$ of Definition 3.16. By Lemma 3.17, the condition
$$\bigcap_{j=1}^{t} \{\sigma^{(j)} \text{ roughly agrees with } \sigma_{\text{true}} \text{ or } \sigma_{\text{eurt}}\} \qquad (3.14)$$
holds w.e.p. on $A^c$. By Definition 3.16, if $\sigma^{(j)}$ roughly agrees with $\sigma_{\text{true}}$, then $L^{(j)} \subset L$ and $R^{(j)} \subset R$; similarly, if $\sigma^{(j)}$ roughly agrees with $\sigma_{\text{eurt}}$, then $L^{(j)} \subset R$ and $R^{(j)} \subset L$ (as argued, by Assumption 3.5, $\sigma^{(j)}$ can roughly agree with at most one of $\sigma_{\text{true}}$, $\sigma_{\text{eurt}}$). Consequently, if $L^{(j)} \cap L^{(k)} \neq \emptyset$ or $R^{(j)} \cap R^{(k)} \neq \emptyset$, so that $H(j,k)$ is set equal to $1$ in step 8 of Algorithm 4, then $\sigma^{(j)}$ and $\sigma^{(k)}$ agree with the same correct permutation ($\sigma_{\text{true}}$ or $\sigma_{\text{eurt}}$), and thus $a^*(j)a^*(k) = 1$. Similarly, if $L^{(j)} \cap R^{(k)} \neq \emptyset$ or $R^{(j)} \cap L^{(k)} \neq \emptyset$, then $a^*(j)a^*(k) = -1$. Thus, condition (3.14) implies that (3.15) holds w.e.p. for the function $H$ computed in Algorithm 4: for each pair of samples $j, k$ for which $H(j,k)$ is non-zero, $H$ correctly identifies whether $j$ and $k$ are aligned with the same true ordering or with opposite orderings. Now consider the graph $G_H$ with vertex set $V_H = \{1, 2, \ldots, t\}$ and adjacency $G_H(j,k) = |H(j,k)|$; that is, $j$ and $k$ are adjacent in $G_H$ if sketches $j$ and $k$ could be aligned by the algorithm. In light of (3.15), to prove the result it is enough to prove that $G_H$ is connected w.e.p.
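Once $H$ is computed and $G_H$ is connected, the alignment itself can be realized by propagating orientation signs along $G_H$. A minimal sketch (the function and input encoding are our own, not the paper's Algorithm 4 pseudocode), assuming the constraint graph is connected and consistent, as Lemma 3.18 guarantees w.e.p.:

```python
from collections import deque

def align_signs(H, t):
    """Given pairwise constraints H[(j, k)] in {+1, -1} (same / opposite
    orientation) for adjacent sketches, propagate signs by BFS from
    sketch 0.  Returns a dict a with a[j] in {+1, -1}."""
    a = {0: +1}
    queue = deque([0])
    while queue:
        j = queue.popleft()
        for k in range(t):
            if k not in a and (j, k) in H:
                a[k] = a[j] * H[(j, k)]
                queue.append(k)
    return a
```

On consistent constraints the output agrees with one of the two true sign functions $a^*$ or $-a^*$, which is exactly the guarantee needed by the realignment step.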
As an extension of (3.9) in Lemma 3.17, define $S'^{(j)} = \{i \in S^{(j)} : \forall k \in S^{(j)},\ i \notin B(n, k, \alpha)\}$, and define the sets $P^{(j)}, Q^{(j)}$ accordingly. By (3.11) in the proof of Lemma 3.17, we have w.e.p. both of the stated containments; together with an analogous argument for $\tilde{R}$, we have w.e.p. that one of the two cases in the definition holds. Note that the definition of the sets $P^{(j)}, Q^{(j)}$ has two cases; we clarify that we always make the choice that renders both sets non-empty when such a choice is possible, choosing arbitrarily otherwise. By Lemma 3.17, w.e.p. there is a unique choice making both non-empty.
There is an edge $(i, j)$ in $G_H$ as long as $(P^{(i)} \cup Q^{(i)}) \cap (P^{(j)} \cup Q^{(j)}) \neq \emptyset$. We make the following claim: w.e.p., for any $a, b \in V$ with $U_a, U_b < \log(n)^{-1}$, there exists $1 \le \ell \le t$ such that $\{a, b\} \subset P^{(\ell)} \cup Q^{(\ell)}$. Before proving the claim, we show that it completes the proof. For any $1 \le i, j \le t$, take $a \in (P^{(i)} \cup Q^{(i)})$ and $b \in (P^{(j)} \cup Q^{(j)})$, and let $\ell$ be such that $\{a, b\} \subset P^{(\ell)} \cup Q^{(\ell)}$. Then there are edges $(i, \ell)$ and $(\ell, j)$ in $G_H$. This shows that $G_H$ is connected.
It remains to show that the claim is true. Fix $a, b \in V$ with $U_a, U_b < \log(n)^{-1}$. Let $\beta = \inf\{w^{(2)}(x, y) : \ldots\}$; then $(U_a, U_b) \in B(n, \alpha)$, and thus $a \in B(n, b, \alpha)$ and $b \in B(n, a, \alpha)$.
To show that $a, b \in P^{(\ell)} \cup Q^{(\ell)}$, by Definition 3.16 it suffices to show that $a, b \in S'^{(\ell)}$. Moreover, on $A^c$, $|B(n, a, \alpha) \cup B(n, b, \alpha)| \le 2\sqrt{n}\log(n)^2$, and thus, on $A^c$ and for $n$ large enough, the expected number of sketches for which $\{a, b\} \subset S'^{(j)}$ is at least $t \cdot m^2/(2n^2) \ge \log(n)^{11}$. Since the sets $\{S^{(j)}\}$ are sampled independently, the Chernoff bound gives that w.e.p. at least one of the sketches contains both $a$ and $b$. This completes the proof.
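The counting step at the heart of this argument is elementary: the chance that two fixed vertices both land in one uniform size-$m$ sample is $m(m-1)/(n(n-1)) \approx (m/n)^2$, so $t$ independent sketches contain the pair about $t\,(m/n)^2$ times in expectation. A quick check of this identity:

```python
from math import comb

def both_in_sample_prob(n, m):
    """P(two fixed vertices both lie in a uniform size-m subset of n):
    C(n-2, m-2) / C(n, m), which simplifies to m(m-1) / (n(n-1))."""
    return comb(n - 2, m - 2) / comb(n, m)
```

With $m = \log(n)^5$ and $t = (n\log(n))^2$, this expectation is of order $\log(n)^{12}$, comfortably above the $\log(n)^{11}$ threshold used in the proof.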

Analysis of Algorithm 2
We will give a bound on the output of Algorithm 2. Define the sets
$$Q_1 = \{(p, q) : |U_p - U_q| \ge \log(n)^{-1}\}, \qquad Q_2 = \{(p, q) : (U_p, U_q) \in B(n, \alpha)\}.$$
For a comparison $F$, define the event $B_2(F)$: that $F$ gives the correct comparison for all pairs of sufficiently-distant vertices in $Q_2$. We have:

Lemma 3.19. Let $G^{(2)}_\alpha$ be as defined in Equation 2.1, and let $F$ be the output of Algorithm 2 on input $G^{(2)}_\alpha$ and parameters $m = \log(n)^5$ and $t = (n\log(n))^2$. Then $B_2(F)$ holds w.e.p. on the $\mathcal{F}$-measurable event $A^c$.

Proof. Let $\{(S^{(j)}, \sigma^{(j)})\}_{j=1}^{t}$ be the sequence of subsets and preorders appearing immediately after the alignment correction in Step 12 of Algorithm 2. Let $C$ be as in Algorithm 3 at the end of its run. Let $c_1 = \log(n)^7$, and let $N : \tilde{V} \times \tilde{V} \to \mathbb{N}$ be the function counting the number of times vertices $s$ and $t$ occurred together in a sample; that is, $N(s, t) = |\{j : s, t \in S^{(j)}\}|$.
Next, consider a pair $(p, q) \in Q_2$, so $(U_p, U_q) \in B(n, \alpha)$, and let $S$ be a uniformly-chosen size-$m$ subset of $\{1, 2, \ldots, n\}$, as sampled by Algorithm 3 when it is called for the $i$'th time in Step 3 of Algorithm 2. If $p, q \in S$, then $p, q \in S^{(i)}$ whenever Algorithm 3 succeeds. By Lemma 3.15, on the event $A^c$ we have
$$\mathbb{E}\Big[\sum_i e^{(i)}(p, q) \,\Big|\, \mathcal{F}\Big] \ge 0.9\,(m/n)^2\, t = 0.9\log(n)^{12} \ge 2c_1.$$

By the Chernoff bound, on $A^c$ this count is concentrated around its expectation.
Let $B_3$ be the event that, when Algorithm 4 is called in step 10 of Algorithm 2, the returned function $a$ agrees with one of the true functions $a^*$ or $-a^*$ as defined in the statement of Lemma 3.18; by Lemma 3.18, $B_3$ holds w.e.p. On the event $B_3$, we assume without loss of generality that $a = a^*$ for all arguments. This implies that, after the realignment in step 10 of Algorithm 2, $\sigma^{(i)}$ roughly agrees with $\sigma_{\text{true}}$ for each $i = 1, \ldots, t$. Recall the event $B_1$ as defined in Lemma 3.15; we define $B_1(i)$ to be the event that $B_1$ holds for the call to Algorithm 3 that produced the pair $(S^{(i)}, \sigma^{(i)})$.
Next, fix $(p, q) \in Q_1$, so $|U_p - U_q| \ge \log(n)^{-1}$, and assume without loss of generality that $\sigma_{\text{true}}(p) < \sigma_{\text{true}}(q)$. On the event $B_1(i)$, $\sigma^{(i)}$ agrees with $\sigma_{\text{true}}$ at precision level $\log(n)^{-1}$. Defining $f^{(i)}(p, q) = e^{(i)}(p, q)\mathbf{1}_{B_1(i)}$ and applying Lemma 3.15, we have, for all $n$ sufficiently large, a lower bound of $0.9\,N(p, q)$ on $A^c$. Applying the Chernoff bound and then (3.19) yields the desired bound on $A^c$. The failure event here is exactly the complement of $B_2(F)$, so this completes the proof.
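The co-occurrence counts $N(s, t)$ used throughout this proof can be tabulated in one pass over the samples; a minimal sketch:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(samples):
    """N(s, t) = number of sample sets containing both s and t.
    Pairs are stored with s < t, so each unordered pair has one key."""
    N = Counter()
    for S in samples:
        for s, t in combinations(sorted(S), 2):
            N[(s, t)] += 1
    return N
```

Using a `Counter` means pairs that never co-occur implicitly have count zero, which matches the convention $N(s,t) = 0$ for unsampled pairs.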

Analysis of Algorithms 5 and 1 and proof of Theorem 3
Define the set $Q_3$. The following lemma shows that Algorithm 5, as called by Algorithm 1 and run with the correct input and parameters, outputs a comparison function that correctly orders all pairs in $Q_3$, i.e., all pairs that are sufficiently far apart.

Lemma 3.20 (Correct refinement).
Let $w$ be a graphon and $\alpha \in (0, 1)$ be such that Assumptions 3.5 hold, and let $G \sim w$ be a graph of size $n$. Consider a run of Algorithm 1 with parameters as in (3.2) on input $G, \alpha$; let $F$ be as it appears on line 3 of Algorithm 1. Then:

Proof. We assume throughout that $A^c$ and $B_2(F)$ both hold. Fix $p, q \in V$, and assume without loss of generality that $U_p < U_q$, so $F_{\text{true}}(p, q) = 1$. Let $D_{p,q}$ and $D_{q,p}$ be as computed in Algorithm 5. Let $B = B(n, p, \alpha) \cup B(n, q, \alpha)$ and $c_2 = 5\sqrt{n}\log(n)^2$. Since $A_2^c$ holds, $|B| \le 2\sqrt{n}\log(n)^2 < c_2/2$. We next wish to show that $F(p, q) = 1$ for all pairs $p, q$ with $\max(|N(p) \setminus N(q)|, |N(q) \setminus N(p)|) > 2c_2$. We begin with the case $|N(p) \setminus N(q)| > 2c_2$. In this case, $(N(p) \setminus N(q)) \setminus B \neq \emptyset$. For each $v \in (N(p) \setminus N(q)) \setminus B$, we have that $G^{(2)}_\alpha(q, v) = 0$ and, since $v \notin B(n, q, \alpha)$ and we assume $A^c$ holds, $H_\alpha(q, v) = 0$. Thus, by Conditions 3.5, $v$ is not a neighbour of $q$ in $H_\alpha$; by a similar argument, $v$ is a neighbour of $p$ in $H_\alpha$. So $v$ is a neighbour of $p$ but not of $q$ in $H_\alpha$. Since $H_\alpha$ is diagonally increasing, this implies that $U_v < U_q$, and so $\sigma_{\text{true}}(v) < \sigma_{\text{true}}(q)$, so $F_{\text{true}}(v, q) = 1$.

In our first case, we have just shown that $F(v, q) = 1$ for all $v \in (N(p) \setminus N(q)) \setminus B$ satisfying $F_{\text{true}}(v, q) = 1$. Moreover, by our assumption $|N(p) \setminus N(q)| > 2c_2$ and the bound $|B| < c_2/2$, there are more than $c_2$ such vertices $v$.

Therefore, in this case, $D_{p,q} > c_2$ and so $F(p, q) = 1$. By essentially the same argument, in the case $|N(q) \setminus N(p)| > 2c_2$ we have $D_{q,p} < -c_2$. Combining these two cases, we have correctness whenever $\max(|N(p) \setminus N(q)|, |N(q) \setminus N(p)|) > 2c_2$; applying Inequality (3.23), this holds for all pairs under consideration, completing the proof.
Finally, we check that $\sigma_F$ is never much less accurate than $F$:

Lemma 3.21. Let $F$ be any comparison that agrees with a permutation $\sigma_{\text{true}}$ at precision level $d$. Then $\sigma_F$ agrees with $\sigma_{\text{true}}$ at precision level $2d$.
Proof. Assume wlog that $\sigma_{\text{true}}(i) = i$ for all $i$. Then for $j > i + 2d$, we have the required comparison.

The proof of Theorem 3 now follows directly from the lemmas. Let $\sigma$ be the result returned by Algorithm 1, let $Q_3$ be as above, and define the event that $\sigma$ is correct for all vertices that are sufficiently distant. We have:
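One standard way to realize an ordering $\sigma_F$ from a comparison function $F$ (the construction behind Lemma 3.21 may differ in detail) is to rank each vertex by the number of comparisons it "wins":

```python
def sigma_from_comparison(F, n):
    """Build a permutation from a comparison F, where F(i, j) == 1 is
    read as "i precedes j".  Vertices with more wins come earlier.
    Errors of F on close pairs shift ranks only locally, which is the
    mechanism behind a Lemma 3.21-style doubling of the precision level."""
    wins = [sum(1 for j in range(n) if j != i and F(i, j) == 1)
            for i in range(n)]
    order = sorted(range(n), key=lambda i: -wins[i])
    sigma = [0] * n
    for rank, i in enumerate(order):
        sigma[i] = rank + 1
    return sigma
```

Applied to an error-free comparison, this recovers the true permutation exactly.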

Lemma 3.22 (Merge Correctness).
Let $w$ be a graphon and $\alpha \in (0, 1)$ be such that Assumptions 3.5 hold, and let $G \sim (n, w)$. Let $\sigma$ be the output of Algorithm 1, run with parameters as in (3.2) on input $G, \alpha$. Then:

Proof. Let $F$ be the output of Algorithm 2 as called in line 2 of Algorithm 1, and let $F'$ be the output of Algorithm 5 as called in line 3 of Algorithm 1. By Lemma 3.20, if $B_2(F)$ and $A^c$ both hold, then $F'$ agrees with the true ordering on $Q_3$. Applying Lemma 3.21 gives the corresponding inclusion, and applying Corollary 3.7 and Lemma 3.19 to this inclusion gives the desired bound. We have shown that all pairs in $Q_3$ are ordered correctly w.e.p.; we now show that this implies that all pairs in $Q_4$ are also ordered correctly w.e.p. Suppose that $(p, q) \in Q_4$ and $U_p < U_q$. By definition of $Q_4$, we obtain two inequalities; putting them together, on $A^c$ we have the required separation. Therefore, by Lemmas 3.20 and 3.21, each such pair $(p, q)$ is ordered correctly on $A^c$, and so, by Inequality (3.8) and Corollary 3.7, the conclusion follows. This completes the proof.
Theorem 3 now follows immediately from Lemma 3.22.

Iterative error reduction
In this section, we discuss and analyze Algorithm 6 and prove Theorem 2. Throughout this section, we fix a graphon $w$ that satisfies Assumptions 1.8 and 3.5, and use the constants introduced in those assumptions. We assume that the input graph $G$ being analyzed is sampled as $G \sim w$, and we fix the associated notation from this choice as at the start of Section 3.2; for example, we denote by $\{U_i\}, \{U_{ij}\}$ the random variables used to construct $G$ in representation (1.2). Finally, we let other notation follow Algorithm 6; for example, we let $\{B_j\}$ be the random variables introduced in step 1, and $V_i$ the sets constructed in steps 2 and 5. We observe here that, conditional on $\{B_j\}$, $G_i \sim w$ is a random graph of size $|V_i|$ sampled from $w$.

Remarks 4.1 (Strengthening of Theorem 2). Just as Theorem 3 is a stronger version of Theorem 1, our arguments can be used to give other versions of Theorem 2.
As there are two interesting versions, and both can be described entirely in terms of a small number of simple substitutions, we describe these substitutions in this remark rather than copying large amounts of nearly-identical text: 1. Non-Uniform Graphons: All references to Assumption 1.5 in the theorem statement and proof can be replaced by references to the (strictly weaker) Assumption 3.5.

2. Other Initial Sketches:
Step 2 of Algorithm 6 calls Algorithm 1, but in fact any algorithm can be used for this initial sketch. The conclusions of Theorem 2 remain true as stated if all of the following substitutions are made: • Replace the call to Algorithm 1 in step 2 of Algorithm 6 by a call to another algorithm, which we call OtherSketch.
• Replace Assumption 1.5 in the theorem statement with the following two assumptions: (i) part (3) of Assumption 3.5, and (ii) a new assumption, which we will call Assumption OS, that the output of OtherSketch satisfies the error bound appearing in Theorem 1.
• In the proof, replace all references to Theorem 3 by references to Assumption OS.

Algorithm 7: basic concepts
We start by discussing Algorithm 7, which builds a new ordering $\sigma_2$ on $V_2$ by looking at the edges in $G_2$ and the old ordering $\sigma_1$ on $V_1 \subset V_2$. Our main goal is to show the following: if Algorithm 7 is called with the right choice of parameters $C_1, C_2, C_3$, and $\sigma_1$ agrees with $\sigma_{\text{true}}$ at a certain precision level $d_1$, then $\sigma_2$ agrees with $\sigma_{\text{true}}$ at a finer precision level $d_2 < d_1$. We now outline the proof and define the basic concepts. As in earlier informal descriptions, we elide the distinction between a permutation and its total reversal. First, we need to define the boundaries where $w$ drops to zero. Let $\delta$ be as in Assumption 1.8, and define new boundary functions $r_\delta, \ell_\delta : [0, 1] \to [0, 1]$. These definitions replace the very similar boundaries defined in (3.1), which were defined with respect to the square $w^{(2)}$. By Assumption 1.8, $w(x, y) = 0$ if $y > r_\delta(x)$ or $y < \ell_\delta(x)$.
To start the heuristic analysis, assume that $\sigma_1$ is an ordering of $V_1$ which agrees with $\sigma_{\text{true}}$ at precision level $d_1$. We now show how $\sigma_1$ is used in steps 1-12 of Algorithm 7 to find an ordering $\sigma_2$ of $V_2 \setminus V_1$ which agrees with $\sigma_{\text{true}}$ at a smaller precision level $d_2$, and has certain other helpful properties to be specified later.
Assume without loss of generality that $U_i < U_j$, so that $\sigma_{\text{true}}(i) < \sigma_{\text{true}}(j)$. Since $w$ is diagonally increasing, and from Assumption 1.8, assume first that (4.1) holds; then $r_\delta(U_j) - r_\delta(U_i)$ is suitably large, where the relevant inequality holds for all $n$ large enough that $\log(n) > 2B$. Let $I_{\text{true}} = I_{\text{true}}(i, j)$ be the interval of length $d_2\log(n)^{-1}$ lying directly below $r_\delta(U_j)$, so that $w(U_j, x) \ge \delta$ but $w(U_i, x) = 0$ for $x \in I_{\text{true}}$. Because of this "sharp" distinction between $w(U_i, x)$ and $w(U_j, x)$, the region $I_{\text{true}}$ is very informative in distinguishing the ordering of $i$ and $j$. Moreover, many points of $V_1$ will have latent positions in $I_{\text{true}}(i, j)$ (roughly $p_1 d_2 n \log(n)^{-1}$ points on average).
Step 2 of the algorithm computes the set $R(i, j)$, defined as the $C_1$ neighbours of either $i$ or $j$ in $V_1$ that are ranked highest according to $\sigma_1$. The idea behind the algorithm is that $R(i, j)$ is a small set that must contain many points with latent positions in the strongly-distinguishing set $I_{\text{true}}(i, j)$; when this works, we expect the vertex with the larger latent value, $j$, to have significantly more neighbours in $R(i, j)$.
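Step 2 can be sketched directly from this description (names and representation are ours; the paper's Algorithm 7 pseudocode is not reproduced here):

```python
def candidate_set_R(i, j, adj, V1, sigma1, C1):
    """R(i, j): the C1 vertices of V1 adjacent to i or j that are
    ranked highest according to sigma1 (larger sigma1 value = higher rank)."""
    cand = [v for v in V1 if v in adj[i] or v in adj[j]]
    cand.sort(key=lambda v: sigma1[v], reverse=True)
    return set(cand[:C1])
```

Sorting only the candidate neighbours keeps the per-pair cost proportional to $|V_1|$ up to logarithmic factors.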
The main motivation behind our definition of R(i, j) is thus to include the entire "signal" in I true , while including as few additional "noisy" vertices as possible. We will prove the following estimates, which make this intuition rigorous:

1. Large Signal: Denote by $\mathrm{Dist}_R(i, j)$ the set of vertices that strongly distinguish between $i$ and $j$ on the right. We will show that $\mathrm{Dist}_R(i, j)$ is relatively large; in particular, we show in Lemma 4.2 below that w.e.p. $|\mathrm{Dist}_R(i, j)| \ge p_1 d_2 n \log(n)^{-2}$.
As pointed out earlier, $\mathrm{Dist}_R(i, j) \cap N(i) = \emptyset$, so all vertices in $R(i, j) \cap \mathrm{Dist}_R(i, j)$ are neighbours of $j$ and not of $i$. This leads to the definition of $\mathrm{Signal}_R(i, j)$. We will show in Lemma 4.3 that, for appropriately chosen $C_2$, w.e.p. the signal is large; moreover, we show in Lemma 4.4 that w.e.p. it is contained in $R(i, j)$.
2. Small Noise: There are many vertices $k \in R(i, j)$ that are not in the "high-signal" region $\mathrm{Dist}_R(i, j)$. Because $G$ is sampled from a diagonally increasing graphon satisfying inequality (1.3), we expect to see at least as many neighbours of $j$ as of $i$ in $R(i, j) \setminus \mathrm{Dist}_R(i, j)$, but this discrepancy may carry very little (or no) signal; indeed, the expected discrepancy is exactly $0$ for our test-case graphon (1.4). Thus, we view these edges as essentially adding noise to our comparison. We define $\mathrm{Noise}_R(i, j)$ accordingly, and show in Lemma 4.5 below that w.e.p. $\mathrm{Noise}_R(i, j) < C_2$. In particular, this noise term is always small compared to the signal term in Inequality (4.6). When these high-probability events all occur, the function $F^{(2)}$ as defined in Steps 3-9 of Algorithm 7 agrees with $\sigma_{\text{true}}$. Finally, $L$, $\mathrm{Dist}_L$, $\mathrm{Signal}_L$, $\mathrm{Noise}_L$ are the obvious analogues of $R$, $\mathrm{Dist}_R$, $\mathrm{Signal}_R$, $\mathrm{Noise}_R$ on the left, and the analogous bounds hold by the same arguments in the case where (4.1) does not hold (that is, where $r_\delta(U_j) - r_\delta(U_i) < \ell_\delta(U_j) - \ell_\delta(U_i)$). Therefore, w.e.p. the ordering $\sigma_2$ on $V_2 \setminus V_1$ obtained in Step 12 of the algorithm agrees with $\sigma_{\text{true}}$ at precision level $d_2$.
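The resulting comparison is a simple thresholded neighbour count; a hedged sketch of the decision in Steps 3-9, with our own sign convention (return value $1$ means $i$ precedes $j$):

```python
def compare_pair(i, j, R, adj, C2):
    """Compare i and j using the candidate set R = R(i, j): if j has at
    least C2 more neighbours in R than i does (signal exceeding noise),
    declare that i precedes j; symmetrically for the reverse."""
    discrepancy = len(R & adj[j]) - len(R & adj[i])
    if discrepancy >= C2:
        return 1    # i precedes j
    if discrepancy <= -C2:
        return -1   # j precedes i
    return 0        # below threshold: leave undecided at this stage
```

The threshold $C_2$ plays exactly the role above: the signal term exceeds it w.e.p., while the noise term stays below it.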
The second part of the algorithm (steps 13-24) then extends this ordering to all of V 2 .
W.e.p. there will be many neighbours $k$ of $j$ with latent position $U_k$ greater than that of any neighbour of $i$. Thus the top elements of $N(i)$ and $N(j)$ in $V_2 \setminus V_1$, ranked according to $\sigma_{\text{true}}$, will differ significantly in rank. Since $\sigma_2$ is a good approximation of $\sigma_{\text{true}}$, the same holds when these elements are ranked according to $\sigma_2$. Thus, as will be made precise in Lemma 4.7, the ordering $\sigma$ returned by the algorithm will agree with $\sigma_{\text{true}}$ at precision level $d_2\log(n)^2$.

Algorithm 7: correctness
We are now ready to prove the statements of the previous section. We retain the same notation, and now give conditions on the parameters which we assume to hold throughout the remainder of this section. Note that the definition of $C_1, C_2, C_3$ given above agrees with the definition of these parameters in Step 5 of Algorithm 6 in its first iteration (with $i = 1$). The first three conditions are satisfied by our choice of the sequence $p_i, d_i$ in Equation (4.21). We use the following immediate consequences of (4.9): $\log(n)^{13} \le p_1 n$ and $C_1 \le p_1 n \log(n)^{-1} + 1$. Moreover, we assume throughout that $n$ is large enough that $\log(n)$ exceeds any of the constants appearing in this section.
For the correctness proof of Algorithm 7, we define certain bad events on the variables $U_i$ and $U_{i,j}$, and show that w.e.p. they do not occur. On the event $A_6^c$, the number of vertices in each interval is concentrated around its expected value. The event is a property of the variables $U_i$, and is closely related to the event $A_5$ of Definition 3.6. It is a straightforward consequence of Chernoff's inequality and a union bound that $A_6^c$ occurs w.e.p. We will also need this kind of good behaviour for all our sampled sets $V_i$. For a given $p$, let $S = S(p) \equiv \{i : B_i \le p\}$ and define $A_6(p)$ analogously. By the Chernoff bound, for any given sequence $p = p(n) \in (0, 1)$ satisfying
$$\limsup_{n\to\infty} \frac{-\log(p(n))}{\log(n)} < 1,$$
the event $(A_6(p))^c$ holds w.e.p. It therefore also holds simultaneously w.e.p. for any sequence $p_1, p_2, \ldots, p_k$, as long as $k$ grows at most polynomially in $n$.
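The coupled samples $V_i = \{s : B_s \le p_i\}$ are conveniently generated from a single draw per vertex, which automatically makes the sets nested. A minimal sketch, assuming the $B_j$ are i.i.d. uniform on $[0,1]$ (the coupling suggested by the thresholding rule above):

```python
import random

def nested_samples(n, ps, seed=0):
    """Draw one uniform B_j per vertex and set V_i = {j : B_j <= p_i}.
    Because a single B_j is reused for every threshold, the sets are
    nested whenever the thresholds ps are increasing."""
    rng = random.Random(seed)
    B = [rng.random() for _ in range(n)]
    return [{j for j in range(n) if B[j] <= p} for p in ps]
```

With $p_k = 1$ the final set is all of $V$, matching the last iteration of Algorithm 6.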
We can use $A_6$ to establish bounds on the size of $\mathrm{Dist}_R(i, j)$, where $i$ and $j$ are such that $U_j > U_i + d_2$. We assume without loss of generality that (4.11) holds.

Lemma 4.2. W.e.p., for all $i, j$ satisfying (4.11), $|\mathrm{Dist}_R(i, j)| \ge p_1 d_2 n \log(n)^{-2}$.

Proof. Let $i$ and $j$ be as stated. By definition, $|I_{\text{true}}(i, j)| = d_2\log(n)^{-1}$, and so by the choice of parameters in (4.9) the set $\mathrm{Dist}_R(i, j)$ has the required expected size. Since $(A_6(p_1))^c$ holds w.e.p., the lemma follows.
We will next check that all moderately-long intervals contain roughly the expected number of neighbours of any given vertex (recall that $N(i)$ is the usual graph neighbourhood of $i$ in $G$). More precisely, for a set $S \subset V$, an ordering $\sigma$ on $S$, and a pair $k, \ell \in S$, define the interval $I(S, \sigma, k, \ell) = \{s \in S : \sigma(k) < \sigma(s) < \sigma(\ell)\}$, and let $\mathcal{I}(S, \sigma) = \{I(S, \sigma, k, \ell) : k, \ell \in S\}$ be the collection of such intervals. We now use this to show that $\mathrm{Signal}_R(i, j)$ is sufficiently large.
Proof. We first show that $A_7^c$ occurs w.e.p. for the relevant parameters. Let $\mathcal{H}$ be the $\sigma$-algebra generated by the variables $\{U_i\}$ and $\{B_i\}$, and let $\mathcal{H}(G_1)$ be the $\sigma$-algebra generated by the variables $\{U_i\}$, $\{B_i\}$, and $\{U_{i,j} : i, j \in V_1\}$. Recalling that the quantity $|I \cap N(i)|$ appearing in the definition of $A_7$ can be written as a sum of Bernoulli random variables that are independent conditionally on these $\sigma$-algebras, applying Hoeffding's inequality and a union bound immediately gives the required concentration. Each vertex of $\mathrm{Dist}_R(i, j)$ is a neighbour of $j$ with probability at least $\delta$, and thus $W(j, \mathrm{Dist}_R(i, j)) \ge \delta|\mathrm{Dist}_R(i, j)|$. Also, by (4.9), $C_2/\delta \ge \log(n)^3$ for $n$ sufficiently large. We can then directly conclude the stated bound. Since the complements of the events $A_6(p_1)$ and $A_7(V_1, V_2 \setminus V_1, \sigma_{\text{true}})$ occur w.e.p., the result follows.
In addition, we show that this large signal is indeed contained in $R(i, j)$. Suppose not, and let $k$ be the most highly ranked vertex in the signal that is not included in $R(i, j)$. For simplicity of notation, let $r_j = r_\delta(U_j)$. By definition, $U_k \in I_{\text{true}}$, so $U_k \ge r_j - d_2\log(n)^{-1}$. Each $\ell \in R(i, j)$ is a neighbour of $i$ or $j$, so $U_\ell < r_j$. Moreover, since $\ell \in R(i, j)$ and $k \notin R(i, j)$, it follows from the definition of $R(i, j)$ that $\sigma_1(\ell) > \sigma_1(k)$. We assumed that $\sigma_1$ agrees with $\sigma_{\text{true}}$ at precision level $d_1$, so either $\ell$ and $k$ are correctly ordered by $\sigma_1$ (in which case $U_\ell > U_k$), or $|U_\ell - U_k| \le d_1$. We saw immediately following (4.10) that $(A_6(p_1))^c$ occurs w.e.p., completing the proof.
From this lemma and Lemma 4.3, we conclude that w.e.p. (4.14) holds. Finally, we show that the noise in $R(i, j)$ is smaller than this lower bound on the signal.
Lemma 4.5. W.e.p., for all $i, j \in V_2 \setminus V_1$ satisfying (4.11), $\mathrm{Noise}_R(i, j) < C_2$.

Proof. We first show that w.e.p. $U_j < U_k$ for all $k \in R(i, j)$. By Assumption 3.5, $w^{(2)}$ is $(\epsilon, \alpha)$-connected, and it follows that there must be an $\epsilon' > 0$ so that $w(x, y) > 0$, and thus $w(x, y) \ge \delta$, for all $x, y$ with $|x - y| \le \epsilon'$. Considering the expected number of neighbours of $j$ with larger latent position, the last inequality in the resulting bound holds for all $n$ sufficiently large. It follows that w.e.p. $j$ has at least $C_1$ neighbours with $U$-values greater than $U_j$, and so $U_j < U_k$ for all $k \in R(i, j)$, as desired.
We now proceed to bound the noise, i.e. the excess of neighbours of $i$ over neighbours of $j$ in $R(i, j)$. Since $w$ is diagonally increasing, there must be an interval $I \in \mathcal{I}(V_1, \sigma_1)$ such that $R(i, j) = (I \cap N(i)) \cup (I \cap N(j))$. By our choice of parameters (4.9), for all sufficiently large $n$ the required bound holds. We then obtain a chain of inequalities in which (4.15) is used in the third line. The result follows from (4.12).
This completes the proof of the "correctness" of the first step of Algorithm 7, as summarized in the next lemma. For a set $S \subseteq V$, parameter $d \in (0, 1)$, and ordering $\sigma$ of $S$, define the event $A_8(S, \sigma, d)$. That is, $(A_8(S, \sigma, d))^c$ holds if $\sigma$ agrees with $\sigma_{\text{true}}$ at precision level $d$.
Lemma 4.6. Suppose the parameters $C_1, C_2, p_1, p_2, d_1, d_2$ are chosen according to (4.9). Let $V_1, V_2$ be sampled from $V$ at rates $p_1, p_2$, respectively, and assume $\sigma_1$ is an ordering of $V_1$, derived only from $G[V_1]$, which agrees with $\sigma_{\text{true}}$ at precision level $d_1$. Then, for $\sigma_2$ as computed in step 12 of Algorithm 7: w.e.p. $\sigma_2$ is an ordering of $V_2 \setminus V_1$ that agrees with $\sigma_{\text{true}}$ at precision level $d_2$.
Proof. We consider first the case in which (4.17) holds. In this case, $i, j$ satisfy (4.11), and we conclude from (4.13) and Lemmas 4.4 and 4.5 that w.e.p. the signal dominates the noise, and thus $F(i, j)$ is set to $1$ in Steps 4-9 of Algorithm 7. If (4.17) does not hold, then there may not be a sufficiently long interval $I_{\text{true}}$ to the right of $U_j$, but by Assumption 1.8 there must be a corresponding interval, with the analogous property holding for all $x \in I_{\text{true}}$. Applying the arguments above to the values $1 - U_i$, we see that w.e.p. $|L(i, j) \cap N(i)| - |L(i, j) \cap N(j)| \ge C_2$, and thus $F(i, j)$ is set to $1$ in Steps 4-9 as above. Therefore, in both cases, w.e.p. $i$ and $j$ are ordered in $\sigma_2$ according to $\sigma_{\text{true}}$. The result now follows by a union bound.
Finally, we show that the extension of σ 2 to all of V 2 has the desired precision level.
Lemma 4.7. Suppose $p_1, p_2, d_2, C_3$ are as given in (4.9), and let $V_1, V_2$ be sets sampled from $V$ at rates $p_1, p_2$, respectively. Suppose $\sigma_2$ is an ordering of $V_2 \setminus V_1$ that agrees with $\sigma_{\text{true}}$ at precision level $d_2$. Then, w.e.p., any ordering of $V_2$ extending $\sigma_2$ and based on the function $F^{(2)}$ as computed in steps 16-20 of Algorithm 7 agrees with $\sigma_{\text{true}}$ at precision level $d_2\log(n)^2$.

Proof. Fix $i, j \in V_2$ satisfying $U_j \ge U_i + d_2\log(n)^2$ and (4.11). By the same argument leading to Inequality (4.2), we obtain the analogous bound. We will first show that, for all $s \in I_2(j)$, $\sigma_2(s) > t(i)$. Let $k \in N(i)$ be the vertex at the top of the range of $N(i)$ according to $\sigma_2$; that is, $\sigma_2(k) = t(i)$. Since $k \in N(i)$, we have that $U_k \le r_\delta(U_i)$, and so for $s \in I_2(j)$ the latent positions are well separated. This implies that $s$ and $k$ are correctly ordered by $\sigma_2$, and thus $t(i) = \sigma_2(k) < \sigma_2(s)$, as desired. This correct ordering implies that $k \notin I_2(j)$, and so we see that $t(j) - t(i) \ge |I_2(j) \cap N(j)|$, giving the containment (4.18). For large enough $n$, if $(A_6(p_2) \cup A_6(p_1))^c$ holds, then (4.19) holds, and if $(A_7(V_2, V_2 \setminus V_1, \sigma_{\text{true}}))^c$ holds, then (4.20) holds. Putting together (4.18), (4.19) and (4.20), we conclude that w.e.p. $t(j) - t(i) \ge C_3$. Similarly, if $U_i < U_j$ and the opposite of (4.2) holds, then $\mathbb{P}[\{b(j) - b(i) < C_3\}] \le n^{-\Omega(\log(n))}$. Therefore, w.e.p. $F^{(2)}(i, j)$ is set to $1$ in Step 19 of Algorithm 7. The result follows by a union bound.
The correctness of Algorithm 7 now follows directly from the results in this section.
Lemma 4.8. Let $V_1, V_2$ be sets sampled at rates $p_1, p_2$, respectively, according to the variables $\{B_i\}$, and suppose $\sigma_1$ is an ordering of $V_1$ that agrees with $\sigma_{\text{true}}$ at precision level $d_1$. Suppose Algorithm 7 is executed with parameters $C_1, C_2, C_3$ satisfying the conditions (4.9). Then, w.e.p., $\sigma$ is a total order of $V_2$ that agrees with $\sigma_{\text{true}}$ at precision level $(d_1/(p_1 n))\log(n)$.

Proof of Theorem 2
Finally, we show the correctness of Algorithm 6 and prove Theorem 2. That is, we show that, if Algorithm 6 is executed with a particular set of parameters and given as input a large enough graph $G$, then w.e.p. the algorithm will return a total order of $V$ with error at most $n^{\epsilon}$, where $\epsilon > 0$ is any desired constant error exponent. The proof will follow directly from Theorem 3 and Lemma 4.8. First, we define the parameters. For any $0 < \epsilon < 0.5$, let the integer $k$ and the sequence $\{p_i, d_i\}_{i=1}^{k}$ be as in (4.21). The definition of $d_2$ according to (4.21) differs by a factor of $\log(n)^2$ from the definition of $d_2$ in (4.9); this extra factor corresponds to the extra $\log(n)^2$ appearing in the statement of Lemma 4.7, which bounds the error introduced when extending the ordering. Note that $\{p_i\}$ is an increasing sequence, with $p_k = 1$. We will use the lemmas from the previous sections to show that, after each iteration of Algorithm 6, the returned ordering of the subgraph induced by $V_i$ agrees with the true ordering at precision level $d_i$. The lemma below will have as corollary that $d_k \le n^{-1+\epsilon}$, from which our result follows.

Lemma 4.9.
For the parameters defined in (4.21), and $n$ sufficiently large, the stated bound holds for all $1 \le i \le k$. In particular, there exists $0 < \epsilon' < \epsilon$ so that $d_k \le n^{-1+\epsilon'}$.
Proof. We prove the statement by induction on $i$. The statement for $i = 1$ can be verified directly from the definition of $d_1$. Suppose then that the statement holds for some $1 \le i < k$; the inductive bound then follows. For $n$ large enough, $n^{0.5\beta} \ge \log^{1/2}(n)$, which completes the proof of the first statement. To prove the second statement, let $\delta = 2^{-(k+1)}(\epsilon - 2^{-k}) \in (0, \epsilon)$ and $\epsilon' = \epsilon - \delta$. The result then follows for $n$ large enough that $n^{-\delta}\log(n) \le 1$.
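The shape of the recursion can be illustrated numerically. The sketch below uses the stand-in update $d_{i+1} = d_i\log(n)^3/(p_i n)$ with rates $p_i = n^{-(k-i)\beta}$; these exact exponents are our hypothetical simplification of (4.21), not the paper's parameters, but they display the same geometric decrease of the precision levels:

```python
from math import log

def precision_schedule(n, k, beta):
    """Illustrative schedule: p_i = n^{-(k-i)*beta} for i = 1..k and
    d_1 = n^{-0.5}, with d_{i+1} = d_i * log(n)^3 / (p_i * n).
    Each step shrinks d_i as long as p_i * n >> log(n)^3."""
    ps = [n ** (-(k - i) * beta) for i in range(1, k + 1)]
    ds = [n ** -0.5]
    for i in range(k - 1):
        ds.append(ds[-1] * log(n) ** 3 / (ps[i] * n))
    return ps, ds
```

For moderate $n$ and $(k-1)\beta < 1$, each $d_{i+1}/d_i$ ratio is below one, mirroring the decreasing sequence $\{d_i\}$ established above.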
In fact, a computation very similar to that used in the lemma above shows that $\{d_i\}_{i=1}^{k}$ is a decreasing sequence. In what follows, we fix a value $\epsilon > 0$ and take the sequence as defined in (4.21). We assume that the assumptions of Theorem 2 hold, and, for all $i = 1, \ldots, k$, the set $V_i = \{s : B_s \le p_i\}$ is as in Algorithm 6.
Algorithm 6 first calls Algorithm 1 with input $G_1 = G[V_1]$, where $V_1$ is sampled from $V$ at rate $p_1$; the algorithm returns a total order $\sigma_1$ of $V_1$. We will first show that, if $p_1, d_1$ are as defined above in (4.21), then w.e.p. $\sigma_1$ agrees with $\sigma_{\text{true}}$ at precision level $d_1$; moreover, $\sigma_1$ depends only on $G_1$.

Proof. From (4.21), we have that $p_1 = n^{-(k-1)\beta}$ and $d_1 = n^{-0.5(1 - k\beta)}$. Note that $V_1$ is a set of size $n_1 = |V_1|$ whose latent positions are sampled randomly from $[0, 1]$. By Theorem 3, w.e.p. the returned ordering $\sigma_1$ agrees with $\sigma_{\text{true}}$ or its reverse at precision level $n_1^{-0.5}\log(n_1)^4$. If $(A_6(p_1))^c$ holds and $n$ is large enough, then $n_1 \ge 0.5\,p_1 n$, and the resulting precision level is smaller than $d_1$ for all $n$ sufficiently large. The result follows.
The next lemma shows that Algorithm 7, when called by Algorithm 6 to extend the ordering from $V_i$ to $V_{i+1}$, improves the precision from $d_i$ to $d_{i+1}$.

Proof. Fix $1 \le i < k$ and assume $\sigma_i$ agrees with $\sigma_{\text{true}}$ at precision level $d_i$. Note that the parameters $C_1, C_2, C_3$, as set in Step 6 of Algorithm 6 and used in the call to Algorithm 7, satisfy the conditions (4.9), with $p_i$ and $d_i$ taking the roles of $p_1$ and $d_1$, respectively. Lemma 4.8 then shows that the conclusion holds w.e.p.
Finally, we prove Theorem 2.

Proof of Theorem 2. By definition, $V_k = V$, and Algorithm 6 returns the ordering $\sigma = \sigma_k$ of $V$. By the previous lemmas and a union bound, w.e.p. $\sigma$ will agree with $\sigma_{\text{true}}$ (or its reverse) at precision level $d_k$. By Lemma 4.9, $d_k \le n^{-1+\epsilon'}$, where $0 < \epsilon' < \epsilon$.
The techniques from the previous section readily show that, w.e.p., if an ordering agrees with $\sigma_{\text{true}}$ at precision level $d$, then it has error at most $dn\log(n)$. Since $n^{-(\epsilon - \epsilon')}\log(n) < 1$ for large enough $n$, it follows that w.e.p. $\sigma$ has error at most $n^{\epsilon}$.

Finding α
The first part of our results relies on knowing a good threshold value $\alpha \in [0, 1]$. To implement the algorithm and obtain our guarantees, we need to find a value of $\alpha$ for which Assumption 3.5 holds. Fortunately, for the general class of graphons satisfying the stronger conditions of Assumption 1.5, the set of suitable values of $\alpha$ contains an interval of positive measure. We now list a few simple consequences of our assumptions that allow us to estimate a suitable value of $\alpha$ from the observed graph $G$. It is straightforward to check that for certain classes of graphons (including e.g. (1.4)) these tests will find a suitable value of $\alpha$; in general, however, the following tests are necessary but not sufficient for a value of $\alpha$ to be suitable.
We begin with condition (1) of Assumption 3.5, namely the requirement that $w^{(2)}_\alpha$ is diagonally increasing. We have seen that, if $w^{(2)}_\alpha$ is diagonally increasing, then with high probability a sample of size $\log(n)^5$ from $G^{(2)}_\alpha$ will be a proper interval graph. Such samples are taken repeatedly in Algorithm 2, and each sample is then tested by a proper interval graph recognition algorithm. Thus, if Step 3 of Algorithm 2 fails more than a small percentage of the time, it is likely that $w^{(2)}_\alpha$ is not diagonally increasing. When this occurs, the algorithm should be restarted with a new value of $\alpha$.
The second condition is more straightforward: we can test whether $w^{(2)}$ is uniformly $(A, \delta)$-good by testing the corresponding empirical inequality for several test values $0 < \delta' < \delta$. Moreover, we can restrict the choices of $\delta$ and $\delta'$ to values in the range of $G^{(2)}$, which are multiples of $(n-2)^{-1}$. To test whether $w^{(2)}_\alpha$ is $(\epsilon, \alpha)$-connected, we can use an analogous empirical test. Testing whether $w^{(2)}_\alpha$ is $B$-separated or has an $\epsilon$-split is harder, since we do not have the ordering of the vertices. To test whether $w^{(2)}_\alpha$ is $B$-separated, we could define for any vertex $v$ the vector $s_v$. This vector should be close to the volume appearing in the definition of $B$-separation. Let $\tilde{s}_v$ be the elements of $s_v$, sorted in increasing order. If $w^{(2)}_\alpha$ is $B$-separated, then we would expect the corresponding inequality to hold for typical $v \in V$ and $1 \le i < j \le n$ (and to be close to true for all such values).
If $w^{(2)}_\alpha$ has an $\epsilon$-split, then pairs of vertices at the extreme ends of the ordering cannot have any common neighbours. We expect there to be about $\epsilon^2 n^2/2$ such pairs. On the other hand, if $\inf_{|x|,|y|<\delta} w^{(2)}(U_i + x, U_j + y) > 0$ for some $\delta > 0$, then by the law of large numbers $N(i) \cap N(j) \neq \emptyset$ for sufficiently large $n$. This suggests a corresponding check on common neighbourhoods. It is easy to integrate these tests into the algorithm, so that a suitable value of $\alpha$ can be found in a reasonable number of trials.
We note that we have not searched for a value of $\delta$ that satisfies Assumption 1.8, as we do not need to know $\delta$ to run Algorithm 6. In practice, we expect that it is often easy to diagnose graphons that do not satisfy Assumption 1.8, as follows. Let $F^{(2)}$ be as in step (12) of Algorithm 7 on the first time that it is called by Algorithm 6; if Assumption 1.8 is satisfied, then with extreme probability the maximum size of the set on which disagreement occurs, over $i, j \in V_2 \setminus V_1$, will not be much larger than $d_2 n$. In other words, we can diagnose a failure of Assumption 1.8 by simply checking whether $F^{(2)}$ could plausibly have precision better than $d_2$ or so.

Large error of embeddings
It is common to study the seriation problem using something like the following two-step procedure:
1. Compute an estimate $\hat U = (\hat U_1, \hat U_2, \ldots, \hat U_n)$ of the latent positions $U = (U_1, U_2, \ldots, U_n)$ of the vertices $1, 2, \ldots, n$, using e.g. a spectral embedding of the graph $G$.
2. Compute an estimate $\hat\sigma$ of $\sigma_{true}$ based only on $\hat U$, typically as the permutation induced by $\hat U$: sort the vertices by their estimated positions, breaking ties arbitrarily.
It is then straightforward to show that, if $\hat U$ has small error, the induced ordering $\hat\sigma$ will inherit a similarly small error. See e.g. [26] for recent work that conducts this sort of analysis.
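A minimal sketch of this two-step procedure in Python, using the Fiedler vector of the graph Laplacian as one common choice of spectral embedding (in the spirit of the spectral seriation algorithm of [1] cited in the introduction); this is an illustration of the generic pipeline, not the algorithm analyzed in this paper:

```python
import numpy as np

def two_step_seriation(A):
    """Step 1: embed vertices on the line via the eigenvector of the
    second-smallest Laplacian eigenvalue (the Fiedler vector).
    Step 2: return the ordering induced by sorting the embedding,
    with ties broken arbitrarily by argsort."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A      # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    u_hat = vecs[:, 1]                  # step 1: estimated line positions
    return np.argsort(u_hat)            # step 2: induced permutation
```

On a path graph, the Fiedler vector is monotone along the path, so the recovered ordering is the path order (up to the global sign ambiguity of the eigenvector, i.e. up to reversal).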
This approach can only give a near-optimal estimate of the error of $\hat\sigma$ if the error of an optimal estimator $\hat\sigma$ is not much smaller than the error of an optimal estimator $\hat U$ (after appropriate scaling). We will show in this section that this is rarely the case for the seriation problem studied in this paper.
To make this precise, we need some notation that is not used in the rest of the paper. For a graphon $w$ and sequences $u^{(1)}, u^{(2)}$, define $G = G(w; u^{(1)}, u^{(2)})$ to be the usual graph obtained from the graphon $w$ and these sequences via formula (1.2). Next, for a constant $\delta \in [0, 1)$ and graphon $w$, define the perturbed graphon $\tilde w_\delta$ by taking the $(1-\delta)$'th power; similarly, for any sequence $u \in \mathbb{R}^d$, define $\tilde u_\delta$ in the same way. Notice that taking the $(1-\delta)$'th power is a monotone bijection on $[0, 1]$, and in particular these transformations preserve both (i) the diagonally-increasing property of $w$ and (ii) the ordering of $u$. We have the following theorem:
1. For all graphons $w$ and sequences $u^{(2)}$, $G(w; V, u^{(2)}) = G(\tilde w_{c/\sqrt n}; U, u^{(2)})$. (6.1)
2. There exists a single permutation $\sigma \in S_n$ that sorts both sequences; that is, the two sequences have the same ordering.
3. Finally, a positive fraction of the indices $i$ satisfy $|U_i - V_i| \ge c/\sqrt{n}$.
Proof. See Appendix C.
We give the following informal interpretation: it is possible to couple samples $G, \tilde G$ from the two graphons $w, \tilde w_{c/\sqrt n}$ so that with high probability (i) the graphs themselves are identical, but (ii) a positive fraction of the "true" latent vertex embeddings are at least $c/\sqrt n$ apart. In particular, since the observed graphs $G, \tilde G$ are identical, there is no way to simultaneously estimate both sets of "true" latent vertex embeddings without a positive fraction of errors being at least $c/\sqrt n$. In principle this conclusion could be avoided if e.g. $w$ is assumed to belong to a class that does not include $\tilde w_\delta$, even for very small values of $\delta$. While possible, such an assumption seems extremely strong: indeed, we have shown that sequences of samples from $w$ and $\tilde w_{c/\sqrt n}$ cannot be told apart by looking at their associated graphs! Furthermore, even if the particular perturbation $\tilde w_\delta$ can be ruled out, a quick inspection of the proof of Theorem 4 will convince the reader that similar results hold for a variety of other small perturbations of $w$.
Lemma A.3. Let $w$ be a uniformly embedded graphon, with decreasing link probability function $f$. Then $w^{(2)}$ is uniformly continuous.
Next we consider an alternative representation of diagonally increasing graphons, which will be helpful in some cases. We saw earlier (see the discussion before Equation (A.5)) that such a representation exists. Under Conditions 1.5, we also know that $d_s \le d$ for all $s > c$, and $d_s = 1$ otherwise. In Lemma A.5, to follow, we use this alternative representation to show a crucial property of $w^{(2)}$. First, we make a simple observation, which will help us to conclude the second half of the lemma.
Proof. Inspecting the alternative representation given in (A.5), we see that the contribution to $w^{(2)}(x, \cdot)$ from a pair of parameters $(s, t)$ is the function $\phi_{s,t}$, whose value at $y$ is determined by the length of the intersection $I_s(x) \cap I_t(y)$. Thus $\phi_{s,t}$ is piecewise linear, with each of the linear pieces having slope in $\{-1, 0, 1\}$.
We are almost ready to prove that $w^{(2)}(x, \cdot)$ is decreasing on $[x + d, 1]$, based on the following proposition. Suppose that for some $s, t \in [0, 1]$, $y$ lies in the interior of a region on which $\phi_{s,t}$ has slope 1; we will check that $(s, t) \in R^+(y)$. To see this, note that, if $\phi_{s,t}$ has slope 1 at $y$, then the left boundary of $I_s(x) \cap I_t(y)$ cannot be given by $y - d_t$, while the right boundary must be $y + d_t$. The right boundary condition (and the fact that we are in the interior of a region with slope 1) implies that $y + d_t < 1$ and $y + d_t < x + d_s$. This implies that $d_s > d_t + (y - x) \ge d_t$, and thus $x - d_s \le y - d_t$. Thus we must have that the left boundary of $I_s(x) \cap I_t(y)$ equals 0, and thus $y - d_t < 0$, and so $(s, t) \in R^+(y)$.
Next, we check that, if $y > x + d$, then $R^+(y) = \emptyset$. As argued earlier, it follows from Conditions 1.5 that for all $s$, either $d_s \le d$ or $d_s = 1$. Suppose then that $y - x > d$ and $(s, t) \in R^+(y)$. By definition, $d_s > d_t + (y - x) > d$, and thus $d_s = 1$. But then $d_t < 1 - (y - x) < 1$, so $d_t \le d \le x + d < y$, contradicting $(s, t) \in R^+(y)$.
Combining the conclusions in these two paragraphs completes the proof.
By Proposition A.6, if $x + d < y' < y$, then $\phi_{s,t}(y) - \phi_{s,t}(y') \le 0$, and consequently $w^{(2)}(x, \cdot)$ is decreasing on $[x + d, 1]$. This shows part (i) of Lemma A.5. We will next prove Inequality (A.7). The lower bound follows immediately from the fact that for all $s, t$, $\phi_{s,t}$ has slope at least $-1$. To prove the upper bound, we begin by defining the region $R^-(y)$. We now check that
$\phi_{s,t}(y) - \phi_{s,t}(y') = -(y - y')$ (A.10)
holds for all $x + d < y' < y < x + 2d$ and $(s, t) \in R^-(y)$. We will do this by showing that, for all $(s, t) \in R^-(y)$, $\phi_{s,t}$ has slope $-1$ at any point $z \in [y', y]$. Fix $(s, t) \in R^-(y)$ and $z \in [y', y]$. Then $(s, t) \in R^-(z)$. The condition $d_s + d_t \ge z - x$ in the definition of $R^-(z)$ guarantees that $I_s(x) \cap I_t(z)$ is non-empty. Since $d_s \ge d_t$ and $x < z$, we have $x - d_s \le z - d_t$; thus the left boundary of $I_s(x) \cap I_t(z)$ equals $z - d_t$. On the other hand, $z + d_t \ge z > x + d \ge x + d_s$, so the right boundary is not given by $z + d_t$. Thus $\phi_{s,t}$ has slope $-1$ at $z$. This implies that $\phi_{s,t}$ is linear with slope $-1$ on the interval $[y', y]$, which implies (A.10). Finally, we bound the size of $R^-(y)$ for $x + d < y < x + 2d$. Fix $y \in (x + d, x + 2d)$, and define $c_1 \equiv f(\frac{y - x}{2})$. In this case, $(y - x)/2 < d$ and thus $c_1 > c$. Then for all $c_1 \ge t \ge s > c$ and for all $y' \le z \le y$, we have $(y - x)/2 \le d_t \le d_s < d$, and thus $(s, t) \in R^-(y)$. Therefore, $|R^-(z)| \ge (c_1 - c)^2/2$ for all such $z$.
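The piecewise-linear behaviour of $\phi_{s,t}$ used throughout this proof can be checked numerically. The sketch below assumes $I_s(x) = [x - d_s, x + d_s] \cap [0, 1]$, which is consistent with the boundary values $0$, $1$, $x \pm d_s$, $y \pm d_t$ appearing in the argument; the specific numbers are illustrative.

```python
def overlap(x, ds, y, dt):
    """phi_{s,t}(y): length of I_s(x) ∩ I_t(y), assuming
    I_s(x) = [x - d_s, x + d_s] ∩ [0, 1]."""
    lo = max(x - ds, y - dt, 0.0)
    hi = min(x + ds, y + dt, 1.0)
    return max(hi - lo, 0.0)

# finite-difference check that the slope of y -> phi_{s,t}(y) lies in
# {-1, 0, 1}, as claimed in the proof of Lemma A.5
x, ds, dt, h = 0.3, 0.25, 0.15, 1e-6
slopes = [
    round((overlap(x, ds, y + h, dt) - overlap(x, ds, y - h, dt)) / (2 * h))
    for y in (k / 1000 for k in range(1, 1000))
]
```

One can also verify numerically that the overlap never increases once $y$ exceeds $x + d_s$, matching the monotonicity conclusion of the proposition.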
We first prove Lemma 3.9, restated here for convenience. We next prove Lemma 3.10, restated here for convenience:
Lemma 3.10. Let $S$ be a uniformly chosen subset of $V$, of size $m = \log(n)^5$, and let $S'$ be as defined in (3.7). We have $\mathbb{P}[A_3 \mid \mathcal{F}] = n^{-\Omega(\log(n))}$ on the $\mathcal{F}$-measurable event $A^c$. That is, w.e.p., for all $0 \le k \le 4R - 3$, both $|S \cap V_k| \ge \log(n)^5/(4R)$ and also $|S \setminus S'| \le \log(n)^3$ on $A^c$. Moreover, the analogous bounds hold for any fixed $p, q \in V(G)$, where again both probabilities are on the event $A^c$. To prove the second bound, fix $p, q \in V(G)$; an analogous argument applied to the set $S - \{p, q\}$ gives the result.
We next prove Lemma 3.11, restated here for convenience:
Lemma 3.11. Let $S$ be a uniformly chosen subset of $V$, of size $m = \log(n)^5$, and let $S'$ be as defined in (3.7). For each $i \in S'$, let $D_{S'}(i) = |\{j \in S' : G^{(2)}_\alpha(i, j) = 1\}|$ be the number of neighbours of $i$ in $G^{(2)}_\alpha[S']$.
Thus we get that $|S' \cap V_k| \ge |S \cap V_k| - |S \setminus S'| \ge \log(n)^5/(8R)$ for $n$ sufficiently large. Thus $V_k \cap S' \neq \emptyset$ for all $k$, which implies that $H_\alpha(S')$ is connected. By the choice of $R$, for any two vertices $i, j \in V_k$, $H_\alpha(U_i, U_j) = 1$. Fix $i \in S'$. By definition, for each $j \in S'$, $H_\alpha(i, j) = G^{(2)}_\alpha(i, j)$. Let $k$ be such that $U_i \in I_k$. Then for all vertices $j \in S' \cap V_k$, $G^{(2)}_\alpha(i, j) = 1$. As argued above, there are at least $\log(n)^5/(8R)$ such vertices. This shows that $\{\min_{i \in S'} D_{S'}(i) < \log(n)^5/(8R)\} \subset A_3$.
We next prove Lemma 3.12, restated here for convenience: on the $\mathcal{F}$-measurable event $A^c$, w.e.p., for all $i, j \in S$ such that $|U_i - U_j| > \frac{1}{\log(n)}$, we have $\nu_S(i, j) \ge \log(n)^3$.
Proof. Fix $1 \le i, j \le n$ such that $U_i < U_j$, and let $W_{i,j} = \{k : U_k \in (U_i, U_j)\}$. For each $k \in W_{i,j}$, let $e_k = 1_{\{k \in S\}}$. Then $|S \cap W_{i,j}| = \sum_{k \in W_{i,j}} e_k$, and $\mathbb{E}[\sum_{k \in W_{i,j}} e_k \mid \mathcal{F}] = (m/n)|W_{i,j}|$.
If $A_5^c$ holds, then $(m/n)|W_{i,j}| \le m|U_j - U_i| + m\log(n)/\sqrt{n} \le m|U_i - U_j| + 1$ for $n$ sufficiently large. By Azuma's inequality, the desired concentration holds on $A^c$; the result then follows by a union bound.
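The concentration step in this proof, that $|S \cap W_{i,j}|$ stays close to its conditional mean $(m/n)|W_{i,j}|$, can be illustrated by a small simulation. The parameter values below are illustrative (in the lemma, $m = \log(n)^5$):

```python
import random

random.seed(0)
n, m = 20000, 200            # n vertices, uniform subset of size m
w_size = 5000                # |W_{i,j}|: indices with U_k between U_i and U_j
mean = (m / n) * w_size      # conditional mean E[|S ∩ W_{i,j}| | F] = 50
devs = []
for _ in range(500):
    S = random.sample(range(n), m)           # uniform subset of size m
    count = sum(1 for k in S if k < w_size)  # |S ∩ W_{i,j}|
    devs.append(abs(count - mean))
# Azuma-type concentration: deviations are O(sqrt(m)), far below the mean
```

The observed deviations are on the order of $\sqrt{m} \approx 14$, much smaller than the mean itself, which is what the union bound over pairs $i, j$ exploits.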