Universality for critical heavy-tailed network models: Metric structure of maximal components

We study limits of the largest connected components (viewed as metric spaces) obtained by critical percolation on uniformly chosen graphs and configuration models with heavy-tailed degrees. For rank-one inhomogeneous random graphs, such results were derived by Bhamidi, van der Hofstad and Sen [Probab. Theory Relat. Fields 2018]. We develop general principles under which the same scaling limits as in the rank-one case can be obtained. Of independent interest, we derive refined asymptotics for various susceptibility functions and for the maximal diameter in the barely subcritical regime.


INTRODUCTION
Over the last decades, applications arising from complex systems in different fields have inspired a host of models for networks, as well as models of dynamically evolving networks. One of the major themes in the study of these models has been the nature of the emergence of the giant component. A classical example is the percolation process, where each edge of the network is independently kept with probability p, and deleted otherwise. As p increases from 0 to 1, the graph experiences a transition in its connectivity structure: there exists a "critical percolation value" p_c such that, for any ε > 0, if p < p_c(1 − ε) then the proportion of vertices in the largest component is asymptotically negligible, while for p > p_c(1 + ε) a unique giant component emerges, containing an asymptotically positive proportion of vertices [8,35,42,45,50].
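As a concrete illustration of the percolation process described above, the following minimal sketch (our own illustration, not from the paper; the union-find helper is a standard device) keeps each edge independently with probability p and extracts the connected components whose sizes are the object of study:

```python
import random

def percolate(edges, p, rng=random.Random(42)):
    """Keep each edge independently with probability p."""
    return [e for e in edges if rng.random() <= p]

def components(n, edges):
    """Connected components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    comps = {}
    for v in range(n):
        comps.setdefault(find(v), []).append(v)
    return sorted(comps.values(), key=len, reverse=True)
```

Running `components(n, percolate(edges, p))` for increasing p exhibits the transition discussed above: for small p all components are tiny, while past the critical value a single dominant component appears.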
Understanding the behavior at criticality is one of the key questions in statistical physics because the components exhibit unique and key features in the critical regime. In the physics literature, the critical behavior of percolation relates to the study of optimal paths in networks in the so-called strong disorder regime. A wide array of conjectures and heuristic derivations of the associated critical exponents can be found in [21,22,28,41]. In a nutshell, these conjectures can be described as follows: the intrinsic nature of the critical behavior does not depend on the exact description of the model, but only on moment conditions on the degree distribution. There are two major universality classes for the critical regime and the nature of the emergence of the giant, depending on whether the degree distribution has asymptotically finite or infinite third moment. For example, in the case of power-law degree distributions (i.e., P(D ≥ x) ≈ x^{−(τ−1)}, with the precise nature of the approximation left implicit), the nature of the critical behavior depends only on the power-law degree exponent τ: (a) for τ > 4, the maximal component sizes are of order n^{2/3} in the critical regime, whilst typical distances in these maximal connected components scale like n^{1/3}; (b) for τ ∈ (3,4), the maximal component sizes are of order n^{(τ−2)/(τ−1)}, whilst distances scale like n^{(τ−3)/(τ−1)}.
The above conjectures have inspired a large and beautiful collection of works in probability theory. In a seminal work, Aldous [3] provided a detailed understanding of the vector of rescaled component sizes at criticality for Erdős-Rényi random graphs, and the scaling limits for component sizes are now well understood under quite general setups, in both the finite third-moment [16,32,51,56,57,59,61] and infinite third-moment [17,31,51,61] settings. We refer the reader to [30, Chapter 1], [43, Chapter 4] for detailed discussions of this topic. A recent and emerging direction in this literature aims at understanding the critical component structures, and distances within these components, from a very general perspective. This line of work was pioneered by Addario-Berry, Broutin and Goldschmidt [1], where the largest connected components were shown to converge when viewed as metric spaces (see below for exact definitions). Subsequently, [10,11,13] have explored the universality class corresponding to [1], showing that the universality in the finite third-moment setting holds not only with respect to functionals like component sizes, but also for the entire metric structure. On the other hand, in the infinite third-moment setting, a recent result [15] shows that the metric structure turns out to be fundamentally different. The results in [15] were obtained for one fundamental random graph model (the rank-one model, closely related to the Chung-Lu [26,27] and Norros-Reittu [23] models) under the assumption that the weights follow a power-law distribution. In this paper, we explore the universality class corresponding to the candidate limit law established in [15]. Informally, the main contributions of this paper are as follows: • Universality theorem: We establish sufficient conditions that imply convergence to the limits established in [15]. This is described later in Theorem 5.2.
Since we need to set up a number of constructs, a formal statement is deferred until all of these objects have been defined. We refer to Theorem 5.2 as a universality theorem because it identifies the domain of attraction of the limit laws in [15]. Informally, the theorem implies that if a sequence of dynamic networks satisfies some entrance boundary conditions in the barely subcritical regime, and evolves approximately according to the multiplicative coalescent dynamics over the critical window, then the metric structure of the critical components is close to that for rank-one inhomogeneous random graphs. Theorem 5.2 is similar in spirit to [10, Theorem 3.4], but our result holds for degrees with infinite third moment. Technically, we do not need additional restrictions as in [10, Assumption 3.3], since we compare the metric structures in the Gromov-weak topology instead of the Gromov-Hausdorff-Prokhorov topology. The universality theorem holds under arguably optimal assumptions (see Remark 9).
• Critical percolation on graphs with given degrees: Our primary motivation was to analyze the critical regime for percolation on the uniform random graph model (and the closely associated configuration model) with a prescribed degree distribution that converges to a heavy-tailed degree distribution. Limit laws for the metric structure of maximal components in the critical regime are described in Theorems 2.1 and 2.2. These results are proved under Assumption 1, which is the most general set of assumptions under which the component sizes were shown to converge in [31] (see [31, Sections 2 and 3] for the applicability and necessity of these assumptions).
• Barely subcritical regime: In order to carry out the above analysis, and in particular to apply the universality theorem to percolation on configuration models, we establish refined bounds for component sizes, various susceptibility functionals, and diameters of connected components in the barely subcritical regime of the configuration model, which are of independent interest; these are described in Theorems 2.3 and 2.4.
1.1. Organization of the paper. In Section 2, we describe the configuration model and critical behavior of percolation, which is the main motivation of this paper, and then describe the main results relevant to this model. Section 3 has a detailed discussion about the relevance of the results in this paper, some open problems, and an informal description of the proof ideas. We provide a full description of the limit objects and various notions of convergence of metric-space-valued random variables in Section 4. Section 5 describes and proves the general universality result. Section 6 proves results about the configuration model in the barely subcritical regime. Finally, Section 7 combines the above estimates with a coupling of the evolution of the configuration model through the critical percolation scaling window to finish the proof of Theorem 2.1.

CRITICAL PERCOLATION ON THE CONFIGURATION MODEL
In this section, we state our main results. In Section 2.1, we state the results about the metric structure of the largest critical percolation clusters of the configuration model. We defer full definitions of the limit objects as well as notions of convergence of measured metric spaces to Section 4. In Section 2.2, we state the results about the barely subcritical regime, and we conclude this section with an overview of the proofs in Section 2.3.

Metric structure of the critical components.
The configuration model. Consider n vertices labeled by [n] := {1, 2, ..., n} and a non-increasing sequence of degrees d = (d_i)_{i∈[n]} such that ℓ_n := Σ_{i∈[n]} d_i is even. For notational convenience, we suppress the dependence of the degree sequence on n. The configuration model on n vertices having degree sequence d is constructed as follows [18,55]: Equip vertex j with d_j stubs, or half-edges. Two half-edges create an edge once they are paired. Therefore, initially we have ℓ_n = Σ_{i∈[n]} d_i half-edges. Pick any one half-edge and pair it with a uniformly chosen half-edge from the remaining unpaired half-edges, and keep repeating this procedure until all the unpaired half-edges are exhausted.
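The pairing scheme just described is easy to simulate; the following sketch (our own illustration, with hypothetical function names) uses the fact that shuffling the list of half-edges and pairing consecutive entries produces the same uniform perfect matching as sequential uniform pairing:

```python
import random

def configuration_model(degrees, rng=random.Random(0)):
    """Pair half-edges uniformly at random; returns the multigraph edge
    list.  The total degree must be even; self-loops and multiple edges
    may occur, exactly as in the construction of CM_n(d)."""
    if sum(degrees) % 2 != 0:
        raise ValueError("total degree must be even")
    # One list entry per half-edge, labelled by the vertex it belongs to.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)  # a uniformly random perfect matching of half-edges
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
```

By construction, every vertex v appears in exactly `degrees[v]` edge endpoints of the output.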
Let CM_n(d) denote the graph constructed by the above procedure. Note that CM_n(d) may contain self-loops and multiple edges. Let UM_n(d) denote a graph chosen uniformly at random from the collection of all simple graphs with degree sequence d. It can be shown that the conditional law of CM_n(d), conditioned on being simple, is the same as the law of UM_n(d) (see [42, Proposition 7.15]). It was further shown in [46] that if the degree distribution satisfies a finite second-moment condition (a condition which holds in the context of this paper), then the probability of the graph being simple converges to a positive limit.
Let us now describe the assumptions on the degree sequences. For p > 0, define the metric space ℓ^p_↓ := {x = (x_1, x_2, ...) ∈ R^N_+ : x_1 ≥ x_2 ≥ ⋯ and Σ_i x_i^p < ∞}, with metric d(x, y) = (Σ_i |x_i − y_i|^p)^{1/p}. Fix τ ∈ (3,4). Throughout this paper we use the following functionals of τ: η = (τ − 3)/(τ − 1) and ρ = (τ − 2)/(τ − 1). Assumption 1 (Degree sequence). For each n ≥ 1, let d = d_n = (d_1, ..., d_n) be a degree sequence (the d_i's may depend on n, but we suppress n in the notation for clarity). We assume the following about (d_n)_{n≥1} as n → ∞: Remark 1. Assumption 1 is identical to [31, Assumption 1]. We refer the reader to [31, Sections 2 and 3] for discussions of the relevance and necessity of these assumptions. It was shown in [31, Section 2] that Assumption 1 is satisfied in two key settings: (i) when the degrees are taken to be an i.i.d. sample from a power-law distribution, and (ii) when the degrees are chosen according to the quantiles of a power-law distribution. The first setting has been considered in [51], and the latter setting has been considered for rank-one inhomogeneous random graphs in [15,17].
The component sizes of CM_n(d) are known to undergo a phase transition [49,55] depending on the parameter
ν_n := Σ_{i∈[n]} d_i(d_i − 1) / Σ_{i∈[n]} d_i, with ν := lim_{n→∞} ν_n. (2.4)
When ν > 1, CM_n(d) is supercritical in the sense that there exists a unique giant component with high probability, and when ν < 1, all the components have size o(n) with high probability and CM_n(d) is subcritical. In this paper, when considering percolation on CM_n(d), we will always assume that ν > 1, i.e., CM_n(d) is supercritical. (2.5)
Percolation refers to deleting each edge of a graph independently with probability 1 − p. In the case of percolation on random graphs, the deletion of edges is also independent of the underlying graph. Let CM_n(d, p_n) and UM_n(d, p_n) denote the graphs obtained by percolation with retention probability p_n on the graphs CM_n(d) and UM_n(d), respectively. For p_n → p, it was shown in [45] that the critical point for the phase transition of the component sizes is p = 1/ν. The critical window for percolation was studied in [31,32] to obtain the asymptotics of the largest component sizes and their surplus edges. In the infinite third-moment setting, CM_n(d, p_n) lies in the critical window when, for some λ ∈ R,
p_n = p_n(λ) := (1 + λ n^{−η}) / ν_n. (2.6)
We now explain the precise meaning of convergence of components as metric spaces. Let C_p(i)(λ) denote the i-th largest component of CM_n(d, p_n(λ)). A measured metric space is a metric space equipped with a measure on the associated Borel sigma-algebra. Each component C can be viewed as a measured metric space with (i) the metric being the graph distance, where each edge has length one; and (ii) the measure being proportional to the counting measure, i.e., for any A ⊆ C, the measure of A is given by µ_{ct,i}(A) = |A|/|C_p(i)(λ)|, where |A| denotes the cardinality of A. For a generic measured metric space M = (M, d, µ) and a > 0, aM denotes the measured metric space (M, ad, µ).
We write S_* for the space of all measured metric spaces equipped with the Gromov-weak topology (see Section 4.1), and let S_*^N denote the corresponding product space with the accompanying product topology. For each n ≥ 1, view (n^{−η} C_p(i)(λ))_{i≥1} as an object in S_*^N by appending an infinite sequence of empty metric spaces after enumerating the components of CM_n(d, p_n(λ)). The main results for critical percolation on the configuration model are as follows: Theorem 2.1. Consider CM_n(d, p_n(λ)) satisfying Assumption 1, (2.5) and (2.6) for some λ ∈ R. There exists a sequence of random measured metric spaces (M_i(λ))_{i≥1} such that, on S_*^N, as n → ∞, (n^{−η} C_p(i)(λ))_{i≥1} converges in distribution to (M_i(λ))_{i≥1}. To obtain analogous results to Theorems 2.1 and 2.2 with the measures µ_{w,i}, we require the weights w_i to satisfy some regularity conditions (see Assumption 2 below). The reason will be discussed in Remark 15.

Remark 5.
The results above can be extended to the case P(D_n ≥ x) ∼ L(x) x^{−(τ−1)}, where L(·) is a slowly varying function. The scaling limits would be the same; however, the scaling exponents will be different, as observed in [31]. In particular, the width of the scaling window now turns out to be n^{−η} L_1(n)^2 (for some slowly varying L_1(·)) instead of n^{−η}, and results identical to Theorem 2.1 can be obtained by scaling the distances by n^{η} L_1(n)^{−2}.
2.2. Mesoscopic properties of the critical clusters: barely subcritical regime. One of the main ingredients in the proof of Theorem 2.1 is a refined analysis of various susceptibility functions in the barely subcritical regime (see (2.9) below for a definition) for the percolation process. The barely subcritical and supercritical regimes are the regimes just below and just above the critical window. For the percolation process under Assumption 1, barely subcritical (supercritical) behavior is observed for p satisfying n^η(p − p_n(0)) → −∞ (respectively, n^η(p − p_n(0)) → ∞), where p_n(0) is defined in (2.6) with λ = 0. These behaviors are well understood for Erdős-Rényi random graphs [48, Section 23], [19,50] and for configuration models in the Erdős-Rényi universality class [40,52,59]. For barely supercritical configuration models in the heavy-tailed setting, the size of the emerging giant component was obtained in [44]. Below, we provide a detailed picture of the component sizes and susceptibility functions in the barely subcritical regime. We will prove general statements about the susceptibility functions applicable not just to percolation on the configuration model, but to any barely subcritical configuration model. Since percolation on a configuration model yields a configuration model [35,45], this yields the susceptibility functions for percolation on the configuration model as a special case. To set this up we need a little more notation: each vertex in the network is associated with both a degree and a weight, satisfying the following assumptions:

Assumption 2 (Barely subcritical degree sequence). Let d = (d_1, ..., d_n) be a degree sequence and let w(·) : [n] → R be a non-negative weight function such that the following conditions hold: (i) Assumption 1 holds for d with some c ∈ ℓ^3_↓ \ ℓ^2_↓, and (iii) (Barely subcritical regime) the configuration model is in the barely subcritical regime, i.e., there exist 0 < δ < η and λ_0 > 0 such that

Let C(j) denote the connected component of CM_n(d) containing vertex j, and define W_i := Σ_{k∈C_(i)} w_k. The weight-based susceptibility functions are then defined in (2.9)-(2.10); the definition in (2.10) takes care of the double counting in the definition of susceptibility functions. A weighted distance-based susceptibility is defined similarly. For a connected graph G, ∆(G) denotes the diameter of the graph, and for an arbitrary graph G, ∆_max(G) := max ∆(C), where the maximum is taken over all connected components C ⊆ G. We simply write ∆_max for ∆_max(CM_n(d)). The asymptotics of ∆_max are as follows:

Theorem 2.4 (Maximum diameter). Under Assumption 2, as n → ∞, P(∆_max > n^δ (log n)^2) → 0.

Remark 6. Taking w_i = 1 for all i ∈ [n] gives W_i = |C_(i)|, and thus Theorem 2.3 also holds for the usual susceptibility functions defined in terms of the component sizes (cf. [47]). In the proof of Theorem 2.1, we will require a more general weight function, where w_i is taken to be the number of half-edges deleted from vertex i due to percolation.
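The component weights W_i of Remark 6 are easy to compute in practice; the sketch below (our own illustration, with an unnormalized second-moment functional Σ_i W_i² standing in for the susceptibility displays, whose exact normalizations are not reproduced here) recovers W_i = |C_(i)| when w_i ≡ 1:

```python
from collections import defaultdict

def component_weights(n, edges, w):
    """Total weight W_i of each connected component of a graph on
    vertices 0..n-1, computed via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    W = defaultdict(float)
    for v in range(n):
        W[find(v)] += w[v]
    return sorted(W.values(), reverse=True)

def second_moment(weights):
    """An (unnormalized) second-moment susceptibility, sum of W_i^2."""
    return sum(W * W for W in weights)
```

With `w = [1] * n` the output is the sorted list of component sizes, matching the specialization in Remark 6.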

2.3.
Overview of the proof. We now summarize the key ideas of the proofs at a heuristic level.
Universality theorem. As discussed earlier, we first prove a universality theorem (Theorem 5.2), which roughly states that if one replaces the vertices in a rank-one inhomogeneous random graph by small metric spaces (called blobs), then the limiting metric space structure remains identical. The characterization of blobs leads to some asymptotic negligibility conditions, formally stated in Assumption 4, which simply say that the diameter of the individual blobs must be negligible compared to the typical distances in the whole graph. However, the typical distance can be cumulatively affected by the blobs, hence we get a different scaling factor for distances in Theorem 5.2 than in Theorem 5.1.
Mesoscopic or blob-level analysis. Percolation on CM_n(d) can be viewed as a dynamic process by associating i.i.d. Uniform[0,1] weights U_e to the edges e, and keeping e if U_e ≤ p. The parameter p ∈ [0,1] can be interpreted as time. Now, for p_n = p_n(λ_n) with λ_n → −∞, CM_n(d, p_n(λ_n)) lies in the barely subcritical regime, and the estimates for different functionals can be obtained using Theorem 2.3. We regard the components of CM_n(d, p_n(λ_n)) as the blobs. Under the current scaling, the blobs shrink to zero, and the edges appearing in the dynamic process during the interval [p_n(λ_n), p_n(λ)], connecting the blobs, give rise to the macroscopic structure of the largest components of CM_n(d, p_n(λ)). However, the effects of the blobs on the limiting structure are reflected via different functionals, which is the reason for referring to the properties of the blobs as mesoscopic properties.
Coupling to the multiplicative coalescent. Finally, the goal is to understand the macroscopic structure formed between blobs within the time interval [p_n(λ_n), p_n(λ)]. The merging dynamics of the components in [p_n(λ_n), p_n(λ)] can be heuristically described as follows: Let p_0 be a time when an edge appears. Then the two half-edges corresponding to the new edge are chosen uniformly at random from the open half-edges (half-edges deleted due to percolation) of CM_n(d, p_0−). Therefore, if (O_i(p))_{i≥1} denotes the vector of numbers of open half-edges in distinct components at time p, then the clusters corresponding to O_i(p) and O_j(p) merge at rate proportional to O_i(p) × O_j(p), creating a new cluster with O_i(p) + O_j(p) − 2 open half-edges. Thus, the elements of the vector (O_i(p))_{i≥1}, seen as masses, merge approximately as the multiplicative coalescent (see Definition 3); the approximation is inexact because the dynamics experience a depletion of half-edges in the components. We can, however, run a parallel process where the paired half-edges are replaced by new dummy open half-edges attached to the corresponding vertices [10,31]. The dynamics of the latter process give rise to an exact multiplicative coalescent, and due to this fact the modified graph Ḡ_n can be shown to be distributed as a rank-one inhomogeneous random graph with the blobs being the mesoscopic components at time p_n(λ_n). The graph Ḡ_n then becomes the candidate for applying our universality theorem (see Theorem 7.14).
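For intuition, the idealized dynamics can be simulated directly; the following sketch (our own, a standard Gillespie-type scheme, not the paper's coupling) runs the exact multiplicative coalescent, in which clusters of masses x_i and x_j merge at rate x_i · x_j into a cluster of mass x_i + x_j:

```python
import random

def multiplicative_coalescent(masses, t_max, rng=random.Random(1)):
    """Gillespie simulation of the multiplicative coalescent: clusters of
    mass x_i and x_j merge at rate x_i * x_j.  Returns the mass vector,
    sorted in decreasing order, at time t_max."""
    x = list(masses)
    t = 0.0
    while len(x) > 1:
        s = sum(x)
        # total merge rate: sum_{i<j} x_i x_j = (s^2 - sum x_i^2) / 2
        total_rate = (s * s - sum(m * m for m in x)) / 2.0
        if total_rate <= 0:
            break
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # pick an unordered pair {i, j} with probability prop. to x_i * x_j:
        # sample i and j size-biased and reject equal indices
        i = rng.choices(range(len(x)), weights=x)[0]
        j = i
        while j == i:
            j = rng.choices(range(len(x)), weights=x)[0]
        x[i] += x[j]
        x.pop(j)
    return sorted(x, reverse=True)
```

Rejection sampling of (i, j) is correct here because picking two indices size-biased and independently, conditioned on being distinct, selects the pair {i, j} with probability proportional to x_i x_j.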
Structural comparison. Finally, we perform a structural comparison between CM_n(d, p_n(λ)) and Ḡ_n to conclude Theorem 2.1. Consider the largest component C_p(1) (respectively, C̄_p(1)) of CM_n(d, p_n(λ)) (respectively, Ḡ_n). By the above coupling (with dummy half-edges being added), C_p(1) ⊆ C̄_p(1), and we know the asymptotic metric structure of C̄_p(1). The idea is then to show that (a) the difference between C̄_p(1) and C_p(1) is asymptotically negligible, and (b) for typical points of C_p(1), the shortest paths between them in C_p(1) and C̄_p(1) are identical. These two properties conclude the proof of Theorem 2.1 under the Gromov-weak topology.

DISCUSSION
Optimality of assumptions and Gromov-weak topology. As mentioned in the introduction, our goal is not only to consider critical percolation, but to explore the universality class for the scaling limits in [15], in the same spirit as was done in [10] for the Erdős-Rényi universality class. Our universality theorem (Theorem 5.2) holds under optimal assumptions and does not require additional restrictions such as [10, Assumption 3.3]. However, it is worth noting that the universality theorem (and consequently Theorem 2.1) holds with respect to the Gromov-weak topology instead of the stronger Gromov-Hausdorff-Prokhorov (GHP) topology. This is not a restriction that we impose; in fact, there is a conceptual barrier. If the convergence in Theorem 2.1 were to hold in the GHP topology under Assumption 1 alone, then the limiting metric space would be compact for any θ ∈ ℓ^3_↓ \ ℓ^2_↓; however, additional restrictions are needed for the compactness of the limiting metric space, and simply assuming θ ∈ ℓ^3_↓ \ ℓ^2_↓ does not suffice. See [5, Section 7] for an explicit conjecture about the compactness of such metric spaces by Aldous, Miermont and Pitman. In a follow-up work [12], we extend the scaling limit results to the GHP topology by establishing the so-called global lower mass bound property [9, Theorem 6.1], which ensures that the components have sufficient mass everywhere, and thus forbids the existence of long, thin paths, when the total mass of the component converges. However, one needs additional technical conditions in [12], on top of Assumption 1, to prove the global lower mass bound property.

Extensions, recent developments and open problems.
(i) The universality theorem is applicable to dynamically evolving random networks with heavy-tailed degrees, which evolve (approximately) as the multiplicative coalescent over the critical window, and satisfy some nice properties such as Theorem 2.3 in the barely subcritical regime. For this reason, we believe that the universality theorem and the methods of this paper are applicable to many known inhomogeneous random graph models with suitable kernels [20], as well as to Bohman-Frieze processes with initial conditions chosen so that one gets a heavy-tailed degree distribution at criticality. We leave these as interesting open problems. (ii) In a recent work, Broutin, Duquesne and Wang [24] obtained structural limit laws for rank-one inhomogeneous random graphs which evolve as general multiplicative coalescent processes over the critical window. This framework unifies the scaling limits for the heavy-tailed and non-heavy-tailed cases in terms of a single limit law. It will be interesting to prove a universality theorem for the limit laws in [24]. (iii) Recently, Conchon-Kerjan and Goldschmidt [29] derived the scaling limit of the maximal components at criticality for CM_n(d) when the degrees form an i.i.d. sample from a power-law distribution with τ ∈ (3,4). The properties of the corresponding limiting object were investigated in a recent preprint by Goldschmidt, Haas, and Sénizergues [37]. The scaling limits in the i.i.d. setting have a completely different description of the limiting object compared to the one in this paper. It will be interesting to explore the connections between the results in the above papers and the current work. (iv) It turns out that the study of the component structures corresponding to critical percolation plays a crucial role in the study of the metric structure of the minimal spanning tree (MST) [2].
In fact, a detailed understanding of the metric structures in the critical window obtained in [1] played a pivotal role in the proofs of [2]. The connections to the MST problem outlined in [2] suggest that the scaling limit results in this paper will be useful in the study of metric structures of the MST for graphs with given degrees in the heavy-tailed regime. However, the MST problem in this regime remains open.

CONVERGENCE OF METRIC SPACES, DISCRETE STRUCTURES AND LIMIT OBJECTS
The aim of this section is to define the proper notion of convergence relevant to this paper (Section 4.1), set up discrete structures required in the statement and in the proof of the universality result in Theorem 5.2 (Sections 4.2, 4.3, 4.4), and describe limit objects that arise in Theorem 2.1 (Sections 4.5 and 4.6).

Gromov-weak topology.
A complete separable measured metric space (denoted by (X, d, µ)) is a complete, separable metric space (X, d) with an associated probability measure µ on the Borel sigma-algebra B(X). The Gromov-weak topology is defined on S_0, the space of all complete and separable measured metric spaces (see [38,39], [15, Section 2.1.2]). The notion is formulated based on the philosophy of finite-dimensional convergence. Two measured metric spaces (X_1, d_1, µ_1) and (X_2, d_2, µ_2) are considered equivalent if there exists an isometry ψ : support(µ_1) → support(µ_2) such that µ_2 = µ_1 ∘ ψ^{−1}. Let S_* be the space of all equivalence classes of S_0. We (slightly) abuse notation by not distinguishing between a metric space and its corresponding equivalence class. Fix l ≥ 2 and (X, d, µ) ∈ S_*. Given any collection of points x = (x_1, ..., x_l) ∈ X^l, define D(x) := (d(x_i, x_j))_{i,j∈[l]} to be the matrix of pairwise distances of the points in x. A function Φ : S_* → R is called a polynomial if there exists a bounded continuous function φ : R^{l×l} → R such that
Φ((X, d, µ)) = ∫ φ(D(x)) dµ^{⊗l}(x),
where µ^{⊗l} denotes the l-fold product measure. A sequence {(X_n, d_n, µ_n)}_{n≥1} ⊂ S_* is said to converge to (X, d, µ) ∈ S_* if and only if Φ((X_n, d_n, µ_n)) → Φ((X, d, µ)) for all polynomials Φ on S_*. By [38, Theorem 1], S_* is a Polish space under the Gromov-weak topology.
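On a finite measured metric space, a polynomial Φ can be estimated by Monte Carlo: sample l points i.i.d. from µ and average φ over the resulting pairwise distance matrices. A minimal sketch (our own illustration; function and parameter names are ours):

```python
import random

def polynomial(points, dist, mu, phi, l, samples=2000, rng=random.Random(7)):
    """Monte Carlo estimate of a Gromov-weak polynomial on a finite
    measured metric space: draw l points i.i.d. from mu, form the l-by-l
    distance matrix D, and average phi(D) over repeated draws."""
    acc = 0.0
    for _ in range(samples):
        xs = rng.choices(points, weights=mu, k=l)
        D = [[dist(a, b) for b in xs] for a in xs]
        acc += phi(D)
    return acc / samples
```

For instance, on the two-point space with unit distance and uniform measure, taking l = 2 and φ(D) = D[0][1] estimates the probability that two µ-samples are distinct, namely 1/2.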

Super graphs.
Our super graphs consist of three main ingredients: (1) a collection of metric spaces called blobs; (2) a graphical superstructure determining the connections between the blobs; and (3) connection points, or junction points, at each blob (see Figure 1). Using these three ingredients, define a metric space (M̄, d̄, µ̄) = Γ(G, p, M, X), with M̄ = ⊔_{i∈[m]} M_i (the disjoint union of the M_i's), by putting an edge of length one between the pairs of points {(X_{i,j}, X_{j,i}) : (i, j) is an edge of G}. The distance metric d̄ is the natural metric obtained from the graph distance and the inter-blob distances along a path. More precisely, for any x, y ∈ M̄ with x ∈ M_{j_1} and y ∈ M_{j_2},
d̄(x, y) = inf { k + d_{j_1}(x, X_{j_1,i_1}) + Σ_{l=1}^{k−1} d_{i_l}(X_{i_l,i_{l−1}}, X_{i_l,i_{l+1}}) + d_{j_2}(X_{j_2,i_{k−1}}, y) }, (4.2)
where the infimum is taken over all paths (i_1, ..., i_{k−1}) in G and all k ≥ 1, with the convention i_0 = j_1 and i_k = j_2. The measure µ̄ is given by µ̄(A) := Σ_{i∈[m]} p_i µ_i(A ∩ M_i), for any measurable subset A of M̄. Note that there is a one-to-one correspondence between the components of G and those of Γ(G, p, M, X), as the blobs are connected.
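The distance d̄ described above can be computed concretely by running Dijkstra's algorithm on the disjoint union of the blobs, with unit-length edges joining paired junction points; the following sketch (our own, under the assumption that each blob is given by its internal distance matrix and junction points are given by indices into it) illustrates the construction:

```python
import heapq

def supergraph_distance(blobs, edges, junctions, src, dst):
    """Distance in Gamma(G, p, M, X).  blobs[i] is the distance matrix of
    blob i; edges lists superstructure edges (i, j); junctions[(i, j)] is
    the index (within blob i) of the junction point X_{i,j}.  States are
    pairs (blob, point); unit-length edges join X_{i,j} and X_{j,i}."""
    dists = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, (b, v) = heapq.heappop(heap)
        if (b, v) == dst:
            return d
        if d > dists.get((b, v), float('inf')):
            continue
        nbrs = []
        for u in range(len(blobs[b])):   # move within blob b
            if u != v:
                nbrs.append(((b, u), blobs[b][v][u]))
        for (i, j) in edges:             # cross a superstructure edge
            for (a, c) in ((i, j), (j, i)):
                if a == b and junctions[(a, c)] == v:
                    nbrs.append(((c, junctions[(c, a)]), 1.0))
        for node, w in nbrs:
            nd = d + w
            if nd < dists.get(node, float('inf')):
                dists[node] = nd
                heapq.heappush(heap, (nd, node))
    return float('inf')
```

For two blobs that are each two points at distance 1, joined by one superstructure edge, the distance between the far endpoints is 1 (inside the first blob) + 1 (the connecting edge) + 1 (inside the second blob) = 3, matching the infimum in the formula for d̄.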

4.3.
Space of trees with edge lengths, leaf weights, root-to-leaf measures, and blobs. In the proof of the main results we need the following spaces built on top of the space of discrete trees. The first space, T_{IJ}, was formulated in [6,7], where it was used to study trees spanning a finite number of random points sampled from an inhomogeneous continuum random tree (as described in the next section). A tree t ∈ T_{IJ} can be viewed as being composed of two parts: (1) shape(t), describing the shape of the tree (including the labels of leaves and hubs) but ignoring edge lengths; the set T^{shape}_{IJ} of all possible shapes is obviously finite for fixed I, J; and (2) the edge lengths l(t) := (l_e : e ∈ t). We will consider the product topology on T_{IJ} consisting of the discrete topology on T^{shape}_{IJ} and the product topology on R^{E(t)}, where E(t) is the number of edges of t. For each v ∈ L(t), the path [0+, v] can be viewed as a compact measured metric space with the measure being ν_{t,v}. Let X denote the space of compact measured metric spaces endowed with the Gromov-Hausdorff-Prokhorov topology (see [15, Section 2.1.1]). In addition to the topology on T_{IJ}, the space T*_{IJ} with the two additional attributes inherits the product topology on R^J due to the leaf weights, and on X^J due to the paths [0+, v] endowed with ν_{t,v} for each v ∈ L(t). For consistency, we add a conventional state ∂ to the spaces T_{IJ} and T*_{IJ}; its use will be made clear in Section 5. For all instances in this paper, the shape shape(t) of a tree will be viewed as a subgraph of a graph with m vertices. In that case, the tree will be assumed to inherit the vertex labels from the original graph. We will often write t ∈ T^{*,m}_{IJ} to emphasize the fact that the vertices of t are labeled by a subset of [m]. Construct the metric space t̄ with elements in M̄(t) = ⊔_{i∈t} M_i by putting an edge of length one between the pairs of points {(X_{i,j}, X_{j,i}) : (i, j) is an edge of t}. The distance metric is given by (4.2).
The path from the leaf v to the root 0+ now contains blobs. Replace the root-to-leaf measure ν_{t,v} by its blob-level analogue ν̄_{t,v}; the space of such trees can be viewed as a subset of T^{*,m}_{IJ}. In the proof of the universality theorem in Section 5, the blobs will be a fixed collection and, therefore, any t ∈ T^{*,m}_{IJ} corresponds to a unique t̄ ∈ T^{*,m}_{IJ}. 4.4. p-trees. Let T_m and T^{ord}_m denote the sets of all rooted trees and all ordered rooted trees with vertex set [m], respectively. An ordered rooted tree is a rooted tree where the children of each individual are assigned an order. We define a random tree model called p-trees [25,58], and their corresponding limits, the so-called inhomogeneous continuum random trees, which play a key role in describing the limiting metric spaces. Fix m ≥ 1 and a probability mass function p = (p_i)_{i∈[m]} with p_i > 0 for all i ∈ [m]. A p-tree is a random tree in T_m, with law as follows: For any fixed t ∈ T_m and v ∈ t, write d_v(t) for the number of children of v in the tree t.
Then the law of the p-tree, denoted by P_tree, is defined as
P_tree(t) = P_tree(t; p) := ∏_{v∈[m]} p_v^{d_v(t)}. (4.3)
Note that a normalizing constant is not required in (4.3) to make it a probability distribution (see [25, Lemma 1]). Generating a random p-tree T ∼ P_tree and then assigning a uniform random order to the children of every vertex v ∈ T gives a random element with law P_ord(·; p) given by
P_ord(t; p) := ∏_{v∈[m]} p_v^{d_v(t)} / d_v(t)!, for t ∈ T^{ord}_m.
4.4.1. The birthday construction of p-trees. We now describe a construction of p-trees, formulated in [25], that is relevant to this work. Let Y := (Y_0, Y_1, ...) be a sequence of i.i.d. random variables with distribution p. Let R_0 = 0 and, for l ≥ 1, let R_l denote the l-th repeat time, i.e.,
R_l := min{ k > R_{l−1} : Y_k ∈ {Y_0, ..., Y_{k−1}} }.
This gives a tree, which we view as rooted at Y_0. The following striking result was shown in [25]:

Theorem 4.1 ([25, Lemma 1 and Theorem 2]). The random tree T(Y), viewed as an element of T_m, is distributed as a p-tree with law P_tree.
Let T_r(Y) denote the tree constructed in the first R_r steps. Further, take Ỹ = (Ỹ_1, ..., Ỹ_r) to be an i.i.d. sample from p, and construct the subtree S_r spanned by Ỹ. Then the above result (formalized as [25, Corollary 3]) implies that T_r(Y) has the same law as S_r. We will use this fact in Section 5 to complete the proof of the universality theorem.
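The repeat-time description above can be paraphrased as a simple sampling loop: whenever the i.i.d. sequence produces a value not seen before, attach it by an edge to the previous value; at each repeat time R_l the walk restarts from an already-discovered vertex, so the segments between repeats form attached paths. A hedged sketch of this reading of the construction (names are ours):

```python
import random

def birthday_tree(p, rng=random.Random(5)):
    """Birthday construction sketch: sample Y_0, Y_1, ... i.i.d. from p
    and, whenever Y_k is a value not seen before, add the edge
    {Y_{k-1}, Y_k}.  Stop once every vertex of 0..m-1 has appeared; the
    result is a tree rooted at Y_0."""
    m = len(p)
    verts = list(range(m))
    prev = rng.choices(verts, weights=p)[0]
    root = prev
    seen = {root}
    edges = []
    while len(seen) < m:
        cur = rng.choices(verts, weights=p)[0]
        if cur not in seen:  # a "new" value: attach it to its predecessor
            edges.append((prev, cur))
            seen.add(cur)
        prev = cur
    return root, edges
```

Each non-root vertex receives exactly one edge at its first appearance, so the output has m − 1 edges and is connected, i.e., it is a tree on [m]; by Theorem 4.1, the resulting tree is distributed as a p-tree.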

4.4.2.
Tilted p-trees and connected components of NR_n(x, t). Consider the vertex set [n] and assign weight x_i to vertex i. Now, connect each pair of vertices i, j (i ≠ j) independently with probability q_{ij} := 1 − exp(−t x_i x_j). The resulting random graph, denoted by NR_n(x, t), is known as the Norros-Reittu model or the Poisson graph process [42]. For a connected component C ⊆ NR_n(x, t), let mass(C) := Σ_{i∈C} x_i and, for any t ≥ 0, let (C_i(t))_{i≥1} denote the components in decreasing order of their masses. In this section, we describe results from [14] that give a method of constructing the connected components of NR_n(x, t), conditionally on the vertex sets of the components. This construction involves tilted versions of the p-trees introduced in Section 4.4. Since these trees are parametrized via a driving probability mass function (pmf) p, it will be easy to parametrize various random graph constructions in terms of pmfs, as opposed to the vertex weights x. Proposition 4.2 will relate vertex weights to pmfs.
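Generating NR_n(x, t) is immediate from the definition of q_{ij}; a minimal sketch (our own illustration):

```python
import math
import random

def norros_reittu(x, t, rng=random.Random(3)):
    """Generate the edge list of NR_n(x, t): connect each pair i < j
    independently with probability q_ij = 1 - exp(-t * x_i * x_j)."""
    n = len(x)
    return [(i, j)
            for i in range(n) for j in range(i + 1, n)
            if rng.random() < 1.0 - math.exp(-t * x[i] * x[j])]
```

At t = 0 no edges are present, and as t grows every q_{ij} tends to 1, in line with the interpretation of t as a time parameter.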
Fix n ≥ 1 and V ⊂ [n], and write G^con_V for the space of all simple connected graphs with vertex set V. For fixed a > 0 and probability mass function p = (p_v)_{v∈V}, define the probability distribution P_con(·; p, a, V) on G^con_V as follows: for i, j ∈ V, denote the edge weights as in (4.6); then, for G ∈ G^con_V, the probability P_con(G; p, a, V) is given by (4.7), where Z(p, a) is the normalizing constant. Now let V^(i) be the vertex set of C_i(t) for i ≥ 1, and note that (V^(i))_{i≥1} is a random finite partition of the vertex set [n]. Proposition 4.2 yields the following two-stage construction of the random (connected) graphs (C_i(t))_{i≥1}, and hence of NR_n(x, t): (S0) Generate the random partition (V^(i))_{i≥1} of the vertices into different components.
(S1) Conditionally on the partition, generate the internal structure of each component following the law of P con (·; p (i) , a (i) , V (i) ), independently across different components.
Let us now describe an algorithm to generate such connected components using the distribution (4.7). To ease notation, let V = [m] for some m ≥ 1, fix a probability mass function p on [m] and a constant a > 0, and write P_con(·) := P_con(·; p, a, [m]) on G^con_m := G^con_[m]. To generate a sample G from P_con, one first needs to generate a p-tree (with a suitable tilt). The remaining edges of G are surplus edges, which are generated by connecting the leaves to one of the vertices on their path to the root.
Let us now describe this process formally. As a matter of convention, we view ordered rooted trees via their planar embedding, using the associated ordering to determine the relative locations of the siblings of a vertex. We think of the leftmost sibling as the "oldest". Further, in a depth-first exploration, we explore the tree from left to right. Now, given a planar rooted tree t ∈ T_m, let ρ denote the root and, for every vertex v ∈ [m], let [ρ, v] denote the path connecting ρ to v in the tree. Given this path, let P(v, t) denote, in the terminology of [1,15], the set of endpoints of all permitted edges emanating from v. Define G^(m)(v) as in (4.10). Let (v(1), v(2), . . . , v(m)) denote the order of the vertices in the depth-first exploration of the tree t.
Let y*(0) = 0 and y*(i) = y*(i − 1) + p_{v(i)}, and define the function Ā^(m) as in (4.11), where a is defined in (4.6). Define the tilting function L(t) as in (4.13) for t ∈ T^ord_m. Recall the (ordered) p-tree distribution from (4.4), and let T^p_m be a sample from P_ord.
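The depth-first order (v(1), . . . , v(m)) and the partial sums y*(i) are easy to compute; a small Python sketch (names ours, with an illustrative ordered tree as input):

```python
def dfs_order(children, root):
    """Depth-first (left-to-right) vertex order of an ordered rooted tree.

    children: dict mapping a vertex to the list of its children in
    planar (oldest-first) order.
    """
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        # push children reversed so the leftmost child is explored first
        stack.extend(reversed(children.get(v, [])))
    return order

def cum_weights(order, p):
    """y*(0) = 0 and y*(i) = y*(i-1) + p[v(i)] along the exploration order."""
    y = [0.0]
    for v in order:
        y.append(y[-1] + p[v])
    return y
```

Since p is a probability mass function, the final partial sum y*(m) equals 1; the intervals (y*(i−1), y*(i)] are exactly the ones used to locate first endpoints in Algorithm 2 below.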
Using L(·) to tilt this distribution results in the tilted distribution (4.14) on T^ord_m. In the algorithm below, all the objects depend on the tree t, but we often suppress this dependence to ease notation.

Algorithm 2. Let G̃_m(p, a) denote a random graph sampled from P_con(·). The following algorithm gives a construction of G̃_m(p, a), proved in [15]:
(S1) Tilted p-tree: Generate a tilted ordered p-tree T^{p,⋆}_m with distribution (4.14). Consider the (random) sets P(v, T^{p,⋆}_m); vertices in these sets will be joined to form the surplus edges.
(S2) Surplus points: Conditionally on T^{p,⋆}_m, generate the points {(s_j, t_j) : 1 ≤ j ≤ N^(m)} lying under the function Ā^(m) from a rate-one Poisson point process on the positive quadrant.
(S3) First endpoints: Fix j and suppose s_j ∈ (y*(i − 1), y*(i)] for some i ≥ 1, where y*(i) is as given right above (4.11). Then the first endpoint of the surplus edge corresponding to (s_j, t_j) is L_j := v(i).
(S4) Second endpoints: Note that, on the interval (y*(i − 1), y*(i)], the function Ā^(m) takes the constant value a G^(m)(v(i)). We view this value, or height, as being partitioned into sub-intervals of length a p_u for each u ∈ P(v(i), T^{p,⋆}_m), the collection of endpoints of permitted edges emanating from L_j. (Assume that this partitioning is done according to some preassigned rule, e.g., using the order of the vertices in P(v(i), T^{p,⋆}_m).) Suppose that t_j belongs to the interval corresponding to u. Then the second endpoint is R_j := u. Form an edge between L_j and R_j.
(S5) In this construction, it is possible to create more than one surplus edge between two vertices. Remove any multiple surplus edges; this has vanishing probability in our applications.

Definition 1. Consider the connected random graph G̃_m(p, a), given by Algorithm 2, viewed as a measured metric space via the graph distance, where each vertex v is assigned measure p_v.
The random graph G̃_m(p, a) generated by Algorithm 2 has the same law as a sample from P_con(·). Further, conditionally on T^{p,⋆}_m, the second endpoints can be generated in an i.i.d. fashion: conditionally on L_j = v, the probability distribution of R_j is supported on P(v, T^{p,⋆}_m), and one creates an edge between L_j and R_j for 1 ≤ j ≤ k.

Inhomogeneous continuum random trees.
In a series of papers [5,6,7] it was shown that p-trees, under various assumptions, converge to inhomogeneous continuum random trees (ICRTs), which we now describe. Recall from [34,53] that a real tree is a metric space (T, d) satisfying the following for every pair a, b ∈ T: (i) there is a unique isometric map f_{a,b} : [0, d(a, b)] → T such that f_{a,b}(0) = a and f_{a,b}(d(a, b)) = b; (ii) for every continuous injective map f : [0, 1] → T with f(0) = a and f(1) = b, the image of f coincides with that of f_{a,b}.

Construction of the ICRT: We will now define the inhomogeneous continuum random tree T^β. We mainly follow the notation in [7]. Assume that we are working on a probability space (Ω, F, P_β) rich enough to support the following: (a) For each i ≥ 1, let P_i := (ξ_{i,1}, ξ_{i,2}, . . .) be a rate-β_i Poisson process, independently across i. The first point ξ_{i,1} of each process is special and is called a joinpoint, while the remaining points ξ_{i,j} with j ≥ 2 are called i-cutpoints [7]. (b) Independently of the above, let U = (U^(i)_j)_{i,j≥1} be a collection of i.i.d. uniform (0, 1) random variables. These are not required to construct the tree but will be used to define a certain function on it.
(Figure caption: An illustration of the ICRT construction with four point processes. The red points represent the joinpoints of the corresponding point processes and the blue points the corresponding cutpoints. The last line contains the union of the four point processes. See Figure 3 for the corresponding tree.) Write T^β_0 for the corresponding tree after one has used up all the branches [0, η_1], {(η_k, η_{k+1}] : k ≥ 1}. Note that, for every i ≥ 1, the joinpoint ξ_{i,1} corresponds to a vertex with infinite degree; label this vertex i. The ICRT T^β_(∞) is the completion of the marked metric tree T^β_0. As argued in [7, Section 2], this is a real tree as defined above, which can be viewed as rooted at the vertex corresponding to zero. We call the vertex corresponding to the joinpoint ξ_{i,1} hub i.
The uniform random variables (U^(i)_j)_{i,j≥1} give rise to a natural ordering on T^β_(∞) (or a planar embedding of T^β_(∞)) as follows: for i ≥ 1, let (T^(i)_j)_{j≥1} be the collection of subtrees hanging off the i-th hub. Associate U^(i)_j with the subtree T^(i)_j, and order the subtrees from left to right according to the values of the associated uniform random variables. This is the natural ordering on T^β_(∞) when it is viewed as a limit of ordered p-trees. We can think of the pair (T^β_(∞), U) as the ordered ICRT.
4.6. Continuum limits of components. The aim of this section is to give an explicit description of the limiting (random) metric spaces in Theorem 2.1. We start by constructing a specific metric space using the tilted version of the ICRT in Section 4.6.1. Then we describe the limits of maximal components in Section 4.6.3.
4.6.1. Tilted ICRTs and vertex identification. Let (Ω, F, P_β) and T^β_(∞) be as in Section 4.5. In [7], it was shown that one can associate a natural probability measure µ, called the mass measure, to T^β_(∞), satisfying µ(L(T^β_(∞))) = 1, where we recall that L(·) denotes the set of leaves. Before moving to the desired construction of the random metric space, we need to define some further quantities, which are the asymptotic analogues of the quantities appearing in Algorithm 2. Similarly to (4.10), define the functional G_(∞); it was shown in [15] that G_(∞)(y) is finite for almost every realization of T^β_(∞) and for µ-almost every y ∈ T^β_(∞). For y ∈ T^β_(∞), let [ρ, y] denote the path from the root ρ to y, and define a probability measure on [ρ, y]; this probability measure is concentrated on the hubs on the path from y to the root. Let γ > 0 be a constant. Informally, the construction goes as follows: we first tilt the distribution of the original ICRT T^β_(∞) using the exponential functional to obtain a tilted tree T^{β,⋆}_(∞). We then generate a random but finite number N^(∞) of pairs of points {(x_k, y_k) : 1 ≤ k ≤ N^(∞)} that provide the surplus edges. The final metric space is obtained by creating shortcuts, identifying the points x_k and y_k. The construction mimics that of Algorithm 2. Formally, the construction proceeds in four steps: (a) Tilted ICRT: Define the tilted measure on Ω by tilting P_β with the exponential functional; the expectation in the denominator is with respect to the original measure P_β. Write T^{β,⋆}_(∞) for the tree under the tilted measure.

(b) First endpoints (of shortcuts): Conditionally on the tilted tree, sample the points x_1, . . . , x_{N^(∞)} that serve as first endpoints of the shortcuts.
(c) Second endpoints (of shortcuts): Having chosen x_k, choose y_k from the path [ρ, x_k] joining the root ρ and x_k according to the probability measure defined above.
(d) Identification: Identify x_k and y_k, i.e., form the quotient space by introducing the equivalence relations x_k ∼ y_k for 1 ≤ k ≤ N^(∞).
Definition 2. Let G_∞(β, γ) be the metric measure space constructed via the four steps above, equipped with the measure inherited from the mass measure on T^{β,⋆}_(∞).
4.6.2. Scaling limit for the component sizes and surplus edges. Let us describe the scaling limit results for the component sizes and the surplus edges (#edges − #vertices + 1) of the largest components of CM_n(d, p_n(λ)) from [31]. Although we need the limiting object only to describe the limiting metric space, the convergence result will turn out to be crucial in the proof of Theorem 2.1 in Section 7, and therefore we state it here as well. Consider a decreasing sequence θ ∈ ℓ³↓ \ ℓ²↓, and let Exp(r) denote the exponential distribution with rate r. Consider the process S^λ_∞ defined in (4.21) for some λ ∈ R, and define its reflected version refl(S^λ_∞)(t) := S^λ_∞(t) − min_{u≤t} S^λ_∞(u). Processes of the form (4.21) were termed thinned Lévy processes in [17], since the summands are thinned versions of Poisson processes. Let (Ξ_i(θ, λ))_{i≥1} and (ξ_i(θ, λ))_{i≥1}, respectively, denote the vector of excursions and excursion lengths of (refl(S^λ_∞)(t))_{t≥0}, ordered so that the excursion lengths are decreasing. By [31, Fact 1], there are no ties among the excursion lengths almost surely. Denote the vector (ξ_i(θ, λ))_{i≥1} by ξ(θ, λ); that ξ(θ, λ) is always well defined follows from [4, Lemma 1]. Also, define the counting process of marks N to be a Poisson process that has intensity refl(S^λ_∞)(t) at time t, conditionally on (refl(S^λ_∞)(u))_{u≤t}. We write N_i(θ, λ) for the number of marks within Ξ_i(θ, λ).
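Numerically, the excursions of the reflected process can be extracted from a discretized path; the following Python sketch (names ours, operating on an arbitrary discretized path rather than the thinned Lévy process itself) reflects a path at its running minimum and returns the excursion lengths above zero in decreasing order, mirroring the definition of ξ(θ, λ):

```python
def reflect(path):
    """Reflected path: S(t) - min_{u <= t} S(u), for a discretized path."""
    out, running_min = [], float('inf')
    for s in path:
        running_min = min(running_min, s)
        out.append(s - running_min)
    return out

def excursion_lengths(refl_path, dt=1.0):
    """Lengths of the excursions of the reflected path strictly above 0,
    sorted in decreasing order (cf. the vector xi(theta, lambda))."""
    lengths, cur = [], 0
    for v in refl_path:
        if v > 0:
            cur += 1
        else:
            if cur:
                lengths.append(cur * dt)
            cur = 0
    if cur:
        lengths.append(cur * dt)
    return sorted(lengths, reverse=True)
```

In the limit theorem, the decreasing excursion lengths play the role of rescaled component sizes, and the Poisson marks within each excursion count the surplus edges.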
The convergence (4.22) holds with respect to the topology on the product space ℓ²↓ × N^∞. The limiting object in [31, Theorem 4] is stated in a slightly different form compared to the right-hand side of (4.22). However, the limiting objects are identical in distribution after a suitable rescaling of time and space, and after observing that r Exp(r) has the same distribution as Exp(1), where Exp(r) denotes an exponential random variable with rate r (see Appendix A). In fact, the arguments in Appendix A establish the following lemma, which will be used extensively in Section 7:

Limiting component structures.
We are now all set to describe the metric spaces M_i appearing in Theorem 2.1. Recall the graph G_∞(β, γ) from Definition 2. Using the notation of Section 4.6.2, write ξ*_i for ξ_i((µ(ν − 1))^{−1}θ, (µ(ν − 1)²)^{−1}ν²λ) and Ξ*_i for the excursion corresponding to ξ*_i. Note that ξ*_i has the same distribution as (ν − 1)ξ_i/ν, where ξ_i is as in Proposition 4.3. Then the limiting space M_i is distributed as a suitable rescaling of G_∞(β, γ) built from these ingredients.

UNIVERSALITY THEOREM
In this section, we develop universality principles that enable us to derive the scaling limits of the components for graphs that can be compared with the critical rank-one inhomogeneous random graph in a suitable sense. For scaling limits in the basin of attraction of the Erdős-Rényi random graph, such a universality theorem was proved in [10, Theorem 6.4] and applied to deduce the scaling limits of the components for general inhomogeneous random graphs with a finite number of types and for the configuration model under an exponential moment condition on the degrees. Here we focus on the universality class of the scaling limits in the heavy-tailed case. We first state the relevant result from [15] that was used in the context of rank-one inhomogeneous random graphs, and then state our main result below. The convergence of metric spaces is with respect to the Gromov-weak topology, unless stated otherwise. Recall the measured metric spaces G̃_m(p, a) and G_∞(β, γ) defined in Definitions 1 and 2, and recall a from (4.6). Assume that there exists a constant γ > 0 such that aσ(p) → γ.
Recall the definition of super graphs from Section 4.2, and denote the corresponding graph with blobs by G̃^bl_m(p, a). Remark 9. Assumption 4 only requires that the diameters of the blobs be negligible compared to the graph distances in G̃^bl_m(p, a). This is, in a way, a necessary condition ensuring that the internal structure of the blobs does not affect the limit. Theorem 5.2 shows that Assumption 4 alone is also sufficient, and that additional assumptions as in [10, Assumption 3.3] are not required to prove universality in the Gromov-weak topology.
The rest of this section is devoted to the proof of Theorem 5.2.

5.1. Completing the proof of the universality theorem in Theorem 5.2. To simplify notation, we write G̃_m and G̃^bl_m instead of G̃_m(p, a) and G̃^bl_m(p, a), respectively. Recall the definition of the Gromov-weak topology from Section 4.1. Fix some l ≥ 1 and take any bounded continuous function φ : R^{l²} → R. We simply write Φ(X) for Φ((X, d, µ)).

Key step 1. Let us write G̃^s_m and G̃^{bl,s}_m for the scaled versions of the metric spaces G̃_m and G̃^bl_m, respectively.
The above step, together with Theorem 5.1, completes the proof of Theorem 5.2.

Key step 2. For any k ≥ 1, the corresponding truncation bound holds for G̃^s_m, and the same inequality also holds for G̃^{bl,s}_m. Thus, using Lemma 5.3, the proof of (5.3) reduces to showing (5.4) for each fixed k ≥ 1.
Main aim of this section. Below, we define a function g^k_φ(·) on the space T*_IJ which captures the behavior of pairwise distances after creating k surplus edges. Under Assumption 4, we show that the introduction of blobs changes the distances within the tilted p-trees and the g^k_φ values negligibly. This will complete the proof of (5.4).
For any fixed k ≥ 0, consider t ∈ T*_{I,(k+l)} with root 0+, leaves i = (1+, . . . , (k + l)+), and root-to-leaf measures ν_{t,i} on the path [0+, i+] for all 1 ≤ i ≤ k + l. We create a graph G(t) by sampling, for each 1 ≤ i ≤ k, a point i_s on [0+, i+] according to ν_{t,i} and connecting i+ with i_s. Let d_{G(t)} denote the distance on G(t) given by the sum of edge lengths along the shortest path. The function g^k_φ is then defined via (5.5a), where ∂ is a forbidden state defined as follows: given any t ∈ T*_IJ and a set of vertices v = (v_1, . . . , v_r), we denote by t(v) the subtree of t spanned by v, i.e., the subtree of t containing all vertices in v with the minimal number of edges, and we declare t(v) = ∂ when this subtree fails to have v_1, . . . , v_r as leaves. Thus, if t(v) ≠ ∂, the tree t(v) necessarily has r leaves. Notice that the expectation in (5.5a) is over the choices of the i_s-values only. In our context, t is always considered as a subgraph of the graph on the vertex set [m], and thus we assume that t inherits the labels from the corresponding graph; hence t ∈ T*^m_{I,(k+l)}. There is a natural way to extend g^k_φ(·) to T*^m_{I,(k+l)} as follows: consider t̃ ∈ T*^m_{I,(k+l)} and the corresponding t ∈ T*^m_{I,(k+l)} (see Section 4.3.3). Let 0+, i, (ν_{t,i})_{i∈[k+l]} and (i_s)_{i∈[k+l]} be as defined above. Let Ḡ(t) denote the metric space obtained by introducing an edge of length one between X_{i+ i_s} and X_{i_s i+}, where X_{ij} has distribution µ_i for all j ≥ 1, independently of each other and of the other shortcuts.
The function ḡ^k_φ is defined analogously in (5.5b), where the expectation is taken over the collection of random variables X_{i+ i_s} and X_{i_s i+}. At this moment, we urge the reader to recall the construction in Algorithm 2. Notice that the tilting does not affect the blobs themselves but only the superstructure. Recall also the definition of the tilting function L(·) from (4.13). Using these facts, the proof reduces to showing (5.8). We first show that it is enough to prove Proposition 5.4 in order to complete the proof of (5.8); before that, we state some auxiliary results. The proofs of Facts 1 and 2 below are elementary and we omit them. The proof of Proposition 5.4 is deferred to Section 5.2.
Fact 1. Suppose that (X_m)_{m≥1} is a uniformly integrable sequence of random variables that converges in distribution to some random variable X. Then E[X_m] → E[X].

Fact 2.
Suppose that (X_m)_{m≥1} is a sequence of random variables such that, for every m ≥ 1, there exists a further sequence (X_{m,r})_{r≥1} satisfying (i) for each fixed r ≥ 1, X_{m,r} → 0 in probability as m → ∞, and (ii) lim_{r→∞} lim sup_{m→∞} P(|X_m − X_{m,r}| > ε) = 0 for every ε > 0. Then X_m → 0 in probability as m → ∞.
Proof of (5.8) from Proposition 5.4. We apply Fact 1 with X_m = L(T^p_m)1{N^(m) = k}, which is uniformly integrable by Lemma 5.5. Thus it is enough to show the convergence in (5.10). Applying Lemma 5.5 again, this reduces to showing (5.11). We now apply Fact 2. Let Y_m denote the term inside the expectation in (5.11). Further, sample the set of leaves V_m independently r times on the same tree T^p_m, let Y^i_m denote the observed value in the i-th sample, and let X_{m,r} be the corresponding average. First, to verify condition (ii), note that E_p(X_{m,r}) = X_m, and therefore Chebyshev's inequality yields (5.12), where C > 0 is a constant. Combining (5.12) and (5.13), condition (ii) is verified. Next, condition (i) in Fact 2 follows from Proposition 5.4 and (5.13). An application of Fact 2 concludes the proof of (5.11), and hence (5.8) follows.
To simplify the expression for dis(C_m), suppose that i(x) is an ancestor of i(y) on the path from 0+ to 1+.

Comparing distances with and without blobs.
The bound (5.17) holds for any x_0 ∈ M_{0+}, where (X_{i,j})_{i,j∈[m]} are the junction points. Using Assumption 4 and (5.17), it is now enough to show that, for every ε > 0, the probability that the supremum exceeds ε vanishes as m → ∞. Denote the term inside the supremum by Q_k. Let (ξ_i)_{i≥1} be an independent sequence such that ξ_i is the distance between two points chosen randomly from M_i according to µ_i, and let J and ξ be independent. Then R* can be thought of as the first repeat time of the sequence J.
Proof of Proposition 5.4 using Lemma 5.6. We use the objects defined in (5.14) and (5.15) in the proof of Lemma 5.6 for all the path metric spaces with j ≤ k. We assume that we are working on a probability space on which the convergence (5.15) holds almost surely for all j ≤ k. To summarize, for fixed ε > 0 and each j ≤ k, we can choose a correspondence C^j_m and a measure m^j on [0+, j+] × M_j satisfying (i) (i, X_{ik}) ∈ C^j_m for all i, k ∈ [0+, j+], (ii) dis(C^j_m) < ε/2k almost surely, and (iii) D(m^j; ν_j, ν̃_j) = 0 and m^j((C^j_m)^c) = 0. Recall the definitions of the function g^k_φ from (5.5a), (5.5b) and the associated graphs G(·), Ḡ(·). We simply write G and Ḡ for G(σ(p)T^p_m(V_m)) and Ḡ((σ(p)/(B_m + 1))T^p_m(V_m)), respectively. Let m^⊗k denote the k-fold product measure of the m^j for j ≤ k, and denote the graph distance on a graph H by d_H. Note that (5.23) holds, where X_i ∼ µ_i independently for i ∈ [m] and the expectation is with respect to the measure m^⊗k. Recall the notation used while defining g^k_φ(·) in (5.5a), (5.5b). Notice that the bound (5.24) holds for any point k ∈ [0+, i+] and any x_k ∈ M_k, x_{i_s} ∈ M_{i_s}. Now, for any path from i+ to j+ in G, we can essentially take the same path from X_i to X_j in Ḡ, following the corresponding inter-blob paths on the way. The distance traversed in Ḡ in this way gives an upper bound on d_Ḡ(X_i, X_j). Notice that, by (5.24), taking a shortcut contributes at most ε/2k to the difference of the distances travelled in G and Ḡ. Also, traversing a shortcut edge contributes σ(p)B_m/(B_m + 1), and there are at most k shortcuts on the path. Furthermore, it may be necessary to reach the relevant junction points from X_i and X_j, which contributes at most 2σ(p)∆_max/(B_m + 1). Thus, for k + 1 ≤ i, j ≤ k + l and sufficiently large m, the desired upper bound follows. By symmetry we obtain the matching lower bound, and the continuity of φ(·) (see [15, Theorem 4.18]) together with (5.23) completes the proof of Proposition 5.4.

MESOSCOPIC PROPERTIES: PROOFS OF THEOREMS 2.3 AND 2.4
At this moment, we urge the reader to recall the definitions from (2.9), (2.10), (2.11), and (2.12). The configuration model graphs considered in this section will be assumed to have a degree sequence d and an associated weight sequence w satisfying Assumption 2. We use C, C′ to denote generic positive constants whose values may differ from line to line. The rest of the section is organized as follows. In Section 6.1, we prove the required bound on the diameter in Theorem 2.4. To deal with the different terms in Theorem 2.3, we first obtain some moment estimates in Section 6.2; these estimates are then used to prove the asymptotics of s_2 in Section 6.3. The individual component weights are estimated in Section 6.4. In Section 6.5, we prove the asymptotics of s_3, and finally the mesoscopic typical distance is computed in Section 6.6.
If the maximum diameter is at least n^δ(log n)², then there exists a path of length n^δ(log n)², and therefore the probability of this event can be bounded as claimed, where the second step follows using (6.1). This proves Theorem 2.4.

Moment bounds for total weights. Consider the size-biased distribution on the vertex set [n] with sizes (w_i)_{i∈[n]}. Let V_n and V*_n, respectively, denote a vertex chosen uniformly at random and a vertex chosen according to the size-biased distribution with respect to the sizes (w_i)_{i∈[n]}, independently of the underlying graph CM_n(d). Let D_n, W_n (respectively D*_n, W*_n) denote the degree and weight of V_n (respectively V*_n), and let C(v) denote the component containing v. The reader should note the difference in notation: terms such as W_i, C_i with i in the subscript refer to the quantities defined in (2.10).
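The distinction between the uniform vertex V_n and the size-biased vertex V*_n is easy to simulate; a minimal Python sketch (names ours), which also illustrates that the size-biased vertex strongly favors large weights:

```python
import random

def sample_vertices(w, n_samples, rng=random.Random(2)):
    """Draw uniform (V_n) and size-biased (V*_n) vertices w.r.t. weights w.

    Returns two lists of vertex indices of length n_samples each.
    """
    n = len(w)
    # V_n: uniform over [n]
    uniform = [rng.randrange(n) for _ in range(n_samples)]
    # V*_n: vertex i is chosen with probability w[i] / sum(w)
    size_biased = rng.choices(range(n), weights=w, k=n_samples)
    return uniform, size_biased
```

With heavy-tailed weights, quantities such as W(V*_n) below are dominated by the high-weight vertices that the size-biased sample preferentially discovers.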
In this section, we prove the following moment bounds for W(V*_n), which will help us compute the expectation and variance of s_2. Lemma 6.1. Under Assumption 2, the bounds (i)-(iii) below hold. Proof of Lemma 6.1 (i). We use path-counting techniques for configuration models from [47, Lemma 5.1]. Let I_l(v, k) denote the collection of x = (x_i)_{0≤i≤l} such that x_0 = v, x_l = k, and the x_i are distinct. Then, an argument identical to the proof of [47, Lemma 5.1] shows that, for any l ≥ 1, the expected number of paths of length exactly l starting from vertex v and ending at k is given by (6.3), where the last step holds for l = o(n). Let A_l(v, k) denote the event that there exists a path of length l from v to k, and let Ā_l(v, k) denote the event that there exist two different paths from v to k, one of length l and another of length at most l − 1. Using (6.3) and Assumption 2, (6.4a) yields the first estimate, where the final step follows from the inclusion-exclusion principle. Also, the first term inside the sum in (6.3) is the probability that x = (x_0, . . . , x_l) creates a path for some fixed x; thus it remains to bound P(∃ y ∈ I_l(v, k) \ {x} : x, y both create paths from v to k). Such a pair of paths forces one of the structures in Figure 4: two branch points a, b together with a path between a and b that is disjoint from x. Denote by Ā_l(v, k, x, i) the event that the structure of type i (i = I, II, III, IV) in Figure 4 appears, where x ∈ I_l(v, k). Using an argument identical to (6.5) and applying Assumption 2, the Type-I contribution is bounded, where the (l − 1)(l − 2) factor accounts for the possible choices of a, b, the factor σ_3(n)² accounts for the two branch points at which three half-edges need to be paired, and in the last step we have used 6α − 3 + 3δ < 6α − 3 + 3η = 0 since δ < η. Similarly, with b = k, we get the Type-II structures in Figure 4. Taking expectations with respect to V*_n, all the terms in (6.9a), (6.9b), (6.9c) and (6.9d) are o(n^δ), where we use that E[(D*_n)²] = O(n^{3α−1}).
To compute the leading contribution to (6.4b), using (6.2) and (6.8), we obtain a lower bound, where we have used d_1 l ≤ d_1 n^δ(log n)² and inclusion-exclusion to obtain the third step, and (2.9), d_1 n^η/ℓ_n = (c_1/µ_d)(1 + o(1)) and the fact that (ν_n)^{n^δ(log n)²} ≤ e^{−C(log n)²} = o(1) in the last step. Thus, (6.11) follows, and the proof of Lemma 6.1 (i) is complete using (6.6).

Remark 10.
It is worth pointing out that the upper bound (6.6) holds for any configuration model satisfying ν_n < 1 − n^{−ε_0} for some ε_0 > 0 and Σ_{i∈[n]} w_i² = O(n). The rest of Assumption 2 is not required in the proof of this upper bound.
Proof of Lemma 6.1 (ii). Note that the second moment can be expressed through the quantity D_n(V*_n, r) defined in (6.12). Let us formulate a general upper bound on E[D_n(V*_n, r)] using computations similar to (6.6). Note that if V*_n is connected to k_i for all i ∈ [r], then there must exist a tree T with V*_n as root and (k_i)_{i∈[r]} as leaves. Let us "collapse" all the degree-two vertices in T except V*_n: we sequentially take a degree-two vertex (other than V*_n), delete it, and create an edge between its neighbors. Denote the resulting tree by T̄ = (V(T̄), E(T̄)). Thus, T̄ can be thought of as a rooted tree with V*_n as root and (k_i)_{i∈[r]} as leaves, and T̄ has no degree-two vertices except possibly V*_n. Further, note that r + 1 ≤ |V(T̄)| ≤ 2r, and thus r ≤ |E(T̄)| ≤ 2r − 1. Let m_i(T̄) denote the number of degree-i vertices in T̄ and let d_0(T̄) be the degree of V*_n in T̄. Let l_e be the number of edges that are collapsed to create e ∈ E(T̄), so that exactly l_e − 1 degree-two vertices get collapsed into e. Using (6.2), we can restrict ourselves to the case l_e ≤ n^δ(log n)², the error due to this restriction being at most C(w̄_n)^r n^{1+δ} e^{−C′(log n)²}. Counting the pairings of the half-edges of V*_n and of the internal vertices produces the factors in (6.14), where Q_n(T̄) gives the contribution due to the pairing of the half-edges of the vertices in V(T̄) \ {V*_n, k_1, . . . , k_r}, and R_n(T̄) is the total contribution due to the degree-two vertices of the possible trees T that could give rise to T̄ after collapsing. Thus, combining these contributions yields the upper bound (6.15), for constants C, C′ > 0.
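The collapsing operation used above (delete a degree-two vertex, join its two neighbors) can be sketched directly; a minimal Python version (names ours, edges represented as frozensets):

```python
def collapse_degree_two(edges, root):
    """Collapse degree-2 vertices (except the root) in a tree.

    edges: iterable of 2-element edges {u, v}. Returns the reduced edge
    set; each maximal path of degree-2 vertices becomes a single edge.
    """
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        deg = {}
        for e in edges:
            for v in e:
                deg[v] = deg.get(v, 0) + 1
        for v, d in deg.items():
            if v != root and d == 2:
                # the two neighbors of v
                a, b = (u for e in edges if v in e for u in e if u != v)
                edges = {e for e in edges if v not in e}
                edges.add(frozenset((a, b)))
                changed = True
                break
    return edges
```

Running this on a tree T produces the reduced tree T̄ described above, whose vertex and edge counts satisfy the bounds r + 1 ≤ |V(T̄)| ≤ 2r and r ≤ |E(T̄)| ≤ 2r − 1.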
Let us now apply (6.12) and (6.15) in the special case r = 2. Figure 5 describes the possible structures T̄. Application of (6.15) yields two terms, which are O(n^{3α+3δ−1}) and O(n^{3α+2δ−1}) respectively. There are also degenerate contributions: the first term is due to V*_n = k_1 = k_2, the second is due to V*_n = k_1 but V*_n ≠ k_2 (or V*_n = k_2 but V*_n ≠ k_1), while the third is due to k_1 = k_2 but V*_n ≠ k_1. These three terms are respectively O(n^{3α−1}), O(n^{3α+δ−1}) and O(n^{3α+δ−1}), where we have used (6.5) to compute the second term and an analogous computation (replacing w_k by w_k² in (6.5)) for the final term. This proves the required upper bound (6.18). Proof of Lemma 6.1 (iii). We again use (6.12) and (6.15). For the third moment, the leading contributions to E[D_n(V*_n, 3)] arise from the structures given in Figure 6. The contributions to E[D_n(V*_n, 3)] due to the first type of tree in Figure 6 are upper bounded as in (6.19), where in the last step we have used 6α − 3 + δ < 6α − 3 + 1 − 2α = 2(2α − 1) < 0. The contributions due to the other three types of trees are all o(n^{1+2δ}). As for the degenerate contributions, the first term is due to |{V*_n, k_1, k_2, k_3}| = 1, the next three cases are due to |{V*_n, k_1, k_2, k_3}| = 2, and the final two cases are due to |{V*_n, k_1, k_2, k_3}| = 3. Using the fact that max_{i∈[n]} w_i = O(n^α), we can use the estimates in (6.16) and (6.17) to show that the first term is O(n^{4α−1}), the next three terms are O(n^{4α+δ−1}), and the last two terms are O(n^{4α+3δ−1}). All these contributions are o(n^{1+2δ}), and hence we conclude that E[(W(V*_n))³] = o(n^{1+2δ}).

Analysis of the susceptibility function s_2.
Asymptotics of s_2. The asymptotics of s_2 is a consequence of the Chebyshev inequality. Denote w̄_n = Σ_{i∈[n]} w_i. First, if E_d denotes the conditional expectation given CM_n(d), then the corresponding identity holds for any r ≥ 1. Therefore, using Lemma 6.1 and (2.9), it follows from Assumption 2 that the first moment converges, where the second term in the third equality follows by arguments similar to (6.21). Denote the last two terms of (6.23) by (I) and (II), respectively. To estimate (II), observe that, conditionally on C(V*_n), the graph obtained by removing C(V*_n) from CM_n(d) is again a configuration model with the induced degree sequence d̃ and number of vertices ñ. Let ν̃_n denote the corresponding criticality parameter. In the proof of Lemma 6.1 (i), we observed that the upper bound holds whenever ν̃_n < 1 − n^{−ε} with ε ∈ (0, 1) (see Remark 10). To this end, let us show that there exist ε_0 ∈ (0, 1) and c_1 > 0 such that (6.24) holds for all sufficiently large n. To see (6.24), first note the corresponding identity for ν̃_n; moreover, for any connected graph G, Σ_{i∈G} d_i(d_i − 2) ≥ −2 (this can be proved by induction), so that it remains to control the total degree of C(V*_n). For this we use the following: Fact 3. There exist c_0, c_1 > 0 (sufficiently small) and n_0 ≥ 1 such that, for all n ≥ n_0, P(Σ_{j∈C(V*_n)} d_j ≥ n^{α+δ+c_0}) ≤ e^{−n^{c_1}}.
The proof of Fact 3 follows using the exploration process in Section 6.4 and martingale concentration inequalities such as those in [36] (see Appendix C for a detailed proof). The proof of (6.24) now follows using ñ = Θ(n), (2.9), and Fact 3.

Remark 11.
The method used to obtain the asymptotics of s 2 can also be followed verbatim to obtain the asymptotics of s pr . Indeed, notice that

(6.29)
A similar identity for the second moment of s pr also holds.
6.4. Barely subcritical masses. We now prove the asymptotics of W_j in Theorem 2.3. The idea is to obtain the asymptotics of W(j) for each fixed j, and then show that W_j = W(j) with high probability. Consider the following breadth-first exploration of the graph starting from vertex j:
Algorithm 3. The algorithm carries along three disjoint sets of half-edges: active, neutral, and dead.
(S0) At stage i = 0, the half-edges incident to j are active and all the other half-edges are neutral.
Order the initially active half-edges arbitrarily. (S1) At each stage, take the smallest active half-edge e and pair it with another half-edge f, chosen uniformly at random from the set of half-edges that are either active or neutral. If f is neutral, then the vertex v to which f is incident has not yet been discovered; declare the half-edges incident to v to be active and larger than all other active half-edges (choosing any order among the half-edges incident to v). Declare e, f dead. (S2) Repeat from (S1) until the set of active half-edges is empty.
Define the process S^j_n by S^j_n(l) = S^j_n(l − 1) + d_(l) J_l − 2 and S^j_n(0) = d_j, where J_l is the indicator that a new vertex is discovered at time l and d_(l) is the degree of the discovered vertex, if any. Thus, when the exploration starts from vertex j, S^j_n tracks the number of active half-edges. Let L := inf{l ≥ 1 : S^j_n(l) = 0}; by convention, we set S^j_n(l) = 0 for l > L. Let V_l denote the vertex set discovered up to time l excluding j, and let I^n_i(l) := 1{i ∈ V_l}, with I^n_j(l) ≡ 0. Also, let F_l denote the sigma-field containing all the information up to time l in Algorithm 3. Consider the rescaled process S̄^j_n defined by S̄^j_n(t) = n^{−α} S^j_n(⌊tn^{α+δ}⌋); using Assumption 2, its limit is identified below. The following three lemmas determine the asymptotics of W_i and s_3, and the asymptotics of W_j in Theorem 2.3 then follows by an application of Lemma 6.4.
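Algorithm 3 together with the walk S^j_n can be simulated directly; the following Python sketch (names ours; half-edges are paired uniformly on the fly, which is equivalent in law to generating the whole uniform pairing first, and the degree sum is assumed even) returns the walk of active half-edge counts and the discovered component:

```python
import random

def explore_component(degrees, j, rng=random.Random(3)):
    """Breadth-first exploration of a configuration model started from j.

    Returns the walk S(l) = S(l-1) + d_(l)*J_l - 2, which equals the number
    of active half-edges after l pairings, and the set of discovered vertices.
    """
    half = [(v, c) for v, d in enumerate(degrees) for c in range(d)]
    status = {h: 'neutral' for h in half}
    active = [h for h in half if h[0] == j]       # half-edges incident to j
    for h in active:
        status[h] = 'active'
    discovered = {j}
    walk = [degrees[j]]                           # S(0) = d_j
    while active:
        e = active.pop(0)                         # smallest active half-edge
        status[e] = 'dead'
        pool = [h for h in half if status[h] != 'dead']
        f = rng.choice(pool)                      # uniform partner for e
        if status[f] == 'neutral':                # a new vertex v is found
            v = f[0]
            discovered.add(v)
            for c in range(degrees[v]):
                h = (v, c)
                if h != f:
                    status[h] = 'active'
                    active.append(h)
            walk.append(walk[-1] + degrees[v] - 2)   # J_l = 1, d_(l) = d_v
        else:                                     # f active: surplus edge
            active.remove(f)
            walk.append(walk[-1] - 2)                # J_l = 0
        status[f] = 'dead'
    return walk, discovered
```

The exploration terminates exactly when the walk hits zero, which is the hitting time L above; summing the weights of `discovered` gives the component weight W(j).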
Next we provide a proof for Lemma 6.4. The proofs of Lemmas 6.2 and 6.3 follow using similar techniques as in [31], and thus are provided in Appendix B.
Proof of Lemma 6.4. Recall the definitions of C_j and W_j from (2.10). For fixed K ≥ 1, if the components (C_j)_{j∈[K]} are disjoint, then j = min{k : k ∈ C_j}, i.e., j is the minimum index among the vertices in C_j, and in that case W(j) = W_j. Thus, it is enough to show that, for each fixed i, j ≥ 1, P(i and j lie in the same connected component) → 0. By Lemma 6.2 and the fact that L_j is continuous, the probability that S^j_n has a jump of size at least εn^α tends to zero for any fixed ε > 0. Thus we conclude (6.35), and the proof follows.
6.5. Analysis of the susceptibility function s 3 . The aim of this section is to prove the following proposition which estimates the contribution on s 3 due to components (C i ) i>K : Proposition 6.5. Suppose that Assumption 2 holds. For any ε > 0, Proof. Let G K denote the graph obtained by deleting all the edges incident to the vertices in [K].
In this proof, a superscript $K$ on any previously defined object denotes the corresponding quantity for $\mathcal{G}^K$. Note that $\mathcal{G}^K$ is again distributed as a configuration model, conditionally on its new degree sequence. Recall the definition of the $c_i$'s from Assumption 2. First, for each fixed $K \geq 1$, we obtain a bound in which we have used $\delta < \eta = 1 - 2\alpha < 1 - \alpha$ in the last step. We aim to apply the upper bound (6.18). Since only $K = O(1)$ vertices and $\Theta(n^{\alpha})$ half-edges are deleted to obtain $\mathcal{G}^K$, the sequence $(d^K_i)_{i\in[n]}$ also satisfies Assumption 2. Applying the upper bound in (6.18) yields a quantity that tends to zero in the iterated limit $\lim_{K\to\infty}\limsup_{n\to\infty}$. Therefore, using the Markov inequality and the fact that $c \in \ell^3_{\downarrow}\setminus\ell^2_{\downarrow}$, it follows that, for any $\varepsilon > 0$, the claimed bound holds. Now, the proof is complete by observing the corresponding bound for $\sum_{i>K} W_i^3$.

Remark 12.
Notice that the proof of Proposition 6.5 can be modified to conclude analogous results for $\sum_{i>K} W_i^2|\mathscr{C}_i|$ and $\sum_{i>K} W_i|\mathscr{C}_i|^2$; indeed, an analogue of (6.38) can be computed similarly.
Finally, we prove the asymptotics of $s_3$ stated in Theorem 2.3:
Asymptotics of $s_3$. The proof follows by combining the asymptotics of $W_j$ with Proposition 6.5.

Remark 13.
The argument for $s_3$ can be followed verbatim to conclude the corresponding statement as well.
6.6. Mesoscopic typical distances. In this section, we obtain the asymptotics of $D_n$ in Theorem 2.3 using an analysis similar to that of Section 6.3. Again the proof involves the Chebyshev inequality, with the moments estimated via path counting. We sketch the computation of $\mathbb{E}[D_n]$. Recall the notations $U^*_n$, $V^*_n$, $A_l(v,k)$ and $\bar{A}_l(v,k)$ from Section 6.3. Note that $\mathbb{E}[D_n]$ involves terms of the form
$l\sum_k w_k\big(\mathbb{P}(A_l(V^*_n,k)) - \mathbb{P}(\bar{A}_l(V^*_n,k))\big)$. (6.41)
Now compare the terms above to (6.4a), (6.4b); the only difference is the extra multiplicative factor $l$. Thus we can follow arguments identical to the proofs of (6.5) and (6.11), and at the final step use $\sum_{l\geq 1} l(\nu_n)^{l-1} = (1-\nu_n)^{-2}$, which yields the asymptotics of $\mathbb{E}[D_n]$ up to a factor $(1+o(1))$. (6.42)
The variance terms can be computed similarly. Due to the presence of $l^2$ in the second moment, we use $\sum_{l\geq 1} l(l-1)(\nu_n)^{l-2} = 2(1-\nu_n)^{-3}$. This gives rise to an additional factor $(1-\nu_n)^{-2} = O(n^{2\delta})$. Again, arguments identical to (6.27) can be applied to show that $\mathrm{Var}(D_n) = o(n^{4\delta})$, and the proof of the asymptotics of $D_n$ follows.
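The two series identities used above (the factor $2$ in the second coming from the second derivative of the geometric series) are easy to check numerically. A small sanity-check sketch, with the function name ours:

```python
def geom_deriv_sums(nu, terms=10_000):
    """Partial sums of the geometric-series derivatives used in the moment
    computations (valid for |nu| < 1):
        sum_{l>=1} l * nu**(l-1)         -> (1 - nu)**-2
        sum_{l>=1} l * (l-1) * nu**(l-2) -> 2 * (1 - nu)**-3
    """
    s1 = sum(l * nu ** (l - 1) for l in range(1, terms))
    s2 = sum(l * (l - 1) * nu ** (l - 2) for l in range(1, terms))
    return s1, s2

s1, s2 = geom_deriv_sums(0.9)
```

In the barely subcritical regime $1 - \nu_n = \Theta(n^{-\delta})$, so these closed forms give the $O(n^{2\delta})$ and $O(n^{3\delta})$ orders used in the moment bounds.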

METRIC SPACE LIMIT FOR CRITICAL PERCOLATION CLUSTERS
The aim of this section is to complete the proof of Theorem 2.1. We start by defining the multiplicative coalescent process [3,4] that will play a pivotal role in this section.
Definition 3 (Multiplicative coalescent). Consider a (possibly infinite) collection of particles, and let $X(s) = (X_i(s))_{i\geq 1}$ denote the collection of masses of these particles at time $s$, so that the $i$-th particle has mass $X_i(s)$ at time $s$. The system evolves according to the following rule: for each pair $i \neq j$, at rate $X_i(s)X_j(s)$, particles $i$ and $j$ merge into a new particle of mass $X_i(s) + X_j(s)$.
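For finitely many particles, Definition 3 admits a simple discrete-event (Gillespie-type) simulation. The following is a minimal sketch; the function name and the rejection step for sampling a pair with probability proportional to $X_iX_j$ are our choices:

```python
import random

def multiplicative_coalescent(masses, t_end, rng):
    """Gillespie simulation of the finite multiplicative coalescent:
    each unordered pair {i, j} merges at rate X_i * X_j."""
    X = list(masses)
    t = 0.0
    while len(X) > 1:
        total = sum(X)
        pair_rate = (total * total - sum(x * x for x in X)) / 2.0
        t += rng.expovariate(pair_rate)       # time of the next merger
        if t > t_end:
            break
        while True:                            # pick {i, j} w.p. prop. to X_i X_j
            i = rng.choices(range(len(X)), weights=X)[0]
            j = rng.choices(range(len(X)), weights=X)[0]
            if i != j:
                break
        X[i] += X[j]                           # merge j into i
        del X[j]
    return sorted(X, reverse=True)
```

Mass is conserved at every merger, and if the simulation is run long enough all particles coalesce into a single block.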
Before going into the details, let us describe the general idea and the organization of this section. The proof combines many ingredients and ideas from [10] and [31]. In Section 7.1, we consider a dynamically growing graph process that approximates the percolation clusters in the critical window. The graphs generated by this dynamic evolution satisfy the following properties: (i) in the critical window, the components merge approximately as a multiplicative coalescent, where the mass of each component is approximately proportional to its size; (ii) the masses of the barely subcritical clusters satisfy nice properties due to Theorem 2.3. In Section 7.2, we derive the required properties in the barely subcritical regime for the dynamically growing graph process using Theorems 2.3 and 2.4. Section 7.3 is devoted to deriving scaling limits of functionals of $\mathcal{G}_n(t_c(\lambda))$. In Section 7.4, we modify the dynamic process in such a way that the components merge exactly as a multiplicative coalescent. Since the exact multiplicative coalescent corresponds to rank-one inhomogeneous random graphs, thinking of the barely subcritical clusters as blobs, we use the universality theorem (Theorem 5.2) in Section 7.5 to determine the metric space limits of the largest components of the modified graph (Theorem 7.14). We finally complete the proof of Theorem 2.1 in Section 7.6. The proof of Theorem 2.2 is given in Section 7.7.

The dynamic construction and its properties.
Algorithm 4 (The dynamic construction). Let $\mathcal{G}_n(t)$ be the graph obtained up to time $t$ by the following dynamic construction: (S0) Initially, each vertex $i$ has $d_i$ incident half-edges and all half-edges are alive. During the construction, a half-edge can be in one of two states: alive or dead. Each half-edge carries an independent unit-rate exponential clock. (S1) Whenever a clock rings, we take the corresponding half-edge, kill it, and pair it with a half-edge chosen uniformly at random among the alive half-edges. The paired half-edge is also killed, and the exponential clocks associated with killed half-edges are discarded.
Since a half-edge is paired with another unpaired half-edge, chosen uniformly at random from the set of all unpaired half-edges, the final graph $\mathcal{G}_n(\infty)$ is distributed as $\mathrm{CM}_n(d)$. Define
$t_c(\lambda) = \frac{1}{2}\log\Big(\frac{\nu_n}{\nu_n-1}\Big) + \frac{\nu_n}{2(\nu_n-1)}\frac{\lambda}{n^{\eta}}$. (7.1)
We denote the $i$-th largest component of $\mathcal{G}_n(t)$ by $\mathscr{C}_{(i)}(t)$. In the subsequent part of this paper, we will derive the metric space limit of $(\mathscr{C}_{(i)}(t_c(\lambda)))_{i\geq 1}$. The following lemma enables us to transfer the conclusions to the largest clusters of $\mathrm{CM}_n(d, p_n(\lambda))$:
Lemma 7.1 ([31, Proposition 24]). There exists $\varepsilon_n = o(n^{-\eta})$ and a coupling such that, with high probability,
Let $\omega_i(t)$ denote the number of unpaired (open) half-edges incident to vertex $i$ at time $t$ in Algorithm 4. We end this section by describing the evolution of some functionals of the degrees and the open half-edges in the graph $\mathcal{G}_n(t)$. Let $s_1(t)$ denote the total number of unpaired half-edges at time $t$, and set $s_2(t) = \sum_{i\in[n]}\omega_i(t)^2$, $s_{d,\omega}(t) = \sum_{i\in[n]} d_i\omega_i(t)$. Further, we write $\mu_n = \ell_n/n$, where $\ell_n = \sum_{i\in[n]} d_i$.
Proof. The proof uses the differential equation method [63]. Notice that, after each ring of an exponential clock in Algorithm 4, $s_1(t)$ decreases by two. Let $Y$ denote a unit-rate Poisson process.
Using the random time-change representation [33], one can write $s_1(t)$ in terms of $Y$ plus a martingale $M_n$. The quadratic variation of $M_n$ satisfies $\langle M_n\rangle(t) \leq 4t\ell_n = O(n)$, which implies that $\sup_{t\leq T}|M_n(t)| = O_P(\sqrt{n})$. Moreover, the candidate limit $f(t) = \mu_n\mathrm{e}^{-2t}$ satisfies the required regularity conditions. For $s_2(t)$, note that if half-edges of vertices $i$ and $j$ are paired, then $s_2$ changes by $-2\omega_i - 2\omega_j + 2$, and if two half-edges of the same vertex $i$ are paired, then $s_2$ changes by $-4\omega_i + 4$. Thus $s_2$ can again be represented with a martingale $M_n$ whose quadratic variation satisfies $\langle M_n\rangle(t) = O(n)$, and an estimate analogous to (7.6) follows using Grönwall's inequality. Notice also that when a clock of vertex $i$ rings and it is paired to vertex $j$, then $s_{d,\omega}$ decreases by $d_i + d_j$; the corresponding martingale has quadratic variation $\langle M_n\rangle(t) \leq 2t\sum_{i\in[n]} d_i^2 = O(n)$. We can now apply Grönwall's inequality as before. The proof of Lemma 7.2 is complete.
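The drift computation for $s_1(t)$ can be checked by simulating Algorithm 4 at the level of counts only: with $m$ alive half-edges, the next clock rings after an $\mathrm{Exp}(m)$ time and the pairing kills two half-edges, so $s_1(t)/n$ should concentrate around $\mu_n\mathrm{e}^{-2t}$. A minimal sketch, with a hypothetical regular degree sequence ($d_i = 3$, so $\mu_n = 3$) of our choosing:

```python
import random, math

def s1_trajectory(total_halfedges, t_end, rng):
    """Algorithm 4 seen through s_1 only: with m alive half-edges the next
    exponential clock rings after an Exp(m) time, and the pairing kills two
    half-edges.  Returns the number of alive half-edges at time t_end."""
    m, t = total_halfedges, 0.0
    while m >= 2:
        t += rng.expovariate(m)
        if t > t_end:
            break
        m -= 2
    return m

# drift check: s_1(t)/n should be close to mu_n * exp(-2t)
n, d = 2000, 3                  # hypothetical 3-regular degree sequence
mu_n = d
obs = s1_trajectory(n * d, 0.5, random.Random(42)) / n
pred = mu_n * math.exp(-2 * 0.5)
```

For this pure-death chain one has $\mathbb{E}[s_1(t)] = s_1(0)\mathrm{e}^{-2t}$ exactly, and the fluctuations are of order $\sqrt{n}$, matching the martingale bound above.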

Entrance boundary for open half-edges. Define
The goal is to show that the open half-edges satisfy the entrance boundary conditions. Let $d(t) = (d_i(t))_{i\in[n]}$ denote the degree sequence of $\mathcal{G}_n(t)$ constructed by Algorithm 4, and recall that, conditionally on $d(t)$, $\mathcal{G}_n(t)$ is a configuration model. Let us first derive the asymptotics of $\nu_n(t_n)$. Recall that $\omega_i(t)$ denotes the number of open half-edges incident to vertex $i$ in $\mathcal{G}_n(t)$. Using Lemma 7.2 and Assumption 1, (7.11) and (7.12) yield that $\nu_n(t_n) = 1 - \nu_n n^{-\delta} + o_P(n^{-\delta})$. Further, using the differential equation method again, the evolution of $(\omega_i(t))_{t\geq 0}$ is given by (7.13), and Assumption 1 yields the corresponding uniform estimate for all $T > 0$. We aim to apply the results for the barely subcritical regime in Theorem 2.3 to the open half-edge counts $\omega(t_n) = (\omega_i(t_n))_{i\in[n]}$. Notice that, by Lemma 7.2, (7.14) and Assumption 1, $\omega(t_n)$ and $d(t_n)$ satisfy Assumption 2. Let $\mathscr{C}_i(t)$ be defined analogously to (2.10) for the graph $\mathcal{G}_n(t)$, and denote $f_i(t) = \sum_{k\in\mathscr{C}_i(t)}\omega_k(t)$ and $f(t) = (f_i(t))_{i\geq 1}$. The following theorem summarizes the entrance boundary conditions for $f(t)$.
Let $s^{\omega}_2$, $s^{\omega}_3$ and $D^{\omega}_n$ denote the quantities $s_2$, $s_3$ and $D_n$, respectively, with the weights being the numbers of open half-edges and the underlying graph being $\mathcal{G}_n(t_n)$.

Remark 14.
Setting $w_i = 1$ for all $i$, we also obtain the entrance boundary conditions for the component sizes; in this case $\mu_d = \mu_{d,w} = \mu/\nu$. Replacing $\omega$ by $c$ in the above notation to denote the component-size susceptibilities, the analogous asymptotics follow.

Components of the dynamically constructed graph.
The idea is to regard $(\mathscr{C}_i(t_n))_{i\geq 1}$, the connected components at time $t_n$, as blobs. For $t \geq t_n$, the graph $\mathcal{G}_n(t)$ should be viewed as a super-graph, with the superstructure determined by the edges appearing after time $t_n$; thus, the components of $\mathcal{G}_n(t)$ can be regarded as unions of blobs. For a component $\mathscr{C}$, we write $\mathcal{B}(\mathscr{C})$ for the collection of indices of the blobs contained in $\mathscr{C}$. Let us denote the ordered components and the $\mathcal{F}$-values of $\mathcal{G}_n(t_c(\lambda))$ simply by $(\mathscr{C}_{(i)}(\lambda))_{i\geq 1}$ and $(\mathcal{F}_i(\lambda))_{i\geq 1}$, respectively. The goal of this section is to obtain the scaling of these component functionals, and to understand structural properties related to the surplus edges. Recall that $\mathrm{SP}(\mathscr{C})$ denotes the number of surplus edges in the component $\mathscr{C}$, i.e., $\mathrm{SP}(\mathscr{C}) = \#\{\text{edges in }\mathscr{C}\} - |\mathscr{C}| + 1$.
The following result gives the scaling limits of the rescaled component sizes and surplus edges of $\mathcal{G}_n(t_c(\lambda))$:
Proposition 7.4. Let $(\mathscr{C}_{(i)}(\lambda))_{i\geq 1}$ denote the ordered vector of component sizes of the graph $\mathcal{G}_n(t_c(\lambda))$.
as $n \to \infty$, with respect to the topology on $\ell^2_{\downarrow}\times\mathbb{N}^{\mathbb{N}}$, where the limiting objects are defined in Proposition 4.3.
The proof is a direct consequence of Lemma 7.1 and Proposition 4.3; see for example [31, Proposition 25]. The components contain surplus edges within the blobs as well as surplus edges in the superstructure. Let $\mathrm{SP}'(\mathscr{C}_{(i)}(\lambda))$ denote the number of surplus edges in the superstructure of $\mathscr{C}_{(i)}(\lambda)$; thus $\mathrm{SP}'(\mathscr{C}_{(i)}(\lambda))$ counts the macroscopic surplus edges that do not lie inside any blob. The next result shows that all the surplus edges in the critical components are macroscopic, and relates the component sizes and the $\mathcal{F}$-values of $\mathcal{G}_n(t_c(\lambda))$:
Proposition 7.5. Assume that $\eta/2 < \delta < \eta$. Then, for each $1 \leq i \leq K$, the following hold: (a) with high probability, $\mathrm{SP}'(\mathscr{C}_{(i)}(\lambda)) = \mathrm{SP}(\mathscr{C}_{(i)}(\lambda))$; consequently, with high probability there are no surplus edges within the blobs of $\mathscr{C}_{(i)}(\lambda)$; (b) $(n^{-\rho}\mathcal{F}_i(\lambda))_{i\geq 1}$ converges in distribution to $\frac{\nu-1}{\nu}\,\xi$ with respect to the product topology.
Since $\mathrm{SP}'(\mathscr{C}_{(i)}(\lambda)) \leq \mathrm{SP}(\mathscr{C}_{(i)}(\lambda))$ almost surely, for Part (a) it suffices to show that $\mathrm{SP}'(\mathscr{C}_{(i)}(\lambda))$ and $\mathrm{SP}(\mathscr{C}_{(i)}(\lambda))$ have the same distributional limit. (7.18) Let $\tilde{\mathcal{G}}_n$ denote the graph obtained from $\mathcal{G}_n(t_c(\lambda))$ by shrinking each blob to a single node; then $\mathrm{SP}'(\cdot)$ counts the surplus edges in the components of $\tilde{\mathcal{G}}_n$. The graph $\tilde{\mathcal{G}}_n$ can also be viewed as being constructed dynamically as in Algorithm 4 with degree sequence $(f_i(t_n))_{i\geq 1}$. In the following, we investigate the relation between $\mathcal{G}_n(t_n)$ and $\tilde{\mathcal{G}}_n$. Lemma 7.2 implies that the number of unpaired half-edges in $\mathcal{G}_n(t_n)$ that are paired in $\mathcal{G}_n(t_c(\lambda))$ is as given in (7.19); note that we have used $\delta > \eta/2$ in (7.19).

Algorithm 5.
Define $\pi_n = \frac{\nu_n}{\nu_n-1}(n^{-\delta} + \lambda n^{-\eta})$ and associate $f_i(t_n)$ half-edges to vertex $i$ of the shrunk graph $\tilde{\mathcal{G}}_n$ (each blob collapsed to a single node). Construct the graph $\tilde{\mathcal{G}}_n(\pi_n)$ as follows: (S1) retain each half-edge independently with probability $\pi_n$; (S2) create a uniform perfect matching between the retained half-edges, and obtain $\tilde{\mathcal{G}}_n(\pi_n)$ by creating an edge for each pair of matched half-edges.
In (S1), if the total number of retained half-edges is odd, then add an extra half-edge to vertex 1; this possible addition of one extra half-edge will be ignored, since it makes no difference in the asymptotic computations. Notice that $a_i$, the number of half-edges attached to $i$ that are retained by Algorithm 5 (S1), is distributed as $\mathrm{Bin}(f_i(t_n), \pi_n)$, independently over $i$; thus the total number of half-edges in the graph $\tilde{\mathcal{G}}_n(\pi_n)$ is distributed as a $\mathrm{Bin}(s_1(t_n), \pi_n)$ random variable. We claim that there exists $\varepsilon_n = o(n^{-\eta})$ and a coupling such that, with high probability,
$\tilde{\mathcal{G}}_n(\pi_n - \varepsilon_n) \subset \tilde{\mathcal{G}}_n \subset \tilde{\mathcal{G}}_n(\pi_n + \varepsilon_n)$, (7.20)
where $\tilde{\mathcal{G}}_n$ denotes the shrunk graph. The proof follows by an argument identical to [31, Proposition 24], using the estimate (7.19) and standard concentration inequalities for binomial random variables, and is therefore omitted. We now analyze $\tilde{\mathcal{G}}_n(\pi_n)$, keeping in mind that the relation (7.20) allows us to transfer conclusions to $\tilde{\mathcal{G}}_n$. To analyze the component sizes and surplus edges of $\tilde{\mathcal{G}}_n(\pi_n)$, we first need some regularity conditions on $a$, the degree sequence of $\tilde{\mathcal{G}}_n(\pi_n)$, as summarized in the following lemma:
Proof. Using Theorem 7.3 and the fact that $a_i \sim \mathrm{Bin}(f_i(t_n), \pi_n)$, one gets $n^{-\alpha}a_i = (1+o_P(1))\theta_i/\nu$. Moreover, $\sum_i a_i \sim \mathrm{Bin}(\sum_i f_i(t_n), \pi_n)$ and $\sum_i a_i = (1+o_P(1))\pi_n\sum_i f_i(t_n)$ yield the required asymptotics for $a_i/\sum_i a_i$. Next, note that if $X \sim \mathrm{Bin}(r,\pi)$, then $\mathrm{Var}(X(X-1)) = 2r(r-1)\pi^2(1-\pi)(1+(2r-3)\pi)$. Therefore, for any $\varepsilon > 0$, (7.24) holds, and the required asymptotics for $\nu_n(a)$ follow. To see (7.21), the proof follows again by using the condition on $s^{\omega}_3$ in Theorem 7.3.
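Algorithm 5, including the parity fix of adding one extra half-edge to vertex 1, can be sketched as follows; the function name and the toy inputs are ours:

```python
import random

def percolate_halfedges(f, pi_n, rng):
    """Algorithm 5 sketch: retain each half-edge independently with
    probability pi_n, add one extra half-edge to the first vertex if the
    retained total is odd, then draw a uniform perfect matching."""
    # a_i ~ Bin(f_i, pi_n), independently over vertices
    a = [sum(rng.random() < pi_n for _ in range(fi)) for fi in f]
    if sum(a) % 2 == 1:
        a[0] += 1                          # parity fix from the text
    stubs = [v for v, ai in enumerate(a) for _ in range(ai)]
    rng.shuffle(stubs)                     # uniform perfect matching
    edges = [(stubs[2 * k], stubs[2 * k + 1]) for k in range(len(stubs) // 2)]
    return a, edges

a, edges = percolate_halfedges([4, 3, 5, 2, 6], 0.5, random.Random(7))
```

Shuffling the retained stubs and pairing them consecutively is one standard way to realize a uniform perfect matching, so the resulting multigraph is a configuration model on the retained degrees $a$.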
Consider the exploration of the graph $\tilde{\mathcal{G}}_n(\pi_n)$ via Algorithm 3, but now with the first vertex chosen proportionally to its degree. Define the exploration process $S_n$ analogously to the process $S^j_n(l)$ in Section 6, where $a_n = \sum_i a_i$. Consider the re-scaled version $\bar{S}_n$ defined by $\bar{S}_n(t) = n^{-\alpha}S_n(\lfloor tn^{\rho-\delta}\rfloor)$, and define the limiting process accordingly. The proof of Proposition 7.7 can be carried out using ideas similar to [31, Theorem 8]; a sketch is given in Appendix D. The excursion lengths of the exploration process give the numbers of edges in the explored components. Now, at each step $l$, the probability of discovering a surplus edge, conditionally on the past, is approximately the proportion of half-edges that are active. Note that the number of active half-edges is the reflected version of $S_n$, given by $\mathrm{refl}(S_n(t)) = S_n(t) - \inf_{u\leq t}S_n(u)$. Thus, conditionally on $(S_n(l))_{l\leq tn^{\rho-\delta}}$, the rate at which a surplus edge appears at time $tn^{\rho-\delta}$ is approximately
$\frac{n^{\rho-\delta}\,\mathrm{refl}(S_n(tn^{\rho-\delta}))}{\sum_i a_i} = \frac{1}{\mu}\,\mathrm{refl}(\bar{S}_n(t))(1+o_P(1)).$
Therefore, Proposition 7.7 implies that, for each $K \geq 1$, there exist components $\mathscr{C}_1,\ldots,\mathscr{C}_K \subset \tilde{\mathcal{G}}_n(\pi_n)$ such that (7.27) holds, where $\xi_i$ and $N_i$ are defined in Proposition 4.3. We refer to [31, Section 5.4] for more details regarding the proof of (7.27). Here we have also used the fact that the ordered excursion lengths of the process $(S(t))_{t\geq 0}$, defined in (7.26), are identically distributed as the ordered excursion lengths of $(S(t)/\mu)_{t\geq 0}$. We can now combine (7.20) and (7.27) to obtain the asymptotics for the number of blobs in the largest connected components and for $\mathrm{SP}'(\cdot)$, the surplus in the superstructure. Denote $B(\mathscr{C}) = |\mathcal{B}(\mathscr{C})|$ for a component $\mathscr{C} \subset \mathcal{G}_n(t_c(\lambda))$.
Proof. Notice that $\sum_{j\leq i}|\mathscr{C}_j| \leq \sum_{j\leq i}|\mathscr{C}_{(j)}(\lambda)|$ for all $i \in [K]$, almost surely. Thus, it is enough to prove that $|\mathscr{C}_i|$ and $|\mathscr{C}_{(i)}(\lambda)|$ involve the same re-scaling factor and have the same scaling limit. We again make use of the graph inclusions in (7.20). Algorithm 3 explores the components of $\tilde{\mathcal{G}}_n(\pi_n)$ in a size-biased manner, with the sizes being $(a_i)_{i\geq 1}$. An application of Lemma 7.13 with $y_i = |\mathscr{C}_i(t_n)|$ yields (7.28), for any $t > 0$, uniformly for $l \leq tn^{\rho-\delta}$. Since $a_i \sim \mathrm{Bin}(f_i(t_n), \pi_n)$, we can apply concentration inequalities such as [50, Corollary 2.27], together with the asymptotics from Theorem 7.3, to conclude (7.29). Thus, (7.28) and (7.29), together with (7.20), imply that $\nu|\mathscr{C}_i|/(n^{\delta}B(\mathscr{C}_i)) \xrightarrow{P} 1$, and it follows from Lemma 7.8 and Lemma 4.

Coupling with the multiplicative coalescent.
Recall the definitions of $t_c(\lambda)$ and $t_n$ from (7.1) and (7.9). Let us now investigate the dynamics of $f(t)$ starting from time $t_n$. Notice that, in the time interval $[t_n, t_c(\lambda)]$, components with masses $f_i(t)$ and $f_j(t)$ merge at a rate approximately proportional to $f_i(t)f_j(t)$, and create a component with $f_i(t)+f_j(t)-2$ open half-edges. Thus $f(t)$ does not evolve exactly as a multiplicative coalescent, but it is close. We define an exact multiplicative coalescent that approximates the above process: Algorithm 6 (Modified process). Conditionally on $\mathcal{G}_n(t_n)$, associate a rate-$2/(s_1(t_n)-1)$ Poisson process $\mathcal{P}(e,f)$ to each pair of unpaired half-edges $(e,f)$. An edge is created between the vertices incident to $e$ and $f$ at each ring of $\mathcal{P}(e,f)$; however, the half-edges are not discarded after the pairing. At time $t > t_n$, the modified graph $\bar{\mathcal{G}}_n(t)$ consists of the edges of $\mathcal{G}_n(t_n)$, together with the edges created by this algorithm between times $t_n$ and $t$.
Proposition 7.10. There exists a coupling such that $\mathcal{G}_n(t) \subset \bar{\mathcal{G}}_n(t)$ for all $t > t_n$ with probability one.
Proof. Recall the construction of $\mathcal{G}_n(t)$ from Algorithm 4. We modify (S1) as follows: whenever two half-edges are paired, we do not kill them and do not discard the associated exponential clocks; instead, we reset the corresponding clocks. The graphs generated by this modification of Algorithm 4 have the same distribution as $\bar{\mathcal{G}}_n(t)$, conditionally on $\mathcal{G}_n(t_n)$. Moreover, this also gives a natural coupling such that $\mathcal{G}_n(t) \subset \bar{\mathcal{G}}_n(t)$, by viewing the event times of Algorithm 4 as a thinning of the event times of the modified process.
Henceforth, we will always assume that we are working on a probability space on which Proposition 7.10 holds. Recall that the connected components at time $t_n$, $(\mathscr{C}_i(t_n))_{i\geq 1}$, are regarded as blobs, and that $\bar{\mathcal{G}}_n(t)$ can also be viewed as a super-graph with the superstructure determined by the edges appearing after time $t_n$ in Algorithm 6. Let us denote the ordered connected components of $\bar{\mathcal{G}}_n(t)$ by $(\bar{\mathscr{C}}_{(i)}(t))_{i\geq 1}$, and define the masses $\bar{\mathcal{F}}_i(t)$ accordingly. The $\bar{\mathcal{F}}$-value is regarded as the mass of the component $\bar{\mathscr{C}}_{(i)}(t)$ at time $t$. Note that for the modified process in Algorithm 6, conditionally on $\mathcal{G}_n(t_n)$, at time $t \in [t_n, t_c(\lambda)]$, $\bar{\mathscr{C}}_{(i)}(t)$ and $\bar{\mathscr{C}}_{(j)}(t)$ merge at exact rate $2\bar{\mathcal{F}}_i(t)\bar{\mathcal{F}}_j(t)/(s_1(t_n)-1)$, and the new component has mass $\bar{\mathcal{F}}_i(t)+\bar{\mathcal{F}}_j(t)$; thus the vector of masses $(\bar{\mathcal{F}}_i(t))_{i\geq 1}$ evolves as an exact multiplicative coalescent. 7.5. Properties of the modified process. Notice that, conditionally on $\mathcal{G}_n(t_n)$, blobs $b_i$ and $b_j$ are connected in $\bar{\mathcal{G}}_n(t_c(\lambda))$ with probability given by (7.32), where the $o_P(\cdot)$ term is uniform in $i, j$. Thus, using Theorem 7.3, (7.32) is of the form $1-\mathrm{e}^{-qx_ix_j(1+o_P(1))}$ with $q$ and $x = (x_i)_{i\geq 1}$ given by (7.33), where $\sigma_r(x^n) = \sum_{i\geq 1}(x^n_i)^r$. By Theorem 2.3, the sequence $x^n$ satisfies the entrance boundary conditions of [4]. To simplify the notation, we write $\bar{\mathcal{F}}_i(\lambda)$ for $\bar{\mathcal{F}}_i(t_c(\lambda))$ and $\bar{\mathscr{C}}_{(i)}(\lambda)$ for $\bar{\mathscr{C}}_{(i)}(t_c(\lambda))$. We next relate $(\bar{\mathcal{F}}_i(\lambda))_{i\geq 1}$ to $(\bar{\mathscr{C}}_{(i)}(\lambda))_{i\geq 1}$, for each fixed $i$. Proposition 7.12. As $n \to \infty$, $\bar{\mathcal{F}}_i(\lambda) = (\nu-1)|\bar{\mathscr{C}}_{(i)}(\lambda)| + o_P(n^{\rho})$. Consequently, $(n^{-\rho}\bar{\mathcal{F}}_i(\lambda))_{i\geq 1}$ converges in distribution to $\frac{\nu-1}{\nu}\,\xi$ with respect to the product topology. We will need the following lemma, whose proof is identical to the corresponding result in [31]. Proof of Proposition 7.12. We only prove the asymptotic relation between $\bar{\mathcal{F}}_1(\lambda)$ and $|\bar{\mathscr{C}}_{(1)}(\lambda)|$. Consider the breadth-first exploration of the superstructure of the graph $\bar{\mathcal{G}}_n(t_c(\lambda))$ (which is also a rank-one inhomogeneous random graph) using the Aldous-Limic construction from [4, Section 2.3]. Notice that the vertices are explored in a size-biased manner with the sizes being $x = (x_i)_{i\geq 1}$, as defined in (7.33).
Let $v(i)$ be the $i$-th vertex explored, and let $\bar{\mathscr{C}}^{\mathrm{st}}_{(i)}(\lambda)$ denote the component $\bar{\mathscr{C}}_{(i)}(\lambda)$ with the blobs shrunk to single vertices. Then, from [4], one has the following: (i) there exist random variables $m_L$, $m_R$ such that $\bar{\mathscr{C}}^{\mathrm{st}}_{(i)}(\lambda)$ is explored between times $m_L+1$ and $m_R$; (ii) the corresponding quantities admit an asymptotic description in terms of some non-degenerate, positive random variable $\gamma$.
Let $y_i = n^{-\rho}|\mathscr{C}_{b_i}(t_n)|$. Using Theorem 7.3, Remark 13 and Remark 14, it follows that $\sum_i x_i^ry_i^s = O_P(n^{3\delta-3\eta})$ for $r+s=3$, while $\sum_i x_i = O_P(n^{1-\rho})$ and $\sum_i x_i^ry_i^s = O_P(n^{-2\rho+1+\delta})$ for $r+s=2$. Below, we show (7.36); the proof of Proposition 7.12 then follows from (7.36) by using Theorem 7.3 and the corresponding asymptotics of $\sum_i x_i^2$. To prove (7.36), we now apply Lemma 7.13. Denote $m_0 = \sum_i x_i/\sum_i x_i^2$ and consider $l = \lceil 2Tm_0\rceil$ for some fixed $T > 0$. Using Theorem 7.3, an application of Lemma 7.13 yields (7.37). Now, for any $\varepsilon > 0$, $T > 0$ can be chosen so large that $\sum_{i=1}^{m_R} x_{v(i)} > T$ has probability at most $\varepsilon$, and on the complementary event the supremum over $k \leq 2Tm_0$ can be controlled, yielding (7.38). An identical argument shows (7.39), where now $m_0 = \sum_i x_i/\sum_i x_iy_i$. The proof of (7.36) follows from (7.38) and (7.39). The asymptotic distribution of $(n^{-\rho}|\bar{\mathscr{C}}_{(i)}(\lambda)|)_{i\geq 1}$ can be obtained using Proposition 7.11.
Recall that $\omega_i(t_n)$ denotes the number of open half-edges incident to vertex $i$ in the graph $\mathcal{G}_n(t_n)$. We now equip $\bar{\mathscr{C}}_{(i)}(\lambda)$ with the probability measure $\mu^i_{\mathrm{fr}}$ given by $\mu^i_{\mathrm{fr}}(A) = \sum_{k\in A}\omega_k(t_n)/\bar{\mathcal{F}}_i(\lambda)$ for $A \subset \bar{\mathscr{C}}_{(i)}(\lambda)$, and denote the corresponding measured metric space by $\bar{\mathscr{C}}^{\mathrm{fr}}_{(i)}(\lambda)$. Theorem 7.14. Under Assumption 1, (7.40) holds as $n \to \infty$ with respect to the $\mathscr{S}^{\mathbb{N}}_*$ topology, where $M_i$ is defined in Section 4.6.3.
Proof. We only consider the metric space limit of $\bar{\mathscr{C}}^{\mathrm{fr}}_{(i)}(\lambda)$ for each fixed $i \geq 1$; the joint convergence in (7.40) follows from the joint convergence of the different functionals used throughout the proof. Recall the notation $\mathcal{B}(\mathscr{C}) := \{b : \mathscr{C}_b(t_n)\subset\mathscr{C},\ \mathscr{C}_b(t_n)\neq\emptyset\}$ for a component $\mathscr{C}$. Now, $\bar{\mathscr{C}}^{\mathrm{fr}}_{(i)}(\lambda)$ can be seen as a super-graph as defined in Section 4.2, with (i) the collection of blobs $\{\mathscr{C}_b(t_n) : b \in \mathcal{B}(\bar{\mathscr{C}}_{(i)}(\lambda))\}$ and within-blob measures $\mu_b$ given by $\mu_b(A) = \sum_{k\in A}\omega_k(t_n)/f_b(t_n)$ for $A \subset \mathscr{C}_b(t_n)$, $b \in \mathcal{B}(\bar{\mathscr{C}}_{(i)}(\lambda))$; (ii) the superstructure consisting of the edges appearing during $[t_n, t_c(\lambda)]$ in Algorithm 6, with weight sequence $(f_b(t_n)/\bar{\mathcal{F}}_i(\lambda) : b \in \mathcal{B}(\bar{\mathscr{C}}_{(i)}(\lambda)))$ as in (7.33). Let $d(\cdot,\cdot)$ denote the graph distance on $\bar{\mathscr{C}}_{(i)}(\lambda)$, and let $u_i$ denote the average distance within the blob $\mathscr{C}_{b_i}(t_n)$. Using Lemma 7.13, we will show (7.42); the argument is the same as the proof of (7.36). We only have to ensure that (7.35) holds with $y_i = x_iu_i$, and thus we need to show (7.43). First, notice that, by Lemma 7.2 and Theorem 7.3, (7.44) holds.
Also, recall from Theorem 2.4 that $u_{\max} = \max_b u_b = O_P(n^{\delta}\log(n))$. Now,
$O_P\Big(\frac{n^{-\rho}\,n^{\alpha+\delta}\,n^{\delta}\log^2(n)}{n^{\rho-\delta}\,n^{2\delta-\rho}}\Big) = O_P(n^{\delta-\eta}\log^2(n)) = o_P(1)$,
and (7.43) follows; hence the proof of (7.42) also follows. Recall that the superstructure of $\bar{\mathcal{G}}_n(t_c(\lambda))$ has the same distribution as a Norros-Reittu random graph $\mathrm{NR}_n(x,q)$ with the parameters given by (7.33). Thus, using Proposition 4.2, we now aim to apply Theorem 5.2 to $\bar{\mathscr{C}}^{\mathrm{fr}}_{(i)}(\lambda)$ with the blobs being $(\mathscr{C}_i(t_n))_{i\geq 1}$ and with $p^{(i)}_n$, $a^{(i)}_n$ given by (4.8).
Let $\mathcal{N}(\mathbb{R}_+)$ denote the space of all counting measures equipped with the vague topology, viewed as an element of $\mathscr{S}^{\mathbb{N}}$. Recall the definitions of $\xi^*_i$ and $\Xi^*_i$ from Section 4.6.3. Without loss of generality, we may assume that the convergence in (7.47) holds almost surely. Now, using (7.42), it follows that the required convergence holds, where the last step follows from Theorem 7.3, (7.44) and (7.47). The proof of Theorem 7.14 is now complete using Theorem 5.2.
In the final part of the proof, we will also need an estimate of the surplus edges in the components $\bar{\mathscr{C}}_{(i)}(\lambda)$, which can be obtained by following exactly the argument outlined in the proof of Lemma 7.8. Recall that the superstructure of the graph $\bar{\mathcal{G}}_n(t_c(\lambda))$ is a rank-one inhomogeneous random graph $\mathrm{NR}_n(x,q)$; the connection probabilities given by (7.32) can be rewritten accordingly. Now, we may consider the breadth-first exploration of this graph and define the exploration process $S^{\mathrm{NR}}_n(l) = \sum_i z_i(\lambda)\mathcal{I}^n_i(l) - l$, as in (7.25). The only point to note here is that the component sizes are not necessarily encoded by the excursion lengths above past minima of $S^{\mathrm{NR}}_n$. However, if $\tilde{S}^{\mathrm{NR}}_n(l) = \sum_i \mathcal{I}^n_i(l) - l$, then it can be shown (see [17, Lemma 3.1]) that $\tilde{S}^{\mathrm{NR}}_n$ and $S^{\mathrm{NR}}_n$ have the same distributional limit. Thus, a conclusion identical to Proposition 7.7 follows for $\bar{\mathcal{G}}_n(t_c(\lambda))$. Due to the size-biased exploration of the components, one can also obtain analogues of Lemmas 7.8 and 7.9 for $\bar{\mathcal{G}}_n(t_c(\lambda))$; the proof is identical to [31, Lemma 31]. Recall Theorem 7.14 and the terminology therein, and let $n^{-\eta}\bar{\mathscr{C}}^{\mathrm{fr}}_{(i)}(\lambda)$ denote the measured metric space with measure $\mu^i_{\mathrm{fr}}$ and the distances multiplied by $n^{-\eta}$. Let us now bring together the relevant properties of $\mathscr{C}_{(i)}(\lambda)$ and $\bar{\mathscr{C}}_{(i)}(\lambda)$: (A) by Lemma 7.16, with high probability $\mathscr{C}_{(i)}(\lambda) \subset \bar{\mathscr{C}}_{(i)}(\lambda)$ for any fixed $i \geq 1$; moreover, with high probability there is no surplus edge within the blobs, by (7.18). This implies that, for any pair of vertices $u, v \in \mathscr{C}_{(i)}(\lambda)$, with high probability the shortest path between them is exactly the same in $\mathscr{C}_{(i)}(\lambda)$ and $\bar{\mathscr{C}}_{(i)}(\lambda)$.
Thus, from the definition of Gromov-weak convergence in Section 4.1, an application of Theorem 7.14 yields the convergence of $n^{-\eta}\mathscr{C}^{\mathrm{fr}}_{(i)}(\lambda)$ as well. The only thing remaining is to show that we can replace the measure $\mu^i_{\mathrm{fr}}$ by $\mu_{\mathrm{ct},i}$. Now, using Propositions 7.4 and 7.5 (b), it is enough to show that
$\sum_{b\in\mathcal{B}(\mathscr{C}_{(i)}(\lambda))}\big(f_b(t_n) - (\nu-1)|\mathscr{C}_b(t_n)|\big) = o_P(n^{\rho})$. (7.49)
Indeed, during the breadth-first exploration of the superstructure of $\mathcal{G}_n(t_c(\lambda))$, the blobs are explored in a size-biased manner with the sizes being $(f_i(t_n))_{i\geq 1}$, so one can again use Lemma 7.13. Recall that, by Lemma 7.9, for any $\varepsilon > 0$, one can choose $T > 0$ so large that $\mathscr{C}_{(i)}(\lambda)$ is explored within time $Tn^{\rho-\delta}$ with probability at least $1-\varepsilon$. Thus, if $\mathcal{V}^b_l$ denotes the set of blobs explored before time $l$, then, for any $T > 0$, the corresponding bound holds. Using the Cauchy-Schwarz inequality and Theorem 7.3, it now follows that the above term is $o_P(n^{\rho})$, and (7.49) follows. Finally, to conclude the result for the percolated graphs, we use Lemma 7.1.
In fact, if $\mathscr{C}^+_{(i)}(\lambda)$ and $\mathscr{C}^-_{(i)}(\lambda)$ denote the $i$-th largest components of $\mathcal{G}_n(t_c(\lambda)+\varepsilon_n)$ and $\mathcal{G}_n(t_c(\lambda)-\varepsilon_n)$ respectively, then, analogously to Lemma 7.16, we can conclude the corresponding inclusions with high probability for any fixed $i \geq 1$. This completes the proof of Theorem 2.1.

Remark 15.
The fact that the measure can be changed from $\mu^i_{\mathrm{fr}}$ to $\mu_{\mathrm{ct},i}$ in $n^{-\eta}\mathscr{C}^{\mathrm{fr}}_{(i)}$ follows only from (7.49), which in turn follows from the entrance boundary conditions. However, the entrance boundary conditions in Theorem 2.3 hold for weight sequences $w = (w_i)_{i\in[n]}$ under rather general assumptions (see Assumption 2). Therefore, one could also replace the measure $\mu_{\mathrm{ct},i}$ by $\mu_{w,i}$, where $\mu_{w,i}(A) = \sum_{i\in A} w_i/\sum_{k\in\mathscr{C}_{(i)}(\lambda)} w_k$ and $w$ satisfies Assumption 2.
7.7. Graphs conditioned on simplicity: Proof of Theorem 2.2. We will use the following joint construction of $\mathrm{CM}_n(d, p_n(\lambda))$ and $\mathrm{CM}_n(d)$.

Algorithm 7.
(S0) Let $\ell^p_n = 2X$, where $X \sim \mathrm{Bin}(\ell_n/2, p_n(\lambda))$ and $\ell_n = \sum_{i\in[n]} d_i$ is the total number of half-edges. Pick $\ell^p_n$ half-edges uniformly at random and color them blue; color the remaining half-edges red. (S1) Pair the blue half-edges using a uniform perfect matching. (S2) Pair the red half-edges using another, independent uniform perfect matching.
If $G_I$ is the graph obtained after (S$I$), $I = 1, 2$, then $(G_1, G_2)$ is jointly distributed as $(\mathrm{CM}_n(d, p_n(\lambda)), \mathrm{CM}_n(d))$ [32, Lemmas 8.1, 8.2]. Let $d^p_i$ be the number of blue half-edges incident to $i$ and let $d^p = (d^p_i)_{i\in[n]}$. Then, by construction, $G_1$, conditionally on $d^p$, is distributed as $\mathrm{CM}_n(d^p)$. To complete the proof of Theorem 2.2, consider the exploration given by Algorithm 3, now on the graph $G_1$, conditionally on the blue half-edges selected in Algorithm 7 (S0); the starting vertex is chosen in a size-biased manner with sizes proportional to the degrees $d^p$. Let $\mathcal{F}_l$ denote the sigma-algebra generated by the exploration process up to time $l$, and let $\mathcal{I}^n_i(l)$ denote the indicator that vertex $i$ has been discovered up to time $l$; note that Algorithm 3 explores the vertices in a size-biased manner with the sizes being $d^p$. For convenience, we write $X = (\mathscr{C}^p_{(i)}(\lambda))_{i\leq K}$ in this section. Consider a bounded continuous function $f : (\mathscr{S}_*)^K \to \mathbb{R}$. Recall from [46, Theorem 1.1] that $\liminf_{n\to\infty}\mathbb{P}(G_2 \text{ is simple}) > 0$.
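The joint construction of Algorithm 7 can be sketched as follows; by construction, $G_1$ (the blue matching) is a subgraph of $G_2$ (blue plus red matchings). The function name and the toy degree sequence are ours:

```python
import random

def joint_construction(degrees, p, rng):
    """Algorithm 7 sketch: color 2X half-edges blue (X ~ Bin(l_n/2, p)),
    match the blue half-edges among themselves (g1) and the red ones among
    themselves; g2 = blue edges + red edges then uses all half-edges."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    ln = len(stubs)                        # total number of half-edges (even)
    X = sum(rng.random() < p for _ in range(ln // 2))
    rng.shuffle(stubs)                     # uniform choice of the blue set
    blue, red = stubs[:2 * X], stubs[2 * X:]
    rng.shuffle(blue)
    rng.shuffle(red)

    def match(s):                          # pair consecutive shuffled stubs
        return [(s[2 * k], s[2 * k + 1]) for k in range(len(s) // 2)]

    g1 = match(blue)                       # ~ percolated configuration model
    g2 = g1 + match(red)                   # ~ CM_n(d); contains g1
    return g1, g2

g1, g2 = joint_construction([3, 2, 2, 3, 1, 1], 0.4, random.Random(3))
```

Since the union of the two independent uniform matchings is again a uniform matching of all the stubs, $g_2$ has the unconditional configuration-model distribution, which is the content of [32, Lemmas 8.1, 8.2].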
Thus, it is enough to show that $\mathbb{E}[f(X)\mathbb{1}\{G_2 \text{ is simple}\}] - \mathbb{E}[f(X)]\,\mathbb{P}(G_2 \text{ is simple}) \to 0$. Now, for any $T > 0$, let $A_{n,T}$ denote the event that $X$ is explored before time $Tn^{\rho}$ by the exploration algorithm. Using [31, Lemma 13], (7.53) follows. Further, let $B_{n,T}$ denote the event that a vertex $v$ is explored before time $Tn^{\rho}$ such that $v$ is involved in a self-loop or a multiple edge in $G_2$. Let $v_l$ denote the vertex being explored at time $l$. Without loss of generality, we assume that, during the sequential pairing of the half-edges in Algorithm 7 (S2), we first pair the red half-edges associated to $(v_l)_{l\leq Tb_n}$. Let $\ell'_n := \ell^p_n - 2Tb_n - d^p_1 + 1$ and $\ell''_n := (\ell_n - \ell^p_n) - \sum_{i\in[n]} d_i\mathcal{I}^n_i(Tn^{\rho})$, where $\ell_n = \sum_i d_i$ and $\ell^p_n$ is the total number of blue half-edges. Note that, by Assumption 1 (ii), $\sum_{i\in V} d_i = o(n)$ whenever $|V| = o(n)$. Also, using concentration inequalities for the binomial distribution, $\ell^p_n = p_n(\lambda)\ell_n(1+o(1))$ almost surely. Thus, we may assume that $\ell'_n, \ell''_n \geq c_1n$ with probability 1 for some $0 < c_1 < 1$. Note that, uniformly over $l \leq Tb_n$, any blue half-edge of $v_l$ creates a self-loop in $G_1$ with probability at most $d^p_{v_l}/\ell'_n$, and any red half-edge creates a self-loop with probability at most $(d_{v_l}-d^p_{v_l})/\ell''_n$. Thus the expected number of self-loops incident to $v_l$ is at most $\mathbb{E}[d^2_{v_l}]/(c_1n)$. Moreover, the expected number of blue-blue multiple edges attached to $v_l$ in $G_1$ is bounded similarly, where we have used Assumption 1. When counting the multiple edges incident to $v_l$ in $G_2$, we must also account for (i) the creation of a red edge between two vertices already joined by a blue edge, and (ii) the creation of two red edges between the same pair of vertices. Using identical arguments, the expected number of multiple edges incident to $v_l$ in $G_2$ is at most $C\mathbb{E}[d^2_{v_l}]/\ell''_n$. Therefore, using Assumption 1, the corresponding bound holds for every fixed $K \geq 1$. Further, conditionally on Algorithm 7 (S0), the vertices are explored in a size-biased manner with the sizes being $(d^p_i/\ell^p_n)_{i\in[n]}$.
Therefore, using that the number of blue half-edges satisfies $\ell^p_n = p_n(\lambda)\ell_n(1+o(1))$, we obtain (7.57). Now, by Assumption 1, the final term in (7.57) tends to zero if we first take $\limsup_{n\to\infty}$ and then take $\lim_{K\to\infty}$. Consequently, for any fixed $T > 0$,
$\lim_{n\to\infty}\mathbb{P}(B_{n,T}) = 0$. (7.58)
Let $E_{n,T}$ denote the event that no self-loops or multiple edges are attached to the vertices of $G_2$ that are discovered after time $Tn^{\rho}$. Then (7.53) and (7.58) reduce the problem to computing
$\lim_{n\to\infty}\mathbb{E}\big[f(X_T)\,\mathbb{P}(E_{n,T}\mid\mathcal{F}_{Tn^{\rho}}, B^c_{n,T})\big]$. (7.59)
Let $G^*_{Tn^{\rho}}$ denote the graph obtained from $G_2$ by removing the vertices discovered up to time $Tn^{\rho}$. Then, conditionally on $\mathcal{F}_{Tn^{\rho}}$ and on $B^c_{n,T}$, the event $E_{n,T}$ occurs if and only if $G^*_{Tn^{\rho}}$ is simple. Also, $G^*_{Tn^{\rho}}$ is distributed as a configuration model conditionally on its degree sequence, and since only $o(n)$ vertices have been removed, the corresponding $\nu_n$ for $G^*_{Tn^{\rho}}$ converges in probability to 1. Thus, [42, Theorem 7.12] implies that $\mathbb{P}(G^*_{Tn^{\rho}}\text{ is simple}\mid\mathcal{F}_{Tn^{\rho}}) \xrightarrow{P} \mathrm{e}^{-3/4}$, (7.60) and also $\mathbb{P}(G_2\text{ is simple}) \to \mathrm{e}^{-3/4}$, so that $\mathbb{P}(G^*_{Tn^{\rho}}\text{ is simple}\mid\mathcal{F}_{Tn^{\rho}}) - \mathbb{P}(G_2\text{ is simple}) \xrightarrow{P} 0$.

Also, $d_{(j)} \leq Cn^{\alpha}$ almost surely. Thus, applying (C.7) with $a = \varepsilon t_nn^{-\delta}$, $b = Ct_nn^{3\alpha-1}$ and $R = Cn^{\alpha}$, and also using (C.9), it follows that
$\mathbb{P}(M_n(t_n) > \varepsilon t_nn^{-\delta}) \leq \exp\Big(-\frac{C\varepsilon^2t_n^2n^{-2\delta}}{2(t_nn^{3\alpha-1}+\varepsilon n^{\alpha}t_nn^{-\delta})}\Big) \leq C'\mathrm{e}^{-C''\varepsilon n^{\varepsilon_0}}$, (C.10)
where in the last step we have used the fact that $\alpha-\delta > \alpha-\eta = 3\alpha-1$ and $t_n = n^{\alpha+\delta+\varepsilon_0}$. Thus the proof of (C.5) follows.
APPENDIX D. LIMIT OF THE EXPLORATION PROCESS: PROOF SKETCH FOR PROPOSITION 7.7
The proof of Proposition 7.7 can be carried out using ideas similar to [31, Theorem 8].
The key idea in the proof of Proposition 7.7 is that the scaling limit is governed by the large-degree vertices only. More precisely, for any $\varepsilon > 0$ and $T > 0$, the contribution beyond the first $K$ (fixed) terms is negligible, so it is enough to show that the iterated limit of the truncated process (first taking $\lim_{n\to\infty}$ and then $\lim_{K\to\infty}$) converges to $S$ with respect to the Skorokhod $J_1$ topology. Now, using the fact that $a_i/\sum_i a_i \xrightarrow{P} \theta_i/(\mu\nu)$, and the fact that the vertices are explored in a size-biased manner with the sizes being $(a_i)_{i\geq 1}$, it follows (see [31, Lemma 9]) that, for each fixed $K \geq 1$, the truncated convergence holds. This concludes the proof of Proposition 7.7.