SPEEDING UP RANDOM WALK MIXING BY STARTING FROM A UNIFORM VERTEX

The theory of rapid mixing random walks plays a fundamental role in the study of modern randomised algorithms. Usually, the mixing time is measured with respect to the worst initial position. It is well known that the presence of bottlenecks in a graph hampers mixing and, in particular, starting inside a small bottleneck significantly slows down the diffusion of the walk in the first steps of the process. The average mixing time is defined to be the mixing time starting at a uniformly random vertex, and hence it is not sensitive to the slow diffusion caused by these bottlenecks. In this paper we provide a general framework to show logarithmic average mixing time for random walks on graphs with small bottlenecks. The framework is especially effective on certain families of random graphs with heterogeneous properties. We demonstrate its applicability on two random models for which the mixing time was known to be of order (log n)^2, speeding up the mixing to order log n. First, in the context of smoothed analysis on connected graphs, we show logarithmic average mixing time for randomly perturbed graphs of bounded degeneracy. A particular instance is the Newman-Watts small-world model. Second, we show logarithmic average mixing time for supercritically percolated expander graphs. When the host graph is complete, this application gives an alternative proof that the average mixing time of the giant component in the supercritical Erdős-Rényi graph is logarithmic.


Introduction
Random walks on graphs are one of the fundamental tools for sampling (see, e.g., [38]). Applications are numerous in areas such as computer science, discrete mathematics and statistical physics. Prominent examples include the polynomial-time algorithm to estimate the volume of a convex body [19], computing the matrix permanent [28], and the use of Glauber dynamics to sample from Gibbs distributions, in particular from proper colourings [42].
Usually, the size of the sampling space is exponential in the input size, and fully exploring this space is computationally intractable. The Markov chain Monte Carlo (MCMC) method consists of running a random walk on an appropriately chosen graph, whose vertex set is the sample space, until its distribution is arbitrarily close to equilibrium, regardless of the initial state. At that time we say the walk has mixed, and the time until it does is called the (worst-case) mixing time. To obtain efficient sampling algorithms it suffices to prove that the mixing time is poly-logarithmic in the input size.
The connection between rapid mixing and expanders is well established. In the context of random walks, expansion is measured by means of a graph parameter called conductance; see Section 2.2 for the precise definition. Jerrum and Sinclair [28] gave an upper bound on the mixing time depending on the conductance and the logarithm of the minimum stationary value. This bound is central in the theory of Markov chains.
Random environments are particularly interesting sampling spaces and, in the last 20 years, researchers have developed the theory of random walks on random graphs. As expected, the good expansion properties of random graphs ensure rapid mixing. By the Jerrum-Sinclair bound, graphs with conductance bounded away from zero mix in logarithmically many steps and usually exhibit cut-off, that is, the distribution converges rapidly to the stationary distribution in a small window of time. Good examples are random graph models with control on the degrees, such as random regular graphs [34], random graphs with given degree sequences [4,6], their directed analogues [9,12], or graphs perturbed by random perfect matchings [27].
Nonetheless, the presence of small obstructions slows down the mixing. A canonical example is the giant component of a sparse Erdős-Rényi graph G(n, c/n) with c > 1. This component contains relatively small bottlenecks, that is, connected sets that only have few edges connecting them to the rest of the graph. In such cases, tools like the Jerrum-Sinclair bound fail to pin down the correct order of the mixing time. Fountoulakis and Reed [23] introduced a strengthening of the bound that is sensitive to small bottlenecks and used it to show that the mixing time of the largest component in G(n, c/n) is asymptotically almost surely (a.a.s. for short) Θ((log n)^2) [24]. Indeed, this is the correct order, as the component contains paths of degree-2 vertices (also referred to as bare paths) whose length is of order log n. Starting at the centre of such a path, a random walk takes Ω((log n)^2) steps in expectation to escape from it. We remark that the mixing time in the supercritical random graph G(n, c/n) was also bounded independently by Benjamini, Kozma and Wormald [5], using a different approach investigating the anatomy of the giant component.
However, these local bottlenecks are a negligible part of the giant component, and the rest of the component has good expansion properties. This suggests that, if the random walk started outside the bottlenecks, the mixing time would decrease. This was implicit in the work of Benjamini, Kozma and Wormald [5] and their description of the giant component, and such a speeding up of the mixing time was also conjectured explicitly by Fountoulakis and Reed [24]. Berestycki, Lubetzky, Peres and Sly [6] confirmed their prediction, showing that there exists ν = ν(c) such that the mixing time starting at a uniformly random vertex is asymptotically ν log n with high probability (they in fact proved much more, establishing the value of ν(c) precisely as well as cut-off for the random walk). This result reinforces the idea that, in certain heterogeneous scenarios, averaging over the starting position yields more efficient sampling algorithms.
The goal of this paper is to provide a general framework to show logarithmic average-case mixing time for random walks on graphs with small bottlenecks.
1.1. Average mixing times. Given an n-vertex graph G, the lazy random walk over G is a Markov chain with state space V(G) which can be defined as follows. If at any given time the walk is at a vertex v ∈ V(G), it stays at v with probability 1/2, and with probability 1/2 it moves to a uniformly random neighbour of v in G. If G is a connected graph, it is well known that the lazy random walk over G is ergodic and its distribution converges to the (unique) stationary distribution π_G (see, e.g., [33] for a comprehensive review of random walks and mixing times).
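For readers who prefer to see the dynamics explicitly, the following short Python sketch (ours, not part of the original text) simulates the lazy random walk on a small graph given as an adjacency list; the example graph `adj` and the number of steps are arbitrary illustrative choices.

```python
import random

def lazy_random_walk(adj, start, steps, rng=None):
    """Simulate the lazy random walk: stay put with probability 1/2,
    otherwise move to a uniformly random neighbour."""
    rng = rng or random.Random(0)
    v = start
    trajectory = [v]
    for _ in range(steps):
        if rng.random() < 0.5:       # lazy step: stay at the current vertex
            trajectory.append(v)
        else:                        # otherwise jump to a uniform neighbour
            v = rng.choice(adj[v])
            trajectory.append(v)
    return trajectory

# toy example: a triangle with a pendant path (a tiny bottleneck)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
print(lazy_random_walk(adj, start=0, steps=10))
```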
The total variation distance d_TV(μ, ν) between two probability distributions μ and ν on the vertex set V(G) of a graph G is defined as

d_TV(μ, ν) := max_{A ⊆ V(G)} |μ(A) − ν(A)| = (1/2) Σ_{v ∈ V(G)} |μ(v) − ν(v)|.

Let P_G be the transition matrix of the lazy random walk over G. For ε > 0, the ε-mixing time t_mix(G, ε) of this lazy random walk is defined as

t_mix(G, ε) := min { t ∈ ℕ_0 : max_{v ∈ V(G)} d_TV(p_v^0 P_G^t, π_G) ≤ ε },

where p_v^0 is the distribution supported entirely on v ∈ V(G). If instead of considering the worst-case initial vertex we consider a uniformly random vertex u ∈ V(G), then the quantity d_TV(p_u^0 P_G^t, π_G) is a random variable. We define the average ε-mixing time t̃_mix(G, ε) of the lazy random walk to be the time at which the expectation of this random variable falls below ε. That is,

t̃_mix(G, ε) := min { t ∈ ℕ_0 : (1/n) Σ_{v ∈ V(G)} d_TV(p_v^0 P_G^t, π_G) ≤ ε }.

In this work, we will focus on the quantity t̃_mix(G, ε), which we believe is a natural candidate for tracking mixing times starting from a uniform vertex. Nonetheless, other related quantities have been used to measure the mixing time from a uniform starting point. Indeed, for a vertex v ∈ V(G), define

t_mix^{(v)}(G, ε) := min { t ∈ ℕ_0 : d_TV(p_v^0 P_G^t, π_G) ≤ ε }

and consider the random variable t_mix^{(U)} = t_mix^{(U)}(G, ε), where U is a vertex chosen uniformly at random from V(G). This notion was the one studied by Berestycki, Lubetzky, Peres and Sly [6]. It is natural to compare t̃_mix to 𝔼[t_mix^{(U)}]: in the first case, we average the total variation distance over starting vertices and take the smallest time t when this average is smaller than ε; in the second one, we average the mixing times over the starting vertices (see Figure 1). In general, as functions, neither of these notions is stronger than the other, in that one can design examples of trajectories for the total variation distances d_TV(p_v^0 P_G^t, π_G) for different vertices v, showing that t̃_mix cannot be bounded by a function of 𝔼[t_mix^{(U)}] and vice versa. However, bounding either 𝔼[t_mix^{(U)}] or t̃_mix implies that t_mix^{(U)} is small with high probability. In the first case this is a direct application of Markov's inequality. In the second one, define X_t(v) := d_TV(p_v^0 P_G^t, π_G) for a vertex v; then t̃_mix(G, ε) is the time t at which the expected value of X_t(U) (averaged over starting points) is less than ε. By Markov's inequality, X_t(U) ≤ ε with probability at least 1 − ε at time t = t̃_mix(G, ε^2). A related but very different notion is the time it takes to mix starting from the uniform distribution u_G over V(G):

t_mix^{(u_G)}(G, ε) := min { t ∈ ℕ_0 : d_TV(u_G P_G^t, π_G) ≤ ε }.

A similar notion has been studied for directed graphs, where the initial distribution is proportional to the in-degrees; see, e.g., [9, Theorem 3]. In general, this latter notion of average mixing time is much smaller than the previous notions, and we expect this to also be the case in the settings studied here, although we do not explore this direction.
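To make the difference between the two notions concrete, here is a small numerical sketch (our own illustration, using a toy graph of our choosing) that computes both t_mix(G, ε) and t̃_mix(G, ε) exactly by iterating the transition matrix of the lazy walk; the cap `t_max` and the example graph are arbitrary.

```python
import numpy as np

def lazy_transition_matrix(adj, n):
    """P_G(u, v) = 1/2 if u = v, 1/(2 deg(u)) if uv is an edge, 0 otherwise."""
    P = np.zeros((n, n))
    for u, nbrs in adj.items():
        P[u, u] = 0.5
        for v in nbrs:
            P[u, v] = 0.5 / len(nbrs)
    return P

def mixing_times(adj, eps=0.25, t_max=10**5):
    n = len(adj)
    P = lazy_transition_matrix(adj, n)
    degs = np.array([len(adj[u]) for u in range(n)], dtype=float)
    pi = degs / degs.sum()                          # stationary distribution, pi(v) proportional to deg(v)
    dists = np.eye(n)                               # row v = distribution after t steps started at v
    t_mix = t_avg = None
    for t in range(1, t_max + 1):
        dists = dists @ P
        tv = 0.5 * np.abs(dists - pi).sum(axis=1)   # TV distance for each starting vertex
        if t_mix is None and tv.max() <= eps:
            t_mix = t                               # worst-case mixing time
        if t_avg is None and tv.mean() <= eps:
            t_avg = t                               # average mixing time
        if t_mix is not None and t_avg is not None:
            return t_mix, t_avg
    return t_mix, t_avg

# a small clique with a pendant path: the average notion mixes earlier
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(mixing_times(adj))
```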
Remark 1.2. In the literature, the mixing time of the random walk is often defined as t_mix(G) := t_mix(G, 1/4), since the distance to the stationary distribution is contractive after this time. However, this might not be the case for t̃_mix. Consider for instance the lollipop graph L_{n,m}: a clique on m vertices and a path on n − m vertices, joined by an edge incident to one of the endpoints of the path. If m and n − m are both very large, then, after one step, the total variation distance is roughly 0 if we start at the clique (almost all the mass of π_{L_{n,m}} is supported on the clique), and roughly 1 if we start at the path. Taking m = ⌈δn⌉ with δ > 3/4, it follows that t̃_mix(L_{n,m}, 1/4) = 1. However, the time required to further decrease the distance to the stationary distribution is of order Ω(n^2), as this is the time required for the walk starting at a typical vertex in the path to hit the clique.
1.2. Our results. Our results will apply to graphs satisfying certain natural structural conditions, which we formalise in the following definition.
Definition 1.3. Let G be an n-vertex graph. For β > 0, we say that a set S ⊆ V(G) is β-thin if it has few edges leaving it, and we say that a set S ⊆ V(G) is β-loaded if it contains many edges (with the precise thresholds depending on β). We say that G is an (α, K)-spreader graph if it satisfies three properties (S1)-(S3), which bound the number of thin and loaded connected vertex sets of each size and the edge density of the graph.

Note that, for s > (log n)^2, one has that e^{−√s} < 1/n, and thus the conditions on an n-vertex graph G being an (α, K)-spreader graph guarantee that there are no connected vertex subsets of size between (log n)^2 and (1 − 1/K^2)n that have too few edges leaving the set (S1) or too many edges contained inside the set (S2). These pseudo-random conditions on expansion and edge distribution arise naturally in the context of random graph models. Indeed, the density of a random graph within any vertex set and across any vertex partition is expected to be the same as the density of the whole graph, and concentration inequalities in conjunction with union bounds can be used to derive the non-existence of such bad connected vertex sets with high probability. Moreover, the conditions of Definition 1.3 bound the number of bad vertex sets of size between (log n)^{1/5} and (log n)^2, with exponential decay, as one can expect from concentration inequalities for binomial random variables. In the context of the current work, the conditions on spreader graphs will guarantee that all bottlenecks are small and that they are scarce in the graph.
To digest the notion of spreader graphs, one can think of α > 0 as an arbitrarily small constant and K as arbitrarily large. The parameter α > 0 controls conditions (S1) and (S2) in the sense that, as α shrinks, these conditions become easier to satisfy and thus the definition of spreader graphs captures more graphs. Similarly, the parameter K controls (S3) and imposes in particular that spreader graphs are sparse, with bounded average degree. It should be noted that, due to K appearing in (S1) and (S2) and α appearing in (S3), our definition is not actually monotone in these parameters. This is a technical subtlety that is needed in our proof to guarantee a trade-off between the conditions. However, in all applications, the restraints given by K in (S1) and (S2) and α in (S3) are never critical, as we have very good control over the edge distribution in all linear-sized sets.
We also remark that the constant 1/5 could be replaced by any constant c < 1/4. Indeed, for sets smaller than (log n)^c, we impose no restriction. The point is that, as we will focus on connected spreader graphs G, even if a small set is an extreme bottleneck, the random walk will not get stuck there for too long before exploring the set enough to escape. In our proof, these bottlenecks contribute a factor of order (log n)^{4c} (due to a connected set of size s having conductance at least 1/s^2 and Theorem 2.1 giving a quadratic dependence on the conductance; see Section 2.2 for details), hence a choice of c < 1/4 guarantees that this contribution is negligible. It may be possible to replace this constraint of 1/4 by 1/2, but not beyond this.
Finally, we remark that the constants α and K could be replaced with functions that depend on n, and the definition of spreader graphs could be adjusted so that our main theorem would still give bounds on average mixing times. However, as our focus is on sparse graphs with constant average degree, we do not pursue this direction here.
Remark 1.4. The definition of (α, K)-spreader graphs bears resemblance to that of AN graphs (or decorated expanders) introduced in [5]. An AN graph H is defined in terms of the existence of an expander subgraph H' whose complement is formed by a small number of small components, similar to what can be deduced from (S1)-(S3), with the additional requirement that not too many components of H − H' are connected to each vertex v ∈ V(H'). The backbone of the main result in [5] is to show that random walks on AN graphs mix in O((log n)^2) steps.
Our main theorem provides a tool to prove logarithmic average mixing time for (α, K)-spreader graphs.
We believe that in many cases, as in our two applications below, this theorem can be used to quickly derive optimal bounds for average mixing times in settings where worst-case mixing times are established via conductance bounds.
The proof of Theorem 1.5 bears some similarities with the proof in [6]. Both use the idea of contracting badly connected sets and coupling the random walks in the original and the contracted graphs. However, our proof is conceptually simpler, as it does not use the anatomy of the giant component [16], a powerful description of the largest component in the supercritical regime. Instead, we rely on the Fountoulakis-Reed bound for mixing [23] and recent progress on hitting time lemmas [35].
1.3. Application 1: Smoothed analysis on connected graphs. The idea of studying the effect of random perturbations on a given structure arose naturally in several distinct settings. In theoretical computer science, Spielman and Teng [40] (see also [41]) introduced the notion of smoothed analysis of algorithms. By randomly perturbing an input to an algorithm, they could interpolate between a worst-case analysis and an average-case analysis, leading to a better understanding of the practical performance of algorithms on real-life instances. This has been hugely influential, leading to the study of smoothed analysis in a host of different settings, including numerical analysis [39,43], satisfiability [14,22], data clustering [3], multilinear algebra [7] and machine learning [29]. Almost simultaneously, in graph theory, Bohman, Frieze and Martin [8] introduced the model of randomly perturbed graphs which, as with smoothed analysis, allows one to understand the interplay between an extremal and a probabilistic viewpoint. The majority of work on the subject has focused on dense graphs [10,11,26].
In the context of random walk mixing, it can be seen that small random perturbations cannot speed up the mixing time on dense graphs significantly. Indeed, the canonical examples leading to torpid mixing (e.g., two cliques connected by a long path) are robust with respect to that property. Smoothed analysis of sparse graphs was introduced by Krivelevich, Reichman and Samotij [31]. Here one starts with a connected graph of bounded degree (in fact, bounded degeneracy often suffices) and applies a small random perturbation by adding a copy of the binomial random graph R ∼ G(n, ε/n) for small ε > 0. Although this perturbation is very slight, they showed that it greatly improves the expansion properties of the graph. A graph G is said to be Δ-degenerate if there is some ordering of the vertices of G such that each vertex has at most Δ neighbours in G that precede it in the ordering. To be precise, Krivelevich, Reichman and Samotij proved that, for any Δ ∈ ℕ and ε > 0, if G is an n-vertex Δ-degenerate connected graph and R ∼ G(n, ε/n), then G' := G ∪ R a.a.s. satisfies t_mix(G') = O((log n)^2). By considering, for example, a path on n vertices, which has mixing time Ω(n^2), we see a vast improvement after a slight random perturbation. We also note that the result is tight on such examples, as the randomly perturbed path a.a.s. contains bare paths of length Ω(log n).
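As an illustration of the perturbation model (a sketch under our own conventions, not code from [31]), the following Python snippet forms G' = G ∪ G(n, ε/n) for a graph stored as an adjacency list; the path example and the value of ε are arbitrary.

```python
import random

def perturb(adj, eps, rng=None):
    """Return G' = G ∪ G(n, eps/n): each pair of vertices becomes an edge
    independently with probability eps/n, on top of the existing edges."""
    rng = rng or random.Random(1)
    n = len(adj)
    p = eps / n
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                new_adj[u].add(v)
                new_adj[v].add(u)
    return {u: sorted(nbrs) for u, nbrs in new_adj.items()}

# example: perturbing a path on n vertices (a 1-degenerate connected graph)
n = 1000
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
perturbed = perturb(path, eps=0.5)
print(sum(len(v) for v in perturbed.values()) // 2, "edges after perturbation")
```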
Our first application of Theorem 1.5 shows that we can improve the mixing time yet further in this model by starting from a uniformly chosen vertex, as in this case we avoid the small bottlenecks that remain if the initial graph had poor expansion.

Theorem 1.6. For any ε, δ > 0 and Δ ∈ ℕ, there exists a C > 0 such that the following holds. Let G be an n-vertex Δ-degenerate connected graph, choose R ∼ G(n, ε/n) and let G' := G ∪ R. Then, a.a.s., t̃_mix(G', δ) ≤ C log n.
Remark 1.7. Theorem 1.6 is tight, up to the constant factor C, for all graphs with maximum degree Δ. Indeed, this follows from the fact that B_r(v) = O((2d̄)^r log n) for all vertices v ∈ V(G'), where B_r(v) denotes the number of vertices that are at distance at most r from v in G' and d̄ := Δ + ε is an upper bound on the average degree in G'. Such an upper bound can be shown easily by induction, see for example [13], and setting r = c log n for c > 0 sufficiently small shows that at least half of the vertices cannot be reached from v in r steps, and hence t̃_mix(G', δ) ≥ r. Nonetheless, the converse of the inequality in Theorem 1.6 is not true for all Δ-degenerate graphs. Consider for instance a star: it is 1-degenerate, but the mixing time of the randomly perturbed star is O(1), as we mix in the step after visiting the centre of the star for the first time.
Some time before the systematic study of random perturbations in the combinatorial and theoretical computer science communities discussed above, the notion appeared in the physics literature with the study of so-called small-world networks. Here we will concentrate on a model introduced by Newman and Watts [36,37] where, for some fixed k ∈ ℕ, ε > 0 and n ∈ ℕ large, one starts with the n vertices of the graph ordered as v_1, . . ., v_n, adds all edges v_i v_j for which i + 1 ≤ j ≤ i + k (with addition modulo n), and then adds all remaining edges independently with probability p = ε/n. We denote the resulting random graph by G_{n,k,ε}. It is easy to see that this graph fits into the framework of Krivelevich, Reichman and Samotij [31], and so their result implies that, for any k ∈ ℕ and ε > 0, the Newman-Watts small world network G_{n,k,ε} a.a.s. satisfies t_mix(G_{n,k,ε}) = O((log n)^2). In fact, this was established before their work by Addario-Berry and Lei [1], improving on a previous bound of O((log n)^3) due to Durrett [18]. Here, as a direct consequence of Theorem 1.6, we conclude that the average mixing time of the Newman-Watts small world network is of order O(log n).
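A minimal generator for this model (our own sketch; the parameter names are ours) can be written as follows: start from the k-th power of a cycle and add each remaining pair of vertices independently with probability ε/n.

```python
import random

def newman_watts(n, k, eps, rng=None):
    """Sketch of the Newman-Watts small world G_{n,k,eps}: a cyclic base in which
    v_i ~ v_j whenever their indices differ by at most k (mod n), plus
    independent random edges added with probability eps/n."""
    rng = rng or random.Random(2)
    adj = {i: set() for i in range(n)}
    for i in range(n):                     # deterministic base: k-th power of a cycle
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    p = eps / n
    for i in range(n):                     # random perturbation on the remaining pairs
        for j in range(i + 1, n):
            if j not in adj[i] and rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return {i: sorted(nbrs) for i, nbrs in adj.items()}

G = newman_watts(n=500, k=2, eps=1.0)
print(min(len(v) for v in G.values()), max(len(v) for v in G.values()))
```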
1.4. Application 2: Giant components in random subgraphs of expanders. For p ∈ [0, 1] and a graph G, we define G_p to be the graph with the same vertex set where each edge of G is retained in G_p independently with probability p. The graph G is called the host graph, and the random subgraph G_p the p-percolated one. Percolation on graphs is a well-established topic in probability theory. Most classically, if the host graph is the complete graph on n vertices K_n, then its p-percolated subgraph is the Erdős-Rényi graph G(n, p). For any graph G, let L_1(G) denote a largest connected component in G and let ℓ_1(G) denote its order. In their seminal paper [21], Erdős and Rényi proved a phase transition for ℓ_1(G(n, p)). Namely, writing p = c/n for some constant c, if c < 1 then a.a.s. ℓ_1(G(n, p)) = O(log n), while if c > 1 then a.a.s. ℓ_1(G(n, p)) = Ω(n) and the largest component, which is the unique component of linear size, is known as the giant component.
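The percolation step and the extraction of the largest component are easy to sketch in code (ours, purely illustrative); with the complete host graph K_n and p = c/n for c > 1, the largest component found below is the giant component of the Erdős-Rényi graph.

```python
import random
from collections import deque

def percolate(adj, p, rng=None):
    """Keep each edge of the host graph independently with probability p."""
    rng = rng or random.Random(3)
    kept = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            if u < v and rng.random() < p:
                kept[u].append(v)
                kept[v].append(u)
    return kept

def largest_component(adj):
    """Return the vertex set of a largest connected component (BFS)."""
    seen, best = set(), set()
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best

# host graph K_n: percolating with p = c/n, c > 1, yields a linear-size giant component
n, c = 2000, 1.5
Kn = {u: [v for v in range(n) if v != u] for u in range(n)}
giant = largest_component(percolate(Kn, c / n))
print(len(giant), "vertices in the largest component")
```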
A central question in random graph theory is whether other host graphs exhibit the same phenomenon [2]. One quickly observes that, in order for G_p to have a sharp threshold for the component structure, the host graph G should satisfy some additional properties. A natural property to consider is the pseudo-random notion of expansion. There is a strong connection between expansion and the graph spectrum. Given the eigenvalues d = λ_1 ≥ λ_2 ≥ · · · ≥ λ_n of the adjacency matrix of a d-regular graph G, let λ(G) := max{|λ_2|, |λ_n|} be the second largest eigenvalue in absolute value. We then define an (n, d, λ)-graph to be a d-regular graph G on n vertices with λ(G) = λ. When λ is small compared to d, an (n, d, λ)-graph is said to be an expander, and it enjoys many of the same properties as a random graph with the same density. We refer the reader to the excellent survey of Krivelevich and Sudakov [32] on the subject.
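For a concrete d-regular graph, the (n, d, λ) condition can be checked numerically. The sketch below (ours, not from the paper) computes λ(G) = max{|λ_2|, |λ_n|} from the adjacency matrix, using the complete graph K_5 as a toy example.

```python
import numpy as np

def spectral_expansion(adj):
    """For a d-regular graph, return (n, d, lam), where lam = max{|lambda_2|, |lambda_n|}
    for the eigenvalues d = lambda_1 >= ... >= lambda_n of the adjacency matrix."""
    n = len(adj)
    degs = {len(nbrs) for nbrs in adj.values()}
    assert len(degs) == 1, "the graph must be regular"
    d = degs.pop()
    A = np.zeros((n, n))
    for u, nbrs in adj.items():
        for v in nbrs:
            A[u, v] = 1.0
    abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
    return n, d, abs_eigs[1]   # abs_eigs[0] = d; the next largest absolute value is lambda(G)

# toy example: the complete graph K_5 is 4-regular with lambda = 1
K5 = {u: [v for v in range(5) if v != u] for u in range(5)}
print(spectral_expansion(K5))
```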
The next application of our main result shows that, for percolated pseudo-random graphs, the average mixing time is logarithmic.

Theorem 1.9. For all ε > 0 sufficiently small and all δ > 0, there exists a C > 0 such that, if p = (1 + ε)/d and G is an (n, d, λ)-graph with λ ≤ ε^4 d, then a.a.s. t̃_mix(L_1(G_p), δ) ≤ C log n.
Similarly to Remark 1.7, it can be shown that Theorem 1.9 is tight up to a multiplicative constant for all (n, d, λ)-graphs.
As a consequence, for G = K_n we obtain the following.
Corollary 1.10. For all ε > 0 sufficiently small and all δ > 0, there exists a C > 0 such that, for p = (1 + ε)/n, a.a.s. t̃_mix(L_1(G(n, p)), δ) ≤ C log n.

By Remark 1.1, t_mix^{(U)} = O(log n) a.a.s., where U is a vertex chosen uniformly at random from V(L_1(G(n, p))). This result aligns with [6], although theirs is much stronger, showing cut-off for t_mix^{(U)}, as previously mentioned.
1.5. Organisation. The rest of this paper is organised as follows. In Section 2 we introduce all the necessary notation, definitions and tools for our proofs. We use these to prove Theorem 1.5 in Section 3. This section is structured in subsections where we build different tools to be used in our main proof; in particular, we discuss the main ideas of the proof in Section 3.1. Sections 4 and 5 are devoted to proving Theorems 1.6 and 1.9, respectively. Finally, we discuss some open problems in Section 6.

2.1. Notation. For n ∈ ℕ, we write [n] := {1, . . ., n}. Throughout, we will consider both simple graphs and multigraphs. The word graph will refer to simple graphs, that is, each pair of vertices forms at most one edge. Our multigraphs, which will be allowed to have parallel edges but no loops, will be clearly identified as such. All our graphs are labelled, so whenever we discuss an n-vertex (multi)graph G, we implicitly assume that V(G) = [n]. Given a (multi)graph G = (V, E) and disjoint sets A, B ⊆ V(G), we write E_G(A) for the (multi)set of edges of G contained in A, and E_G(A, B) for the (multi)set of edges with one endpoint in A and the other in B. We set e_G(A) := |E_G(A)| and e_G(A, B) := |E_G(A, B)|. In many of our statements we will consider an n-vertex graph satisfying a set of conditions or a conclusion, which are often asymptotic in nature. This is in fact an abuse of notation. To be precise, one must consider a sequence (G_n)_{n≥1} of graphs on an increasing number of vertices so that the graphs in the sequence satisfy the conditions. This abuse of notation greatly simplifies the statements, so we will assume it throughout. (This also includes any asymptotic statements about random graphs.) For any sequence of graphs (G_n)_{n≥1} with |V(G_n)| → ∞, we say that a graph property P holds asymptotically almost surely (a.a.s.) if lim_{n→∞} ℙ[G_n ∈ P] = 1.

Random walks.
Given an arbitrary connected multigraph G, the lazy random walk over G is a Markov chain on state space V(G) defined by the transition matrix P_G = (P_G(u, v))_{u,v ∈ V(G)} given by

P_G(u, v) := 1/2 if u = v, and P_G(u, v) := m_G(u, v)/(2 deg_G(u)) otherwise, (2.1)

where m_G(u, v) denotes the number of edges between u and v in G. That is, the lazy random walk is a sequence of random variables (X_t)_{t≥0} with probability distributions (p_t)_{t≥0}, respectively, over V(G), where p_0 is the starting distribution and, for each t ≥ 1, the distribution of X_t is obtained from the distribution of X_{t−1} as p_t = p_{t−1} P_G = p_0 P_G^t. The sequence of distributions thus depends only on G and the starting distribution. In the special case when there is a vertex v ∈ V(G) such that p_0(v) = 1, we will write (p_v^t)_{t≥0} to denote the resulting sequence of distributions.
If G is connected, the lazy random walk over G converges to a stationary distribution π_G (that is, a distribution satisfying π_G = π_G P_G), independently of the starting distribution p_0. It is well known (see, e.g., [33]) that this stationary distribution satisfies

π_G(v) = deg_G(v)/(2e(G)) (2.2)

for all v ∈ V(G). Given a set S ⊆ V(G), we define π_G(S) := Σ_{v ∈ S} π_G(v). It follows from (2.2) that

π_G(S) = deg_G(S)/(2e(G)). (2.3)

We define π_min(G) := min_{v ∈ V(G)} π_G(v) and π_max(G) := max_{v ∈ V(G)} π_G(v). Recall the definition of mixing times in the introduction. The mixing time of a random walk on a connected (multi)graph G is deeply tied to the concept of conductance. Given a set S ⊆ V(G), we define

Q_G(S) := Σ_{u ∈ S, v ∉ S} π_G(u) P_G(u, v) = e_G(S, V(G) \ S)/(4e(G)), (2.4)

where the equality follows from (2.1) and (2.2). Observe that Q_G(S) = Q_G(V(G) \ S). Finally, we define the conductance Φ_G(S) of S as

Φ_G(S) := Q_G(S)/(π_G(S) π_G(V(G) \ S)). (2.5)

From the definitions in (2.3), (2.4) and (2.5) and the fact that deg_G(S) ≤ 2e(G) for any set S ⊆ V(G), it follows that

Φ_G(S) ≥ e_G(S, V(G) \ S)/(2 deg_G(S)). (2.6)

Our approach to estimate the mixing time of the lazy random walk over a multigraph G is based on ideas of Fountoulakis and Reed [23,24]. Roughly speaking, their main contribution is the fact that the mixing time of an abstract irreducible, reversible, aperiodic Markov chain (which we may represent using a weighted graph G on its state space) can be bounded from above using the conductances of different connected sets of states of various sizes. The fact that we may restrict ourselves to connected sets is crucial to obtain tighter bounds than would be obtained through other classical means. For simplicity, here we only state a version of the result of Fountoulakis and Reed [23] which is applicable to our setting. For any p ∈ (π_min(G), 1), we let Φ_p(G) be the minimum conductance Φ_G(S) over all connected sets S ⊆ V(G) such that p/2 ≤ π_G(S) ≤ p (if no such set S exists, we set Φ_p(G) = 1).

Theorem 2.1 (Fountoulakis and Reed [23]). Let G be a connected multigraph. There exists an absolute constant C_0 > 0 such that

t_mix(G, 1/4) ≤ C_0 Σ_{j=1}^{⌈log_2(1/π_min(G))⌉} Φ_{2^{−j}}(G)^{−2}.

Another parameter of interest is the hitting time of a vertex (or set of vertices) by the random walk on a multigraph G. Given any v ∈ V(G) and the lazy random walk (X_t)_{t≥0} with starting distribution p_0, we define the hitting time of v as τ_v := min{t ∈ ℕ_0 : X_t = v}. More generally, given any set S ⊆ V(G), we define the hitting time of S as τ_S := min{t ∈ ℕ_0 : X_t ∈ S}. Given any vertex v ∈ V(G), let P_G^{(v)} be the matrix obtained from the transition matrix P_G by removing the row and column corresponding to v. If P_G^{(v)} is primitive (i.e., all entries of (P_G^{(v)})^t are positive for some t ≥ 1), then, by the Perron-Frobenius theorem, the largest eigenvalue of P_G^{(v)}, denoted by λ_v, is real, has multiplicity 1 and satisfies λ_v < 1.
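The quantities above are easy to compute for small examples. The sketch below (ours; the normalisation of the conductance follows one common convention and may differ from the paper's exact constants) evaluates the stationary distribution and a conductance-type ratio for a vertex subset, using two triangles joined by a single edge as a toy bottleneck.

```python
def stationary(adj):
    """pi_G(v) = deg_G(v) / (2 e(G)) for the lazy random walk on a connected graph."""
    two_e = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / two_e for v, nbrs in adj.items()}

def conductance(adj, S):
    """A conductance-type ratio for a vertex subset S: the stationary flow across
    the cut divided by pi_G(S) * pi_G(complement). (One common normalisation.)"""
    S = set(S)
    pi = stationary(adj)
    two_e = sum(len(nbrs) for nbrs in adj.values())
    cut = sum(1 for u in S for v in adj[u] if v not in S)   # e_G(S, V \ S)
    Q = cut / (2 * two_e)                                   # = e_G(S, V \ S) / (4 e(G))
    pi_S = sum(pi[v] for v in S)
    return Q / (pi_S * (1 - pi_S))

# two triangles joined by a single edge: each triangle is a bottleneck
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(conductance(adj, {0, 1, 2}))
```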
We will make use of the first visit time lemma of Cooper and Frieze [15]. Here we state a more recent version with weaker hypotheses, due to Manzo, Quattropani and Scoppola [35].
Theorem 2.2 (First Visit Time Lemma, Manzo, Quattropani and Scoppola [35]). Let G be an n-vertex connected multigraph. Suppose that there exist a real number α > 2 and a diverging sequence T = T(n) such that conditions (HP1) and (HP2) hold. Then, for all v ∈ V(G), we have

ℙ[τ_v > t] = (1 + o(1)) (1 − (1 + o(1)) π_G(v)/R_G(v))^t, (2.7)

where R_G(v) is the expected number of indices t ∈ [T] ∪ {0} for which the lazy random walk (X_t)_{t≥0} on G starting at v satisfies X_t = v. From an intuitive point of view, the theorem says that the hitting time of v is roughly distributed as a geometric random variable with success probability π_G(v)/R_G(v). If one wanted to hit v by independently sampling vertices according to π_G, the hitting time would be a geometric random variable with success probability π_G(v). The factor R_G(v) is the price to pay for taking into account the geometry of the graph: the more likely it is for the walk to return from v to v, the less connected v is to the rest of the graph, and the smaller the probability of hitting it at a given (large) time.

Remark 2.3. In the proof of Theorem 2.2, one can check that, if we only want (2.7) to hold for a given v ∈ V(G), then (HP2) can be replaced by
(HP2′) Small π_G(v): T · π_G(v) = o(1).
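As a quick numerical illustration of this heuristic (ours, not from [35]), one can estimate the tail ℙ[τ_v > t] by simulation and compare it with the geometric tail (1 − π_G(v)/R_G(v))^t, estimating R_G(v) as the average number of visits to v in a window of T steps; all parameter values below are arbitrary.

```python
import random

def lazy_step(adj, v, rng):
    return v if rng.random() < 0.5 else rng.choice(adj[v])

def return_factor(adj, v, T, trials, rng):
    """Estimate R_G(v): the expected number of times the walk started at v
    is at v during the first T steps (time 0 included)."""
    total = 0
    for _ in range(trials):
        w, visits = v, 1
        for _ in range(T):
            w = lazy_step(adj, w, rng)
            visits += (w == v)
        total += visits
    return total / trials

def hitting_tail(adj, v, start, t, trials, rng):
    """Empirical estimate of P[tau_v > t] for the walk started at `start`."""
    exceed = 0
    for _ in range(trials):
        w, hit = start, (start == v)
        for _ in range(t):
            if hit:
                break
            w = lazy_step(adj, w, rng)
            hit = (w == v)
        exceed += (not hit)
    return exceed / trials

rng = random.Random(0)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
pi_v = len(adj[5]) / sum(len(nbrs) for nbrs in adj.values())      # pi_G(5) = 1/7
R = return_factor(adj, 5, T=50, trials=5000, rng=rng)
print(hitting_tail(adj, 5, start=0, t=60, trials=5000, rng=rng))  # empirical tail
print((1 - pi_v / R) ** 60)                                       # geometric approximation
```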
3. A general approach to average mixing times

3.1. Proof overview. As discussed in the introduction, the main tool we will use to bound the mixing times of random walks is the result of Fountoulakis and Reed (Theorem 2.1), which relates the (worst-case) mixing time of a random walk on a graph G to the conductance (see (2.5)) of the connected vertex subsets S of G. We think of vertex subsets S whose conductance is poor (those for which Φ_G(S) = o(1)) as bottlenecks: they have more edges internally in S than leaving S, and so the random walk is likely to get held up in S. The spreader graphs (see Definition 1.3) we are interested in studying here have only few small bottlenecks. Indeed, any vertex subset which can lead to small bottlenecks must either be thin or loaded, and our upper bounds on the number of these sets in a spreader graph readily imply that any vertex set with poor conductance is of at most polylogarithmic size (size O((log n)^2) to be precise, see Remark 3.3). Now, if a set with poor conductance is very small (size at most (log n)^{1/5}), it will not slow down mixing significantly, as our random walk will not get stuck for very long in these sets before leaving them. Therefore, it is the intermediate-size sets which pose a problem, and we will first show in Section 3.2 (see Lemma 3.4) that the set B of bad vertices contained in some intermediate set which has poor conductance makes up a negligible proportion of the overall vertex set of our spreader graph. Intuitively, we can then see how starting at an average vertex of G speeds up the mixing time. Indeed, we are very unlikely to start at a bad vertex in B and, moreover, we are in fact very unlikely to visit a vertex in B in the first O(log n) time steps, by which time we aim to show that the distribution of the random walk is already well mixed. In order to formalise this intuition, we adjust our spreader graph G by shrinking the intermediate sets with poor conductance and thus removing troublesome small bottlenecks. The resulting (multi)graph we will call G*. Using that the number of bad vertices |B| is negligible, or rather that the number of edges incident to B, deg_G(B), is negligible (Lemma 3.4), we show in Section 3.3 that switching from G to G* does not have a big effect on the edge distribution and that, in particular, the stationary distributions of G and G* are comparable. We then show in Section 3.4 that, after contracting intermediate sets with poor conductance, we can apply Theorem 2.1 of Fountoulakis and Reed to conclude that the worst-case mixing time in G* is logarithmic. Here we will need that G* is defined carefully so as to preserve connectivity between sets after contractions (see (3.7)). Finally, we will prove Theorem 1.5 by coupling the random walk from an average vertex of G with the random walk in G*. As the random walk in G* from any starting point mixes rapidly, we can conclude that the random walk in G also mixes rapidly, as long as the two random walks stay coupled for long enough. For this, our final ingredient is to show that the random walk in G is unlikely to hit our bad vertices B, which we do in Section 3.5 by appealing to the First Visit Time Lemma (Theorem 2.2) of Manzo, Quattropani and Scoppola.
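The contraction step can be illustrated with a short sketch (ours, and purely schematic: the actual construction of G* in Section 3.3 is more careful, in particular about preserving connectivity between contracted sets). Each given bad set is collapsed to a single representative vertex; edges inside a contracted set are discarded, while all other edges are kept with multiplicity, so the result is a multigraph.

```python
from collections import Counter

def contract_sets(adj, bad_sets):
    """Collapse each set in bad_sets to a single representative vertex.
    Edges with both endpoints inside the same contracted set are discarded;
    all other edges are kept with multiplicity, and the resulting multigraph
    is encoded as a Counter over unordered vertex pairs."""
    rep = {}
    for S in bad_sets:
        r = min(S)
        for v in S:
            rep[v] = r
    multi = Counter()
    for u in adj:
        for v in adj[u]:
            if u < v:
                ru, rv = rep.get(u, u), rep.get(v, v)
                if ru != rv:
                    multi[(min(ru, rv), max(ru, rv))] += 1
    return multi

# toy example: a 'bare path' hanging off a well-connected core is collapsed
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(contract_sets(adj, [{4, 5, 6}]))
```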
3.2. Badly connected sets. We will make use of the following simple definition.

Definition 3.1. For δ > 0 and a connected multigraph G, we say a set S ⊆ V(G) is δ-bad if its conductance satisfies Φ_G(S) < δ, and that it is δ-good otherwise.
The following lemma gives us a basic property of bad sets in a connected multigraph and quickly ties them to our notion of (α, K)-spreader graphs. Note that the notions of thin and loaded sets extend naturally to multigraphs.

Lemma 3.2. Let 𝐺 be a multigraph. For
Proof. The assertion (1) follows easily, since otherwise we would obtain a contradiction. For the second assertion, suppose that S is neither β-thin nor β^{−1}-loaded in G; then we again obtain a contradiction. Here we used assertion (1) in the first inequality, and the definitions of β-thin and β^{−1}-loaded in the second. □

We will also make use of the following simple observation. We now turn our attention to (3.4). By the definition of B, we have an expression which, using Lemma 3.2(1), simplifies further. Now, for each S ∈ S, let C(S) be some connected set such that (log n)^2 ≤ |C(S)| ≤ (log n)^3 and S ⊆ C(S). Note that this is possible because G is connected and every S ∈ S has size less than (log n)^2. By (S2) and the bounds on |C(S)|, for each S ∈ S we obtain a bound on the number of edges inside C(S). Therefore, using (3.6), and for n sufficiently large, the claimed bound follows. In particular, since G is connected (so e(G) ≥ n/2), it follows from (2.3) that the corresponding stationary mass is negligible.

In particular, V(G) = V(G*). Finally, we define the multiset of edges of G*. Observe that G* is connected if and only if G is connected, and that B* must be an independent set in G*. Given some connected graph G, we want to compare the behaviour of the lazy random walk on G and on its contracted form G*. In particular, we wish to compare their stationary distributions. In order to do this, we need to make them comparable by having them on the same state space. Let us describe this in full generality. Let G_1 = (V_1, E_1) and G_2 = (V_2, E_2) be two connected multigraphs (possibly with V_1 ∩ V_2 ≠ ∅). Then, we define an auxiliary multigraph H as the union of G_1 and G_2. Given the stationary distributions π_{G_1} and π_{G_2}, we define two distributions μ_1 and μ_2 on H, where, for each v ∈ V(H), μ_i(v) agrees with π_{G_i}(v) whenever v ∈ V(G_i) and vanishes otherwise. The second term in the sum can be evaluated explicitly. Introducing this in (3.10) together with (3.9), and using Lemma 3.4 and that e(G) ≥ n/2, we conclude the proof. □

Mixing time after contractions.
The following result shows the mixing properties of contracted spreader graphs.
Proposition 3.6. For all K ≥ 4, 0 < α < 1/K^2 and ε > 0, there exists a C > 0 such that the following holds for all n sufficiently large. Suppose G is an n-vertex connected (α, K)-spreader graph. Then t_mix(G*, ε) ≤ C log n.

In order to prove Proposition 3.6, we will rely on the following lemma.

Smoothed analysis on connected graphs
We next want to show applications of Theorem 1.5. We use this section to prove Theorem 1.6.

Random subgraphs of expanders
In order to prove Theorem 1.9, we will rely on several known properties of the giant component of a random subgraph of an (n, d, λ)-graph. Recall from Section 2.1 that, when we refer to asymptotic statements holding in an (n, d, λ)-graph, what is implicitly meant is that the statement holds for any sequence (G_n)_{n≥1} of (n, d, λ)-graphs that satisfy the stated condition.

Lemma 5.1. Let ε > 0 be a sufficiently small constant and let G be an (n, d, λ)-graph with λ ≤ ε^4 d. □

With this, we can prove Theorem 1.9.

Open problems
Theorem 1.5 is only effective on graphs where the mixing is slowed down by a few small bottlenecks. This is the case in the two applications presented here. Nevertheless, there are other cases where both small and large bottlenecks exist. It would be interesting to study average-case mixing times in such scenarios and determine what improvement over the worst case can be attained.
One such example is the small-world model of Kleinberg [30], whose mixing time has been studied in [20].
In recent years, the theory of random walks on random directed graphs has attracted a considerable amount of attention. As in the case of random regular graphs, under mild conditions on the bidegree sequence, the mixing time is logarithmic [9,12]. From the point of view of smoothed analysis, a natural question is whether randomly perturbing a deterministic strongly connected digraph can yield logarithmic mixing time. Conductance-based bounds such as Jerrum-Sinclair and Fountoulakis-Reed are not valid in the non-reversible setting, so new ideas are required. Finally, we mention that an analogue of the result on the mixing time of randomly perturbed connected graphs by Krivelevich, Reichman and Samotij [31], for graphs perturbed by a random perfect matching, has been obtained by Hermon, Sly and Sousi [27]. Considering such a model in the directed setting would also be interesting.

Figure 1. Schematic plot of the total variation distance starting at different vertices and the two average mixing times, for ε = 0.05. In red, the function t ↦ (1/n) Σ_{v ∈ V(G)} d_TV(p_v^0 P_G^t, π_G) and the dot representing t̃_mix(G, ε). In blue, the average of the mixing times at different thresholds and the dot representing 𝔼[t_mix^{(U)}(G, ε)].