A log-Sobolev inequality for the multislice, with applications

Let $\kappa \in \mathbb{N}_+^\ell$ satisfy $\kappa_1 + \dots + \kappa_\ell = n$ and let $\mathcal{U}_\kappa$ denote the "multislice" of all strings $u$ in $[\ell]^n$ having exactly $\kappa_i$ coordinates equal to $i$, for all $i \in [\ell]$. Consider the Markov chain on $\mathcal{U}_\kappa$ in which a step is a random transposition of two coordinates of $u$. We show that the log-Sobolev constant $\rho_\kappa$ for the chain satisfies $$(\rho_\kappa)^{-1} \leq n \sum_{i=1}^{\ell} \tfrac{1}{2} \log_2(4n/\kappa_i),$$ which is sharp up to constants whenever $\ell$ is constant. From this, we derive some consequences for small-set expansion and isoperimetry in the multislice, including a KKL Theorem, a Kruskal--Katona Theorem for the multislice, a Friedgut Junta Theorem, and a Nisan--Szegedy Theorem.


Introduction
Suppose we have a deck of n cards, with κ_1 of them colored red, κ_2 of them colored blue, and κ_3 of them colored green. If we "shuffle" the cards by repeatedly transposing random pairs of cards, how long does it take for the deck to reach a well-mixed configuration? This question asks about the mixing time and expansion of a Markov chain known variously as the multi-urn Bernoulli-Laplace diffusion process or the multislice.
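As a quick illustration of the question (our own numerical sketch, not part of the paper), one can track the exact total-variation distance to the uniform distribution for a small deck with κ = (2, 2, 2):

```python
from itertools import permutations, combinations

# Small instance of the multi-urn Bernoulli-Laplace chain: a 6-card deck
# with kappa = (2, 2, 2) cards of each of three colors.
kappa = (2, 2, 2)
n = sum(kappa)
deck = [i + 1 for i, k in enumerate(kappa) for _ in range(k)]
states = sorted(set(permutations(deck)))          # the multislice U_kappa
index = {u: s for s, u in enumerate(states)}
trans = list(combinations(range(n), 2))           # Trans(n): all C(n,2) transpositions

def step(dist):
    """One step of the chain: transpose a uniformly random pair of positions."""
    out = [0.0] * len(states)
    for s, p in enumerate(dist):
        if p == 0.0:
            continue
        u = states[s]
        for j, k in trans:
            v = list(u); v[j], v[k] = v[k], v[j]
            out[index[tuple(v)]] += p / len(trans)
    return out

dist = [0.0] * len(states)
dist[0] = 1.0                                     # start at a fixed deck order
uniform = 1.0 / len(states)
tv = []
for _ in range(31):
    tv.append(0.5 * sum(abs(p - uniform) for p in dist))
    dist = step(dist)
```

The distance is non-increasing in t (a general fact about Markov chains), and for this 90-state instance it is already negligible after a few dozen transpositions.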
Let ℓ ∈ ℕ_+ denote a number of colors and let n ∈ ℕ_+ denote a number of coordinates (or positions).
Following computer science terminology, we refer to elements u ∈ [ℓ]^n as strings. Given a color i ∈ [ℓ], we write #_i u for the number of coordinates j ∈ [n] for which u_j = i. The vector κ = (#_1 u, ..., #_ℓ u) ∈ ℕ^ℓ is referred to as the histogram of u. In general, if κ ∈ ℕ_+^ℓ satisfies κ_1 + ··· + κ_ℓ = n (so κ is a composition of n), we define the associated multislice to be U_κ = {u ∈ [ℓ]^n : #_i u = κ_i for all i ∈ [ℓ]}. The conductance (or expansion) of a set A ⊆ U_κ under the random transposition chain is Φ[A] = Pr_{u∼A, τ∼Trans(n)}[u^τ ∉ A], where u^τ denotes u with the pair of coordinates in τ swapped. Sets A with small conductance are natural bottlenecks for mixing in the Markov chain. An example when ℓ = 2 and κ = (n/2, n/2) is the "dictator" set A = {u : u_1 = 1}. It has expansion Φ[A] = 1/(n−1); indeed, if we start the random walk from a string u with u_1 = 1, it will take about n/2 steps on average before there is even a chance that u_1 changes from 1.
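The Φ[A] = 1/(n−1) computation for the dictator set can be checked exactly by brute force on small instances (a sketch we added; not from the paper):

```python
from itertools import permutations, combinations
from fractions import Fraction

def conductance(kappa, A_pred):
    """Exact Phi[A] = Pr_{u~A, tau~Trans(n)}[u^tau not in A] on the multislice."""
    n = sum(kappa)
    base = [i + 1 for i, k in enumerate(kappa) for _ in range(k)]
    U = set(permutations(base))
    A = [u for u in U if A_pred(u)]
    trans = list(combinations(range(n), 2))
    escapes = 0
    for u in A:
        for j, k in trans:
            v = list(u); v[j], v[k] = v[k], v[j]
            escapes += not A_pred(tuple(v))
    return Fraction(escapes, len(A) * len(trans))

# The "dictator" set A = {u : u_1 = 1} on the balanced Boolean slice.
phi6 = conductance((3, 3), lambda u: u[0] == 1)   # n = 6
phi8 = conductance((4, 4), lambda u: u[0] == 1)   # n = 8
print(phi6, phi8)   # 1/5 and 1/7, i.e. 1/(n-1)
```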
One feature of this example is that the set A is "large": its (fractional) volume is bounded below by a constant. The "small-set expansion" phenomenon [KKL88, LK99, RS10] (occurring most famously in the standard random walk on the Boolean cube {0, 1}^n) refers to the possibility that all "small" sets have high conductance. Intuitively, if small-set expansion holds for a Markov chain, then a random walk with a deterministic starting point should mix rapidly in its early stages, with the possibility for slowdown occurring only when the chain is somewhat close to mixed. A log-Sobolev inequality for the Markov chain is one way that such a phenomenon may be captured. In particular, if the log-Sobolev constant for the transposition chain on U_κ is ̺_κ, it follows that Φ[A] ≥ (1/2) ̺_κ · ln(1/vol(A)) for all nonempty subsets A ⊆ U_κ. (1) So sets of constant volume must have conductance Ω(̺_κ), but sets of volume 2^{−Θ(n)} (for example) must have conductance Ω(n ̺_κ). A known further consequence of a log-Sobolev inequality is a hypercontractive inequality, which concerns expansion in the continuous-time version of the Markov chain: roughly, it implies that the continuous-time chain, started uniformly inside a set A of small volume and run for time t on the order of ̺_κ^{-1}, will exit A with high probability. Thus again, if vol(A) is small, then the Markov chain will almost surely exit A after running for Θ(̺_κ^{-1}) steps.
We remark that Inequality (1) is merely a consequence of the log-Sobolev constant being ̺_κ. It is not the case that ̺_κ is defined to be the largest constant for which Inequality (1) holds (for all A), though this is a reasonable intuition. Instead, ̺_κ is defined to be the largest constant for which a certain generalization of Inequality (1) to nonnegative functions holds; namely, E_{u∼π, τ∼Trans(n)}[(√φ(u) − √φ(u^τ))²] ≥ ̺_κ · KL(φπ ‖ π) for all probability densities φ. (2)
The main case of interest for us is n → ∞ with ℓ = O(1) and κ_i/n ≥ Ω(1) for each i; in other words, when we are at a "middling" histogram of a high-dimensional multicube [ℓ]^n. In this case our bound is ̺_κ ≥ Ω(1/n), which is the same bound that holds for the standard random walk on the Boolean cube. Thus for this parameter setting, the random transposition chain on the multislice enjoys all of the same small-set expansion properties as the Boolean cube (up to constants).
On the sharpness of Theorem 1. When ℓ is considered to be a constant, Theorem 1 is sharp up to constant factors (which we did not attempt to optimize); i.e., (̺_κ)^{-1} = Θ(n Σ_{i=1}^{ℓ} lg(4n/κ_i)) in this regime. (3) To see the upper bound on ̺_κ, assume without loss of generality that ℓ = argmin_i {κ_i}, and take A = {u ∈ U_κ : u_j = ℓ for all j ∈ [κ_ℓ]}.
At the opposite extreme, when ℓ = n and κ = (1, 1, ..., 1), we have the random transposition walk on the symmetric group S_n. In this case, Theorem 1 as stated gives the poor bound of ̺_κ ≥ Ω(1/(n² log n)), whereas the optimal bound is ̺_κ = Θ(1/(n log n)) [DS96, LY98]. In fact, our proof of Theorem 1 (which generalizes that of [LY98]) can actually achieve the tight lower bound of ̺_κ ≥ Ω(1/(n log n)) in this case. However, we tailored our general bound for the case of ℓ = O(1), and did not try to optimize for the most general scenario of ℓ varying with n. A reasonable prediction might be that Equation (3) always holds, up to universal constants, without the assumption of ℓ = O(1); we leave investigation of this for future work.

Applications
There are many known applications of log-Sobolev and hypercontractive inequalities in combinatorics and theoretical computer science (see, e.g., [O'D14, Ch. 9, 10]). In this paper we present four particular consequences of Theorem 1 for analysis/combinatorics of Boolean functions on the multislice. We anticipate the possibility of several more. Full details of these applications appear in Section 4; here we describe them informally.
Throughout the remainder of this section, let us think of n as large, of ℓ as constant, and let us fix a histogram κ (with κ_1 + ··· + κ_ℓ = n) satisfying κ_i/n ≥ Ω(1) for all i. For example, we might think of ℓ = 3 and κ = (n/3, n/3, n/3), so that U_κ consists of all ternary strings with an equal number of 1's, 2's, and 3's. The isoperimetric problem for U_κ would ask: for a given fixed 0 < α < 1, which subset A ⊆ U_κ with vol(A) = α has minimal "edge boundary", i.e., minimal Φ[A]? (Here "edge boundary" is with respect to performing a single transposition, although in our Kruskal-Katona application we will relate this to the size of A's "shadows" at neighboring multislices.) We typically think of α as "constant", bounded away from 0 and 1. In our example with κ = (n/3, n/3, n/3), when α = 1/3 the isoperimetric minimizer is a "dictator" set like A = {u : u_1 = 1}; it has Φ[A] = 4/(3(n−1)). The "99% regime" version of the isoperimetric question would be: if Φ[A] is within a factor 1 + o(1) of minimal, must A be "o(1)-close" to a minimizer? This question will be considered in a companion paper. We will instead consider the "1% regime" version of the isoperimetric question: if Φ[A] is at most O(1) times the minimum, must A at least "slightly resemble" a minimizer?
To orient ourselves, first note that for constant α (bounded away from 0 and 1), the minimum possible value of Φ[A] among A with vol(A) = α is Θ(1/n); indeed, this follows from our Theorem 1 and Inequality (1). From this fact, we will derive a multislice variant of the Kruskal-Katona Theorem. Up to O(1) factors, this minimum is achieved not just by "dictator" sets like {u ∈ U_{(n/3,n/3,n/3)} : u_1 = 1}, but also by any "junta" set, meaning a set A for which absence or presence of u ∈ A depends only on the colors (u_j : j ∈ J) for some O(1)-size set J ⊆ [n] of coordinates. A natural "1% regime" question is then whether every set of near-minimal conductance must resemble a junta. We give two closely related positive answers to this question, as a consequence of our log-Sobolev inequality. The first answer, a KKL Theorem for the multislice (cf. [KKL88]), follows immediately from previous work [OW13a, OW13b]. It says that for any set with Φ[A] ≤ O(1/n), there must exist some pair of coordinates j, j′ ∈ [n] with at least constant influence on A, where the influence Inf_{(j j′)}[A] of the transposition (j j′) on A measures the probability, over u ∼ π, that transposing coordinates j and j′ changes membership of u in A. It is the hallmark of a junta A that every transposition (j j′) has either Inf_{(j j′)}[A] = 0 or Inf_{(j j′)}[A] ≥ Ω(1). In fact, mirroring the original KKL Theorem, our work shows that every A of constant volume has some transposition of influence ≥ Ω(log n / n). From this, we can also derive a "robust" version of our Kruskal-Katona theorem (à la [OW13a]).
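To illustrate the junta hallmark concretely, here is a small exact computation (our own sketch; the influence normalization, with or without a leading factor of 1/2, is an assumption on our part): for the dictator set on U_{(2,2,2)}, every transposition touching the dictator coordinate has constant influence, and every other transposition has influence exactly zero.

```python
from itertools import permutations, combinations
from fractions import Fraction

kappa = (2, 2, 2)
n = sum(kappa)
base = [i + 1 for i, k in enumerate(kappa) for _ in range(k)]
U = sorted(set(permutations(base)))
A = {u for u in U if u[0] == 1}      # a 1-junta: membership depends on u_1 only

def influence(tau):
    """Pr_{u ~ pi}[1_A(u) != 1_A(u^tau)]  (assumed normalization, no 1/2)."""
    j, k = tau
    changed = 0
    for u in U:
        v = list(u); v[j], v[k] = v[k], v[j]
        changed += ((u in A) != (tuple(v) in A))
    return Fraction(changed, len(U))

infs = {tau: influence(tau) for tau in combinations(range(n), 2)}
# Transpositions touching the dictator coordinate have constant influence;
# all others have influence exactly 0 -- the hallmark of a junta.
```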
A closely related consequence of our work is a Friedgut Junta Theorem for the multislice (cf. [Fri98]), which follows (using a small amount of representation theory) from work of Wimmer [Wim14] (see also [Fil16b] for a different account). It states that for any A with Φ[A] ≤ c/n, and any ε > 0, there is a genuine exp(O(c/ε))-junta A′ ⊆ U_κ that is ε-close to A, meaning vol(A △ A′) ≤ ε. The junta theorem can also be generalized to real-valued functions, following the work of Bouyrie [Bou17], with a worse dependence on ε in the exponent.
Finally, with a little more representation theory effort, we are able to derive from Theorem 1 a Nisan-Szegedy Theorem for the multislice (cf. [NS94]), which is (roughly) an ε = 0 version of the Friedgut Junta Theorem; this generalizes previous work on the Hamming slice [FI18a]. It says that if A ⊆ U_κ is of "degree k", meaning that its indicator function can be written as a linear combination of k-junta functions, then A must be an exp(O(k))-junta itself. (The k = 1 case of this theorem, with the conclusion that A is a 1-junta, was proven recently in [FI18b].)

Context and prior work
In this section we review similar contexts where log-Sobolev inequalities and small-set expansion have been studied.
The product chain on [ℓ]^n. A baseline comparison comes from product chains. Consider the "trivial" Markov chain on [ℓ] with invariant distribution ν, in which each step resamples the state from ν, and let ̺_ν^triv denote its log-Sobolev constant (see Equation (5)). For the product chain on [ℓ]^n that resamples a uniformly random coordinate at each step, it follows immediately that the log-Sobolev constant in the general-n case is ̺_ν^triv/n. In particular, if κ_1 + ··· + κ_ℓ = n and ν(i) = κ_i/n, then ν^{⊗n} resembles the uniform distribution π_κ on U_κ, and the product chain on [ℓ]^n somewhat resembles the random transposition chain on U_κ. This gives credence to the possibility that Equation (3) may hold with absolute constants for any ℓ.
The Boolean slice / Bernoulli-Laplace model / Johnson graph. Significant difficulties arise when one moves away from product Markov chains. One of the simplest steps forward is to the Boolean slice. This is the ℓ = 2 case of the Markov chains studied in this paper, with the "balanced" case of κ = (n/2, n/2) being the most traditionally studied. This Markov chain is also equivalent to the Bernoulli-Laplace model for diffusion between two incompressible liquids, and to the standard random walk on Johnson graphs; taking multiple steps in the chain is similar to the random walk in generalized Johnson graphs. The chain has been studied in wide-ranging contexts, from genetics [Mor58], to child psychology [PI76], to computational learning theory [OW13a]. An asymptotically exact analysis of the time to stationarity of this Markov chain was given by Diaconis and Shahshahani [DS87], using representation theory. However, the log-Sobolev constant for the chain took a rather long time to be determined; it was left open in Diaconis and Saloff-Coste's 1996 survey [DS96] before finally being determined (up to constants) by Lee and Yau in 1998 [LY98]. This sharp log-Sobolev inequality, and its attendant hypercontractivity and small-set expansion inequalities, have subsequently been used in numerous applications -for the Kruskal-Katona and Erdős-Ko-Rado theorems in combinatorics [OW13a,DK16,FKMW18], for computational learning theory [Wim09,OW13a], for property testing [Mos14], and for generalizing classic "analysis of Boolean functions" results [OW13a,OW13b,Fil16b,Fil16a,FM16,FKMW18,Bou18].
The Grassmann graph. One direction of generalization for the Johnson graphs are their "q-analogues", the Grassmann graphs; understanding this Markov chain was posed as an open problem even in the early work of Diaconis and Shahshahani [DS87, Example 2]. For a finite field 𝔽_q and integer parameters n ≥ k ≥ 1, the associated Grassmann graph has as its vertices all k-dimensional subspaces of 𝔽_q^n, with two subspaces connected by an edge if their intersection has dimension k − 1. Understanding small-set expansion (and lack thereof) in the Grassmann graphs was central to the very recent line of work that positively resolved the 2-to-2 Conjecture [KMS17, DKK+18b, DKK+18a, BKS18, KMS18] (with the analogous problems on the Johnson graphs serving as an important warmup [KMMS18]). Still, it seems fair to say that the mixing properties of the Grassmann graph are far from being fully understood.
The multislice. We now come to the multislice, the other natural direction of generalization for the Johnson graphs, and the subject of the present paper. One can see the multislice as a generalization of the Bernoulli-Laplace model, modeling diffusion among three or more liquids. As well, the space of functions f : U_κ → ℝ, together with the action of S_n on U_κ, is precisely the Young permutation module M_κ arising in the representation theory of the symmetric group. Understanding the mixing properties of the U_κ Markov chain with random transpositions was suggested as an open problem several times [DS87], [Dia88, p. 59], [FI18a]. The multislice has also played a key role in combinatorial problems such as the Density Hales-Jewett problem (where ℓ = 3 was the main case under consideration) [Pol12].
Although it might at first appear to be a simple generalization of the Boolean slice, there are several fundamental impediments that arise when moving from ℓ = 2 even to ℓ = 3. These include: the fact that a Hamming slice disconnects the nearest-neighbour graph in [2]^n but not in [3]^n; the fact that one can introduce just one variable per coordinate when representing functions [2]^n → ℝ as multilinear polynomials; the fact that 2-row irreps of S_n (Young diagrams) are completely defined by the number of boxes not in the first row; and, the fact that when ℓ ≥ 3, the decomposition of the permutation module M_κ into irreps has multiplicities. The last of these was the main difficulty to be overcome in Scarabotti's work [Sca97] giving the asymptotic mixing time for the transposition walk on balanced multislices U_{(n/ℓ,...,n/ℓ)} (see also [DH02, ST10]). It also prevents the multislice from forming an association scheme.
For the purposes of this paper, the main difficulty that arises when analyzing the log-Sobolev inequality is the following: when ℓ = 2, any nontrivial step in the Markov chain (switching a 1 and a 2) has the property that the histogram within [ℓ]^{n−2} of the unswitched colors is always the same: (κ_1 − 1, κ_2 − 1). By contrast, once ℓ ≥ 3, the multiple "kinds" of transpositions (switching a 1 and a 2, or a 1 and a 3, or a 2 and a 3, etc.) lead to differing histograms within [ℓ]^{n−2} for the unswitched colors. This significantly complicates inductive arguments.
The symmetric group and beyond. Finally, we mention that analysis of the multislice can also be motivated simply as a necessary first step in a full understanding of spectral analysis on the symmetric group and other algebraic structures, an opinion also espoused in, e.g., [CFR11]. Such structures include classical association schemes such as polar spaces and bilinear forms, matrix groups such as the general linear group, and the q-analog of the multislice.

Definitions relevant for the log-Sobolev inequality
Given parameters ℓ ∈ ℕ_+ (number of colors) and n ∈ ℕ_+ (number of coordinates), our objects of study in this paper are multislices, parametrized by a histogram κ ∈ ℕ_+^ℓ satisfying κ_1 + ··· + κ_ℓ = n: U_κ = {u ∈ [ℓ]^n : #_i u = κ_i for all i ∈ [ℓ]}. We will only consider multislices with at least two colors; in other words, ℓ ≥ 2.
We introduce the inner product space M_κ of functions f : U_κ → ℝ, with inner product ⟨f, g⟩ = E_{u∼π}[f(u)g(u)], where π = π_κ denotes the uniform distribution on the multislice U_κ. Let K denote the transition/Markov operator on M_κ associated to the transposition random walk, defined by K f(u) = E_{τ∼Trans(n)}[f(u^τ)], where Trans(n) consists of all (n choose 2) transpositions. Let L denote the Laplacian operator id − K (where id is the identity operator). Then the energy (or Dirichlet form) of f ∈ M_κ is E[f] = ⟨f, Lf⟩ = ½ E_{u∼v}[(f(u) − f(v))²], where we have introduced the notation u ∼ v to denote that (u, v) is a random edge in the Schreier graph; equivalently, u ∼ π and v = u^τ for τ ∼ Trans(n). One may check that if A ⊆ U_κ, then E[1_A] = vol(A) · Φ[A]. (4) Before formally defining the log-Sobolev inequality for the transposition chain on U_κ, we recall first its simpler counterpart, the Poincaré inequality. For f ∈ M_κ, this is E[f] ≥ λ_1 · Var[f], where λ_1 is the spectral gap; i.e., the lowest eigenvalue of L other than the trivial λ_0 = 0. For the transposition chain on U_κ it is known that λ_1 = 2/(n−1), with "dictator" functions and other "degree-1" functions providing the tight examples; see Corollary 20 in Section 4.2.
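The claim that dictators are tight for the spectral gap can be verified exactly on a small multislice (a sketch we added, under the notational reconstruction above): the centered dictator indicator is an eigenvector of K with eigenvalue 1 − 2/(n−1), in exact rational arithmetic.

```python
from itertools import permutations, combinations
from fractions import Fraction

kappa = (2, 2, 2)
n = sum(kappa)
base = [i + 1 for i, k in enumerate(kappa) for _ in range(k)]
U = sorted(set(permutations(base)))
idx = {u: s for s, u in enumerate(U)}
trans = list(combinations(range(n), 2))

def K(f):
    """(Kf)(u) = E_{tau ~ Trans(n)}[f(u^tau)], exactly."""
    g = []
    for u in U:
        acc = Fraction(0)
        for j, k in trans:
            v = list(u); v[j], v[k] = v[k], v[j]
            acc += f[idx[tuple(v)]]
        g.append(acc / len(trans))
    return g

# Centered "dictator" function f(u) = 1[u_1 = 1] - kappa_1/n.
f = [Fraction(int(u[0] == 1)) - Fraction(kappa[0], n) for u in U]
Kf = K(f)
# Expect K f = (1 - 2/(n-1)) f, i.e. L f = lambda_1 f with lambda_1 = 2/(n-1).
```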
As for the log-Sobolev inequality, it is typically defined via the largest constant ̺ such that 2·E[f] ≥ ̺ · Ent[f²] for all f ∈ M_κ, where Ent[g] := E_π[g ln g] − E_π[g] ln E_π[g]. (5) We remark that Diaconis and Saloff-Coste [DS96] showed that ̺ ≤ λ_1 always holds.
In this work we will prefer a slightly different (equivalent) definition for the log-Sobolev inequality, used in [LY98]. It's easy to see that replacing f with |f| does not change Ent[f²] but can only decrease E[f]. Thus in the log-Sobolev inequality it suffices to consider nonnegative f. Also, the inequality is 2-homogeneous (both sides are multiplied by c² when f is multiplied by c); thus it suffices to consider f with E_π[f²] = 1, i.e., to consider φ := f² with E_π[φ] = 1. We call such a φ a probability density function, thinking of it as a relative probability density with respect to the uniform distribution π = π_κ. In other words, we associate φ to the probability distribution in which u ∈ U_κ has probability mass φ(u)π(u). Now when φ = f², we have Ent[f²] = E_π[φ ln φ] − E_π[φ] ln E_π[φ] = E_π[φ ln φ], where we used ln E_π[φ] = ln 1 = 0. This quantity is precisely the Kullback-Leibler divergence (or relative entropy) between the distribution φπ and the distribution π, denoted KL(φπ ‖ π). Thus we have shown that the usual formulation of the log-Sobolev inequality is equivalent to E_{u∼π, τ∼Trans(n)}[(√φ(u) − √φ(u^τ))²] ≥ ̺ · KL(φπ ‖ π) for all probability densities φ on U_κ, as stated in Inequality (2).

Hypercontractivity, influences, and other preliminaries for our applications
In this section we make some further definitions, which will be useful for our applications of the log-Sobolev inequality. We first decompose L into its components on each transposition τ ∈ Trans(n), introducing the operator L_τ defined by L_τ f := f − f^τ (where f^τ(u) := f(u^τ)), so that L = avg_{τ∼Trans(n)} L_τ. The associated influence of τ on f is Inf_τ[f] := ⟨f, L_τ f⟩ = ½ ‖f − f^τ‖²_2 (where we are using norm notation ‖f‖_p = E[|f|^p]^{1/p}). Note that if A ⊆ U_κ then we have the following combinatorial interpretation, agreeing with our notation from Equation (4): Inf_τ[A] = ½ Pr_{u∼π}[1_A(u) ≠ 1_A(u^τ)], so that avg_{τ∼Trans(n)} Inf_τ[A] = E[1_A] = vol(A) · Φ[A]. For general f : U_κ → ℝ we introduce the following additional notation, for the average influence (equivalent to energy), total influence, and maximum influence of f: E[f] = avg_{τ∼Trans(n)} Inf_τ[f], I[f] := Σ_{τ∈Trans(n)} Inf_τ[f], and MaxInf[f] := max_{τ∈Trans(n)} Inf_τ[f]. As recounted in the important survey of Diaconis and Saloff-Coste [DS96], there is an equivalence between log-Sobolev inequalities and hypercontractivity for reversible Markov chains. To explain what hypercontractivity means in this abstract setting, we first define the continuous-time analog of the random transposition walk. This is the continuous-time Markov chain, running from time t = 0 to t = ∞, in which (informally) in any interval of infinitesimal length dt, one performs a step of the random transposition chain with probability dt.
More formally, we can provide the following alternative description: if we initialize the continuous-time Markov chain at state u, then its state u_t at time t ≥ 0 is obtained by performing Poisson(t) random transpositions on u. From this definition we may define the noise operator (or heat kernel) H_t on M_κ by H_t f(u) = E[f(u_t) | u_0 = u]. It is well known that we can also express H_t in terms of the Laplacian operator, H_t = e^{−tL}. Diaconis and Saloff-Coste [DS96, Theorem 3.5(ii)] show that the log-Sobolev inequality implies hypercontractivity: ‖H_t f‖_q ≤ ‖f‖_2 and ‖H_t f‖_2 ≤ ‖f‖_{q′} whenever q ≤ 1 + e^{2̺t}, where q′ is the Hölder conjugate of q (meaning 1/q + 1/q′ = 1).
(The first inequality directly appears in [DS96]; the second statement appears only implicitly. It is a consequence of H_t being a self-adjoint operator; see, e.g., [O'D14, Prop. 9.19].) In all our applications, we actually use hypercontractivity, rather than the log-Sobolev inequality directly.
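The two descriptions of H_t (averaging f over Poisson(t) many random transpositions, versus the semigroup e^{−tL}) can be compared numerically on a small multislice (our own sketch, not from the paper):

```python
from itertools import permutations, combinations
from math import exp

kappa = (2, 1, 1)
n = sum(kappa)
base = [i + 1 for i, k in enumerate(kappa) for _ in range(k)]
U = sorted(set(permutations(base)))          # 12 states
idx = {u: s for s, u in enumerate(U)}
trans = list(combinations(range(n), 2))

def K(f):
    g = []
    for u in U:
        acc = 0.0
        for j, k in trans:
            v = list(u); v[j], v[k] = v[k], v[j]
            acc += f[idx[tuple(v)]]
        g.append(acc / len(trans))
    return g

def L(f):
    Kf = K(f)
    return [x - y for x, y in zip(f, Kf)]

t = 0.7
f = [float(s * s % 7) for s in range(len(U))]     # an arbitrary test function

# (1) Poisson description: H_t f = sum_k Pr[Poisson(t) = k] * K^k f
h_poisson = [0.0] * len(U)
cur = f[:]
w = exp(-t)
for k in range(60):
    h_poisson = [a + w * b for a, b in zip(h_poisson, cur)]
    cur = K(cur)
    w *= t / (k + 1)

# (2) Semigroup description: H_t f = e^{-tL} f, via the Taylor series of exp
h_expL = f[:]
term = f[:]
for j in range(1, 60):
    term = [(-t / j) * x for x in L(term)]
    h_expL = [a + b for a, b in zip(h_expL, term)]
```

Both truncated series agree to machine precision, reflecting e^{−tL} = e^{−t} e^{tK}.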

Two numerical lemmas
Here we give two elementary numerical lemmas we'll need for our proofs. We start by computing and bounding the inverse moment of a hypergeometric random variable. The following simple fact may well be in the literature; the most relevant citation we found was [Gov64]:

Lemma 3. Let X ∼ Hypergeometric(N, K, n) count the white balls when n balls are drawn without replacement from an urn of N balls, K of them white. Then E[1/(X+1)] = (N+1)/((n+1)(K+1)) · p, where p := 1 − binom(N−K, n+1)/binom(N+1, n+1).

Proof. Take an urn with N+1 balls, K+1 of them white, one of the white balls being "special". Consider an experiment in which we draw n+1 balls without replacement, and then choose a random white ball among the ones drawn (if any). A "success" occurs if the randomly chosen white ball is the special one. A necessary condition for the experiment to succeed is that the special white ball was chosen at all, which happens with probability (n+1)/(N+1). Assuming that this happened, the number of remaining white balls drawn is distributed as X ∼ Hypergeometric(N, K, n). Thus the probability that we finally choose the special ball is (n+1)/(N+1) · E[1/(X+1)]. Let us now think of the experiment in a different way. A necessary condition for success is that at least one white ball is drawn, which happens with probability p. Given that this occurs, the finally chosen white ball is just a random white ball (among all K+1 white balls), so the probability that it is the special one is exactly 1/(K+1). Thus (n+1)/(N+1) · E[1/(X+1)] = p/(K+1), completing the proof of the lemma.
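The double-counting identity (n+1)/(N+1) · E[1/(X+1)] = p/(K+1) can be checked in exact arithmetic (our own sketch; the closed form for p is our reconstruction):

```python
from fractions import Fraction
from math import comb

def inv_moment(N, K, n):
    """E[1/(X+1)] for X ~ Hypergeometric(N, K, n): n draws without
    replacement from N balls of which K are white, X = #white drawn."""
    total = Fraction(0)
    for x in range(min(K, n) + 1):
        pmf = Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))
        total += pmf / (x + 1)
    return total

def urn_formula(N, K, n):
    """(N+1)/((n+1)(K+1)) * p, where p is the chance that n+1 draws from an
    urn of N+1 balls, K+1 of them white, contain at least one white ball."""
    p = 1 - Fraction(comb(N - K, n + 1), comb(N + 1, n + 1))
    return Fraction(N + 1, (n + 1) * (K + 1)) * p

checks = [(10, 3, 4), (12, 5, 7), (9, 9, 2), (20, 1, 1)]
results = [(inv_moment(*c), urn_formula(*c)) for c in checks]
```

(Note that `math.comb` conveniently returns 0 when the lower index exceeds the upper, handling the degenerate cases.)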
Next, we give a bound on the log-Sobolev constant from Equation (5) that is more tractable.
Lemma 4. Let ν be a probability distribution of full support on [ℓ]. Then the log-Sobolev constant ̺_ν^triv for the associated trivial Markov chain, given in Equation (5), satisfies the bound (̺_ν^triv)^{-1} ≤ Σ_{i=1}^{ℓ} ½ lg(1/ν(i)), where lg denotes log_2.
Proof. Let p = min_{i∈[ℓ]} {ν(i)}, and assume without loss of generality that this minimum is achieved by i = ℓ. Recall the formula of Diaconis and Saloff-Coste [DS96] for the trivial chain, ̺_ν^triv = 2(1−2p)/ln((1−p)/p) (interpreted as 1 when p = 1/2). What we need to show is ln((1−p)/p)/(2(1−2p)) ≤ Σ_{i=1}^{ℓ} ½ lg(1/ν(i)). By convexity of t ↦ lg(1/t) for t ∈ (0, 1], the right-hand side above is at least ½ lg(1/p) + (ℓ−1) · ½ lg((ℓ−1)/(1−p)) ≥ ½ lg(1/p) + ½ lg(1/(1−p)), where the second inequality used ℓ ≥ 2. Thus it suffices to show ln((1−p)/p)/(2(1−2p)) ≤ ½ lg(1/p) + ½ lg(1/(1−p)). (This inequality is simply the lemma we are trying to prove, restricted to the case ℓ = 2.) Write p = 1/2 − δ/2, where δ ∈ [0, 1), so the claim becomes ln((1+δ)/(1−δ)) ≤ 2δ(1 − ½ lg(1−δ²)). Both sides of this inequality are zero for δ = 0; thus to establish the inequality it suffices to show the right-hand side's derivative is at least the left-hand side's. Taking derivatives, we need to show 2/(1−δ²) ≤ 2 − lg(1−δ²) + 2δ²/((1−δ²) ln 2). Multiplying this by (1−δ²)/2 > 0 gives 0 ≤ δ²(1/ln 2 − 1) − ½(1−δ²) lg(1−δ²), which is evidently true as 1/ln 2 − 1 > 0 and 0 < 1 − δ² ≤ 1.
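As a sanity check (our own sketch, with two assumptions: that Lemma 4's bound reads as reconstructed above, and that the trivial chain's energy mirrors the normalization of Inequality (2)), a grid search over two-point densities is consistent with the lemma:

```python
from math import log, log2, sqrt

def min_ls_ratio(p, grid=20000):
    """Scan densities phi on a 2-point space with nu = (p, 1-p), computing
    E_{u,v~nu}[(sqrt(phi(u)) - sqrt(phi(v)))^2] / KL(phi*nu || nu)
    (the Inequality (2) normalization; an assumption on our part)."""
    best = float("inf")
    for s in range(1, grid):
        phi1 = s / (grid * p)             # phi1 ranges over (0, 1/p)
        if abs(phi1 - 1.0) < 1e-12:       # KL = 0 exactly at phi = 1
            continue
        phi2 = (1 - p * phi1) / (1 - p)
        energy = 2 * p * (1 - p) * (sqrt(phi1) - sqrt(phi2)) ** 2
        kl = p * phi1 * log(phi1) + (1 - p) * phi2 * log(phi2)
        best = min(best, energy / kl)
    return best

def lemma4_bound(p):
    """Our reconstruction of Lemma 4's right-hand side for l = 2."""
    return 0.5 * log2(1 / p) + 0.5 * log2(1 / (1 - p))

mins = {p: min_ls_ratio(p) for p in (0.5, 0.25, 0.1)}
```

For the balanced case p = 1/2 the scan hovers at 1, matching ̺_{{1,2}}^triv = 1 as used for the base case (18) below.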

Bounding the log-Sobolev constant
In this section we will frequently identify a histogram κ ∈ ℕ_+^ℓ with the associated multiset of colors, namely the multiset with κ_i copies of i for each i ∈ [ℓ]. Thus we may write n = |κ| = κ_1 + ··· + κ_ℓ. We will also use the notation i ∼ κ to mean that i is chosen uniformly at random from the multiset κ; i.e., according to the probability distribution on [ℓ] in which i has probability κ_i/n. We will write κ for this probability distribution, and will need to refer to the log-Sobolev constant ̺_κ^triv from Equation (5). Let us introduce one more piece of notation: if ̺_κ denotes the optimal log-Sobolev constant for the transposition Markov chain on the multislice U_κ, we will write R_κ := (̺_κ)^{-1}; equivalently, by Inequality (2), R_κ is the smallest constant such that R_κ · E_{u∼π, τ∼Trans(n)}[(√φ(u) − √φ(u^τ))²] ≥ KL(φπ ‖ π) for all probability densities φ. (6) Thus the goal of our Theorem 1 is to upper-bound R_κ. The midpoint of the proof will be establishing the following inductive bound:

Lemma 5. R_κ ≤ (n−1)/n · (̺_κ^triv)^{-1} + max_{{i_1,i_2}⊆κ, i_1≠i_2} E_{i∼κ∖{i_1,i_2}}[R_{κ∖{i}}]. (7)

Given Lemma 5, the deduction of Theorem 1 will be elementary, though not completely straightforward; this is in Section 3.2. As for the deduction of Lemma 5 itself, it will mostly follow the proof Lee and Yau used [LY98] to analyze ̺_κ in the cases ℓ = 2 (the Hamming slice) and ℓ = n (the symmetric group S_n). Notice, however, that in both of these cases the "max" appearing in Inequality (7) becomes superfluous. In the ℓ = 2 case, the only possibility for {i_1, i_2} is {1, 2}, so we have the much simpler recursion R_κ ≤ (n−1)/n · (̺_κ^triv)^{-1} + E_{i∼κ∖{1,2}}[R_{κ∖{i}}]. In the ℓ = n case (meaning κ_i = 1 for all i ∈ [ℓ]), we see by symmetry that every choice of i_1, i_2 leads to an isomorphic subproblem, that of bounding the log-Sobolev constant for S_{n−1}. That is, we have the even simpler recursion R_{(1^n)} ≤ (n−1)/n · (̺^triv)^{-1} + R_{(1^{n−1})} ≲ ½ ln n + R_{(1^{n−1})}, where the asymptotic inequality used Equation (5). This recursion straightforwardly yields the known bound for the symmetric group, R_{(1^n)} ≲ ½ n ln n. A key point of our work is recognizing that one can use the Lee-Yau methodology to obtain Inequality (7), and that despite its somewhat complicated form, this recursion can be solved to yield a good bound.

Proving Lemma 5
Although much of the proof of Lemma 5 is from [LY98], we recapitulate it here for completeness and clarity. Fix κ ∈ ℕ_+^ℓ with n = |κ| > 2. Recall from Inequality (6) that R_κ is the smallest constant such that R_κ · E_{u∼π, τ∼Trans(n)}[(√φ(u) − √φ(u^τ))²] ≥ KL(φπ ‖ π) (8) holds for all probability densities on U_κ. The way we recursively bound R_κ involves applying the chain rule for KL divergence to KL(φπ ‖ π). We will set up the notation for invoking the chain rule with respect to the nth coordinate. In fact, we will eventually apply it for each coordinate k ∈ [n], and then take expectations over a uniformly random k. However, it will be notationally convenient to focus just on the k = n case.
To this end, given a probability distribution ξ on U κ (which will be either φπ or π), we will write ξ n to denote its marginal on the nth coordinate (a probability distribution on [ℓ]). Also, given a particular a ∈ [ℓ], we will write ξ |a to denote ξ's distribution on U κ conditioned on the last coordinate having color a.
To begin the analysis of Inequality (8), let us write ψ = φπ (a probability distribution on U_κ) and then apply the chain rule for KL divergence with respect to the nth coordinate: KL(ψ ‖ π) = KL(ψ_n ‖ π_n) + E_{a∼ψ_n}[KL(ψ|a ‖ π|a)] =: MARGINAL_n + CONDITIONAL_n. We will now bound MARGINAL_n and CONDITIONAL_n. In each case we will prove a bound that has a certain "dependence on the nth coordinate". We will then remark that we could have equally well proved an analogous bound involving the kth coordinate, for any k ∈ [n]. Finally, we will average this analogous bound over all k ∈ [n]. Adding the two averaged bounds from the MARGINAL and CONDITIONAL cases yields a valid upper bound on KL(ψ ‖ π):

KL(ψ ‖ π) ≤ avg_{k∼[n]} (bound on MARGINAL_k) + avg_{k∼[n]} (bound on CONDITIONAL_k).
From this we will derive a recursive upper bound on R κ via Inequality (8).

Bounding MARGINAL n
Our bound on MARGINAL_n is from [LY98]. We think of the trivial Markov chain on [ℓ] with invariant distribution π_n, noting that π_n is nothing more than κ viewed as a distribution. Applying the associated log-Sobolev inequality, we get

KL(ψ_n ‖ π_n) ≤ 2 · (̺_κ^triv)^{-1} · E^{π_n}[√(ψ_n/π_n)], (10)

where we wrote E^{π_n} to denote the energy functional for the trivial Markov chain. It is simple to check that the function ψ_n/π_n on [ℓ] is just a ↦ E_{u∼π|a}[φ(u)]. Thus

E^{π_n}[√(ψ_n/π_n)] = ½ E_{a,b∼π_n}[(√(E_{u∼π|a}[φ(u)]) − √(E_{v∼π|b}[φ(v)]))²]. (11)

We would prefer if the two inner expectations here were over the same probability space. To that end, observe that for any a, b ∈ [ℓ] (not necessarily distinct), we can make a draw v ∼ π|b in the following unusual way. First, draw u ∼ π|a. Next, draw j ∼ u^{−1}(b), where we have introduced the notation u^{−1}(b) for the set of all coordinates j with u_j = b. Finally, form v = u^{(j n)}. (Here we are abusing notation by allowing the possibility of j = n, so that the "transposition" (j n) may be the identity.) Thus we have E_{v∼π|b}[φ(v)] = E_{u∼π|a, j∼u^{−1}(b)}[φ^{(j n)}(u)], where we use the notation φ^τ(u) := φ(u^τ). Putting this into Equation (11) (and also "pointlessly" choosing j ∼ u^{−1}(b) in the first inner expectation) we get

E^{π_n}[√(ψ_n/π_n)] = ½ E_{a,b∼π_n}[(√(E_{u,j}[φ(u)]) − √(E_{u,j}[φ^{(j n)}(u)]))²], (12)

with u ∼ π|a and j ∼ u^{−1}(b) in both inner expectations. Considering the quantity inside the outer expectation, we further use (√(E_{u,j}[φ(u)]) − √(E_{u,j}[φ^{(j n)}(u)]))² = (‖√φ‖_2 − ‖√(φ^{(j n)})‖_2)² ≤ ‖√φ − √(φ^{(j n)})‖²_2, where ‖·‖_2 is the 2-norm defined by the distribution on (u, j), and we used the triangle inequality.
Putting this back into Equation (12) yields

E^{π_n}[√(ψ_n/π_n)] ≤ E_{a,b∼π_n} E_{u∼π|a, j∼u^{−1}(b)}[e(φ; u, u^{(j n)})], (13)

where we have introduced the shorthand e(φ; u, v) := ½(√φ(u) − √φ(v))². In the right-hand expectation in Inequality (13), u is simply distributed according to π. Furthermore, suppose we fix any outcome u = u. Let us consider the joint distribution of b ∼ π_n and j ∼ u^{−1}(b).
Since u ∈ U_κ, we could equivalently form b by choosing a random color within u. But in this case, j is formed by first choosing a random color within u, and then taking a random coordinate where u has that same color. It is clear that the resulting distribution on j is simply uniformly random on [n]. Thus the right-hand side of Inequality (13) is simply E_{u∼π, j∼[n]}[e(φ; u, u^{(j n)})]. Putting this into Inequality (13) and then into Inequality (10), we conclude

bound on MARGINAL_n = 2 · (̺_κ^triv)^{-1} · E_{u∼π, j∼[n]}[e(φ; u, u^{(j n)})].
This bound we have derived depends on the nth coordinate only through the transposition of j with n.
If we repeat this derivation for a general coordinate k ∈ [n], and then average over k, we get

avg_{k∼[n]} (bound on MARGINAL_k) = 2 · (n−1)/n · (̺_κ^triv)^{-1} · E_{u∼π, τ∼Trans(n)}[e(φ; u, u^τ)],

where the (n−1)/n factor accounts for the fact that when j and k are uniformly random, there is a 1/n chance that they are equal, in which case the "transposition" (j k) is the identity and we get a contribution of e(φ; u, u) = 0.
Combining all previous deductions, we get

bound on CONDITIONAL_n = 2 E_{u∼π, τ∼Trans(n−1)}[e(φ; u, u^τ) · R_{κ∖{u_n}}].

This is our desired bound for CONDITIONAL_n, except we make a slight adjustment so that the expectation is over all τ in Trans(n), obtaining an equivalent bound. Again, had we repeated this derivation for an arbitrary coordinate k in place of the nth, and then averaged over k, we would get

avg_{k∼[n]} (bound on CONDITIONAL_k) = 2 E_{u∼π, τ∼Trans(n)}[e(φ; u, u^τ) · avg_{k∈Fix(τ)} R_{κ∖{u_k}}], (15)

where Fix(τ) denotes the fixed points of transposition τ. This is the point at which, by necessity, we depart from [LY98]. To proceed, we simply take a worst-case upper bound over the two colors swapped by τ: no matter what u and τ are, we have

avg_{k∈Fix(τ)} R_{κ∖{u_k}} ≤ max_{{i_1,i_2}⊆κ} E_{i∼κ∖{i_1,i_2}}[R_{κ∖{i}}].

In fact, when inserting this into Inequality (15), we can do slightly better. Notice that if τ swaps two colors of u that are the same, then e(φ; u, u^τ) = 0 anyway. Thus we may insert the indicator random variable 1[τ swaps distinct colors in u] into the expectation in Inequality (15), and then use the worst-case bound only over distinct pairs {i_1, i_2}. Putting this inequality into Inequality (15) yields

avg_{k∼[n]} (bound on CONDITIONAL_k) ≤ 2 max_{{i_1,i_2}⊆κ, i_1≠i_2} { E_{i∼κ∖{i_1,i_2}}[R_{κ∖{i}}] } · E_{u∼π, τ∼Trans(n)}[e(φ; u, u^τ)].

Deducing Theorem 1 from Lemma 5
In this section we upper-bound R_κ using the recursion given by Lemma 5. Let us make a few simplifications to the statement of Lemma 5. First, let us drop the factor (n−1)/n for simplicity. Second, let us use the upper bound on (̺_κ^triv)^{-1} from Lemma 4. Finally, let us drop the condition that i_1, i_2 be distinct in the max; this condition greatly simplifies the recursion when ℓ = 2 (as in [LY98]), but doesn't help us much when ℓ > 2. Thus we will finally use

R_κ ≤ c(κ) + max_{{i_1,i_2}⊆κ} E_{i∼κ∖{i_1,i_2}}[R_{κ∖{i}}], where c(κ) := Σ_{i : κ_i > 0} ½ lg(n/κ_i). (17)

We remind the reader that in the above, {i_1, i_2} and κ are considered to be multisets of [ℓ]. In fact, we will always consider κ to merely be a multiset of the colors on which it is supported. That is to say, whenever some κ_i becomes 0 in the above recursion (through the removal of a color inside the expectation), we will treat the color i as no longer existing (rather than allowing κ_i = 0). This is acceptable, since the definition of R_κ is not affected by removing colors that don't appear in κ. This is why we dropped the hypothesis κ ∈ ℕ_+^ℓ in writing Inequality (17), and why we wrote the sum in c(κ) as being over {i : κ_i > 0}, rather than over all i ∈ [ℓ]. It might seem that dropping the hypothesis κ ∈ ℕ_+^ℓ (and hence κ_i > 0 for all i) in Inequality (17) could cause a problem for the case when the number of colors drops to just one, meaning κ = {i, i, ..., i} for some i. However, in this degenerate case, the correct value of R_κ is 0, and we also have c(κ) = 0.
Regarding the base cases of |κ| = 2 for our recursion Inequality (17), we have the true values R_{{i_1,i_2}} = 1/2 for i_1 ≠ i_2, and R_{{i,i}} = 0. (18) Indeed, ̺_{{1,2}}^triv = 1 according to Equation (5), and the energy of this trivial chain is half that of the transposition chain on U_{{1,2}}.
"Strategies". Given κ, let's define a strategy for κ to be a mapping p that takes in an arbitrary nonempty µ ⊆ κ and outputs a pair {i_1, i_2} ⊆ µ. We'll say that p(µ) = {i_1, i_2} is the pair protected by p. We'll write Strat(κ) for the set of all strategies for κ. Then from Inequality (17) we have that R_κ ≤ max_{p∈Strat(κ)} S_p(κ), (19) where S_p(κ) is defined to be the solution of the recursion S_p(µ) = c(µ) + E_{i∼µ∖p(µ)}[S_p(µ∖{i})], (20) with the base case S_p(pair) = 1/2 from Inequality (18). Let us write c(µ) = Σ_{a∈[ℓ]} c^(a)(µ), where c^(a)(µ) := ½ lg(|µ|/µ_a) for µ_a > 0 and c^(a)(µ) := 0 otherwise. Then it follows from Equation (20) that we may correspondingly decompose S_p(κ) = Σ_{a∈[ℓ]} S^(a)_p(κ), where S^(a)_p satisfies the recursion of Equation (20) with c replaced by c^(a). (21) Thus returning to Inequality (19), we have R_κ ≤ Σ_{a∈[ℓ]} max_{p∈Strat(κ)} S^(a)_p(κ). (22) Next, it will be convenient if c^(a)(κ) is a decreasing function of κ_a (considering |κ| fixed), even for κ_a = 0. So let us redefine c^(a)(κ) := ½ lg(2|κ|/(κ_a + 1)). (23) This redefinition only increases c^(a)(κ): it increases it from zero to a nonzero value when κ_a = 0; and, when κ_a > 0, the increase from κ_a to κ_a + 1 in the denominator is at most a factor of 2, and this is compensated for by the new factor of 2 in the numerator. Thus Inequality (22) is still valid under our redefinition.
The advantage of the redefinition is, as mentioned, that $c^{(a)}(\kappa)$ always goes up when $\kappa_a$ drops by one, even when dropping from $1$ down to $0$. Thus for each fixed $a \in [\ell]$, it is now clear from Equation (21) that the optimal strategies $p \in \mathrm{Strat}(\kappa)$ for maximizing $S^{(a)}_p(\kappa)$ are precisely the "greedy" ones. Here the "greedy" strategies for color $a$ are the ones that always protect non-$a$ colors (to the extent this is possible: if $\mu$ has only $m < 2$ non-$a$ colors, then $p(\mu)$ will be obliged to contain $2 - m$ copies of $a$). It is also not hard to see that every such greedy strategy $g$ is equally effective; for the purposes of computing $S^{(a)}_g(\kappa)$, it doesn't matter which non-$a$ colors appear in the $\mu \subseteq \kappa$ that arise, only how many of them there are. (This observation relies in part on using the same upper bound, $1/2$, for both cases in Inequality (18).)

Greedy strategies. Let $S^{(a)}_g(\kappa)$ denote the solution for a greedy protection strategy, which, as we have argued, equals $\max_{p \in \mathrm{Strat}(\kappa)}\{S^{(a)}_p(\kappa)\}$. Our final goal will be to establish Inequality (24); putting this bound into Inequality (22) will yield Theorem 1. We first dispense with the edge case in which $\kappa$ contains fewer than $2$ non-$a$'s. In this case, under the greedy strategy, $i$ is always $a$ in Equation (21), and so "solving the recursion" just amounts to computing a sum. When $\kappa_a = |\kappa|$, the result is
$$S^{(a)}_g = \tfrac{1}{2}\lg\tfrac{2n}{1+n} + \cdots + \tfrac{1}{2}\lg\tfrac{2\cdot 3}{1+3} + \tfrac{1}{2},$$
and when $\kappa_a = |\kappa| - 1$ the result is similar. In both cases, each of the $n-1$ summands is at most $\tfrac{1}{2}\lg 2$, from which Inequality (24) immediately follows.
We now come to the main case, when $\kappa$ contains at least $2$ non-$a$'s. In this case, the greedy strategy involves always protecting two non-$a$'s, and as argued, the quantity $S^{(a)}_g(\kappa)$ only depends on $\kappa_a$. Thus for analysis purposes, we may henceforth assume $\ell = 2$ and $a = 1$. Now from Inequality (22) and Equation (23) we conclude that $S^{(a)}_g(\kappa)$ is the solution of a two-color recursion with base case $S^{(1)}_g(\{2,2\}) = 1/2$. It is easier to analyze this recursion in terms of $\lambda := \kappa \setminus \{2, 2\}$. Writing $G(\lambda) = S^{(1)}_g(\kappa)$, we obtain the recursion in Equation (25), with base case $G(\emptyset) = 1/2$. In fact, it will be convenient to overpay for the base case, taking $G(\emptyset) = \tfrac{1}{2}\lg\tfrac{2(0+0+2)}{1+0} = 1$. Finally, we can solve the recursion in Equation (25) by giving it a probabilistic interpretation. Suppose we choose a random string $u$ in $\mathcal{U}_\lambda$. Then we begin a deterministic process with "stages" numbered $|\lambda|, |\lambda|-1, |\lambda|-2, \ldots, 1, 0$. In each stage, we "pay" $\tfrac{1}{2}\lg\tfrac{2(\#_1 u + \#_2 u + 2)}{1 + \#_1 u}$, and then we delete the last character of $u$. It is easy to see that the solution of Equation (25) is equal to the expectation (over the initial choice of $u$) of the total payment in this process. By linearity of expectation, this total payment is the sum of the expected payments in the individual stages, and in the $m$th stage the random variable $\#_1 u$ is distributed as $\mathrm{Hypergeometric}(\lambda_1 + \lambda_2, \lambda_1, m)$. Thus the expected payment in the $m$th stage is
$$\mathop{\mathbf{E}}_{X \sim \mathrm{Hypergeometric}(\lambda_1+\lambda_2,\,\lambda_1,\,m)}\left[\tfrac{1}{2}\lg\tfrac{2(m+2)}{1+X}\right],$$
which can be bounded using Lemma 3. (The bound there is a little loose, but we valued simplicity over optimization of lower-order terms.) Summing the resulting bound over $0 \le m \le |\lambda|$, we get an upper bound of
$$|\lambda| + 1 + \tfrac{|\lambda|}{2}\lg\tfrac{|\lambda|+1}{\lambda_1+1}.$$
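As a numerical sanity check (not part of the original argument), the following Python sketch computes the exact expected total payment in the deletion process, using the hypergeometric distribution of $\#_1 u$ at each stage, and compares it against the closed-form upper bound $|\lambda| + 1 + \frac{|\lambda|}{2}\lg\frac{|\lambda|+1}{\lambda_1+1}$. The function names are ours.

```python
from math import comb, log2

def hyp_pmf(N, K, m):
    """PMF of Hypergeometric(N, K, m): the number of color-1 characters
    among m characters drawn without replacement from N characters,
    K of which have color 1."""
    return {x: comb(K, x) * comb(N - K, m - x) / comb(N, m)
            for x in range(max(0, m - (N - K)), min(K, m) + 1)}

def expected_total_payment(l1, l2):
    """Exact expected total payment: at stage m we pay
    (1/2) * lg(2(m+2) / (1 + X)) with X ~ Hypergeometric(l1+l2, l1, m),
    summed over stages m = |lambda|, ..., 1, 0."""
    N = l1 + l2
    total = 0.0
    for m in range(N + 1):
        pmf = hyp_pmf(N, l1, m)
        total += sum(p * 0.5 * log2(2 * (m + 2) / (1 + x))
                     for x, p in pmf.items())
    return total

def closed_form_bound(l1, l2):
    """The upper bound |lambda| + 1 + (|lambda|/2) lg((|lambda|+1)/(l1+1))."""
    lam = l1 + l2
    return lam + 1 + (lam / 2) * log2((lam + 1) / (l1 + 1))

for (l1, l2) in [(1, 1), (3, 5), (10, 10), (20, 2)]:
    assert expected_total_payment(l1, l2) <= closed_form_bound(l1, l2)
```

For $\lambda = (1,1)$ the exact total is $1 + \frac14\lg 18 + 1 \approx 3.04$, comfortably below the bound $3 + \lg\frac32 \approx 3.58$.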

KKL and Kruskal-Katona for multislices
By viewing the multislice as a Schreier graph, we can apply the results of [OW13a, OW13b] to obtain Theorem 6. The latter statement there is the traditional conclusion of the KKL Theorem. Let us record here one more concrete corollary of Theorem 6. In our model scenario, that theorem (roughly speaking) says that the energy $E[1_A] = \operatorname{avg}_{\tau \in \mathrm{Trans}(n)} \mathbf{Inf}_\tau[1_A]$ is at least $\Omega\!\left(\frac{\log n}{n}\right)$ unless some transposition $(i\ j)$ has a rather large influence, like $1/n^{.01}$, on $1_A$.
Proof. This is immediate from Theorem 6 and Theorem 1.

It is not hard to show that $\mathrm{vol}(\partial A) \ge \mathrm{vol}(A)$ always (here the fractional volume $\mathrm{vol}(\partial A)$ is vis-à-vis the containing slice $\mathcal{U}_{(\kappa_0+1,\,\kappa_1-1)}$). The Kruskal-Katona Theorem improves this by giving an exactly sharp lower bound on $\mathrm{vol}(\partial A)$ as a function of $\mathrm{vol}(A)$. The precise function is somewhat cumbersome to state, but the qualitative consequence, assuming that $\mathrm{vol}(A)$ and $\kappa_0/n$ are bounded away from $0$ and $1$, is that $\mathrm{vol}(\partial A) \ge \mathrm{vol}(A) + \Omega(1/n)$. This is sharp, up to the constant in the $\Omega(\cdot)$, as witnessed by the "dictator set" $A = \{u : u_1 = 0\}$. See [OW13a, Sec. 1.2] for more discussion.
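The sharpness claim for the dictator set can be checked by brute force on a small Boolean slice. The following Python sketch (ours; the helper names are not from the paper) enumerates the slice with $\kappa = (4, 4)$ and verifies that the lower shadow of $A = \{u : u_1 = 0\}$ has volume exactly $\mathrm{vol}(A) + 1/n$.

```python
from itertools import combinations

def slice_strings(n, k):
    """All 0/1 strings of length n with exactly k ones, as tuples."""
    out = []
    for ones in combinations(range(n), k):
        s = [0] * n
        for j in ones:
            s[j] = 1
        out.append(tuple(s))
    return out

def lower_shadow(A, n):
    """Strings obtained from some u in A by changing a single 1 to a 0
    (they live in the neighboring slice with one fewer one)."""
    shadow = set()
    for u in A:
        for j in range(n):
            if u[j] == 1:
                v = list(u)
                v[j] = 0
                shadow.add(tuple(v))
    return shadow

n, k1 = 8, 4                                         # slice with kappa = (4, 4)
A = [u for u in slice_strings(n, k1) if u[0] == 0]   # dictator set
volA = len(A) / len(slice_strings(n, k1))
vol_shadow = len(lower_shadow(A, n)) / len(slice_strings(n, k1 - 1))
# For the dictator set, the shadow's volume exceeds vol(A) by exactly 1/n.
assert abs(vol_shadow - (volA + 1 / n)) < 1e-9
```

Here $\mathrm{vol}(A) = 1/2$ and $\mathrm{vol}(\partial A) = 5/8$, so the increase is exactly $1/n = 1/8$.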
To extend the Kruskal-Katona Theorem to multislices, we first need to extend the notion of neighboring slices and shadows. Fix an ordering on the colors, 1 ≺ 2 ≺ · · · ≺ ℓ. This total order extends to a partial order on strings in [ℓ] n in the natural way.
Definition 8. Let $\kappa \in \mathbb{N}_+^\ell$ be a histogram. We say that histogram $\kappa'$ is a lower neighbor of $\kappa$, and write $\kappa' \lhd \kappa$, if there are colors $c \prec d$ such that $\kappa'_c = \kappa_c + 1$, $\kappa'_d = \kappa_d - 1$, and $\kappa'_i = \kappa_i$ for all other colors $i$. In the opposite case, when $c \succ d$, we say $\kappa'$ is an upper neighbor of $\kappa$, and write $\kappa' \rhd \kappa$.
The main difference between the Boolean case and the multicolored case is that each multislice now has multiple upper and lower neighbors.
Definition 9. Let $A \subseteq \mathcal{U}_\kappa$, and let $\kappa' \lhd \kappa$. The lower shadow of $A$ at $\kappa'$ is $\partial_{\kappa'} A = \{w \in \mathcal{U}_{\kappa'} : w \prec u \text{ for some } u \in A\}$. We similarly define upper shadows. We may use the same notation $\partial_{\kappa'} A$ for both kinds of shadows, since whether a shadow is upper or lower is determined by whether $\kappa' \rhd \kappa$ or $\kappa' \lhd \kappa$.
Towards proving Kruskal-Katona theorems, we relate the volume of $A$'s lower shadows to $E[1_A]$. Recalling that $A$ now has multiple lower shadows, we show that a certain weighted average of their volumes is noticeably larger than the volume of $A$. We first define the appropriate weighted average.

Definition 10. Given a histogram $\kappa \in \mathbb{N}_+^\ell$, we define a natural probability distribution $\mathrm{lower}(\kappa)$ on the lower neighbors of $\kappa$ as follows. To draw $\kappa' \sim \mathrm{lower}(\kappa)$: take an arbitrary $u \in \mathcal{U}_\kappa$; choose $j, j' \sim [n]$ independently and uniformly at random, conditioned on $u_j \ne u_{j'}$; let $c, d$ denote the two colors $u_j, u_{j'}$, with the convention $c \prec d$; finally, let $\kappa'$ be the lower neighbor of $\kappa$ with $\kappa'_c = \kappa_c + 1$ and $\kappa'_d = \kappa_d - 1$. We similarly define a probability distribution $\mathrm{upper}(\kappa)$ on the upper neighbors of $\kappa$ by interchanging the roles of $c$ and $d$.
Proposition 11. Given a histogram $\kappa \in \mathbb{N}_+^\ell$, define $h(\kappa) := \frac{n^2 - \sum_i \kappa_i^2}{n(n-1)} \le 1$. (This is the probability that applying a random transposition to a string in $\mathcal{U}_\kappa$ actually changes it.) Then for any $A \subseteq \mathcal{U}_\kappa$,
$$\mathop{\mathbf{E}}_{\kappa' \sim \mathrm{lower}(\kappa)}\left[\mathrm{vol}(\partial_{\kappa'} A)\right] \ge \mathrm{vol}(A) + E[1_A]/h(\kappa).$$
In particular, at least one lower shadow of $A$ has volume at least $\mathrm{vol}(A) + E[1_A]/h(\kappa)$.
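The probability interpretation of $h(\kappa)$ is easy to confirm by enumeration. The short Python sketch below (ours) uses the closed form $h(\kappa) = (n^2 - \sum_i \kappa_i^2)/(n(n-1))$, which is our reading of the definition, and checks it against a direct count of the ordered coordinate pairs carrying different colors.

```python
def h(kappa):
    """Probability that a uniformly random transposition of two distinct
    coordinates changes a string with histogram kappa, i.e. that two
    positions chosen without replacement carry different colors."""
    n = sum(kappa)
    return (n * n - sum(k * k for k in kappa)) / (n * (n - 1))

# Brute-force check on one string with histogram kappa = (2, 2, 1):
kappa = (2, 2, 1)
u = [i for i, k in enumerate(kappa) for _ in range(k)]   # [0, 0, 1, 1, 2]
n = len(u)
changed = sum(1 for i in range(n) for j in range(n)
              if i != j and u[i] != u[j])
assert abs(changed / (n * (n - 1)) - h(kappa)) < 1e-9
```

For $\kappa = (2,2,1)$ this gives $h(\kappa) = 16/20 = 0.8$; note $h(\kappa) = 1$ exactly when all colors are distinct, i.e. $\kappa = (1, \ldots, 1)$.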

Remark 12. The same proposition also holds if we consider upper neighbors, κ ′ ∼ upper(κ).
Proof. By definition, $E[1_A]$ is given by Equation (26), where the random string $v \in \mathcal{U}_\kappa$ is defined to be $u^\tau$ conditioned on $u^\tau \ne u$. In other words, the pair $(u, v)$ is distributed as a random pair of strings differing by a nontrivial color-swap. Let $c, d \in [\ell]$ denote the two colors swapped, with the convention $c \prec d$. Then if we define $\kappa^{cd} \lhd \kappa$ to be the lower neighbor of $\kappa$ having one fewer $d$ and one more $c$, it holds that $\kappa^{cd}$ is distributed according to $\mathrm{lower}(\kappa)$. Finally, let $w \in \mathcal{U}_{\kappa^{cd}}$ be the string that agrees with $u, v$ on the unswapped coordinates and has color $c$ on the swapped coordinates. It is easy to see the following: $v$ is uniformly distributed on $\mathcal{U}_\kappa$; conditioned on $c$ and $d$, the string $w$ is uniformly distributed on $\mathcal{U}_{\kappa^{cd}}$; and $w \prec u$, $w \prec v$. In light of the last of these, we may make the following deductions: if $v \in A$, then $w \in \partial_{\kappa^{cd}} A$; furthermore, even when $v \notin A$, if $u \in A$ then $w \in \partial_{\kappa^{cd}} A$. The claimed inequality follows. In this deduction, on the right we used that $v$ is uniformly distributed on $\mathcal{U}_\kappa$, together with Equation (26); on the left we used that, conditioned on $c$ and $d$, the string $w$ is uniform on $\mathcal{U}_{\kappa^{cd}}$. The proof is completed by recalling that $\kappa^{cd}$ is distributed according to $\mathrm{lower}(\kappa)$.
We can now immediately deduce our first Kruskal-Katona Theorem for the multislice, using just Proposition 11, the log-Sobolev inequality Theorem 1, and Inequality (1). In particular, at least one lower shadow of $A$ has volume at least the right-hand side of the theorem's bound. The analogous statement for upper shadows also holds.
Thus in the model case when $\mathrm{vol}(A)$ and each $\kappa_i/n$ is bounded away from $0$ and $1$, and $\ell = O(1)$, we get that the average lower shadow of $A$ has volume at least $\mathrm{vol}(A) + \Omega(1/n)$. Now using our KKL Theorem (Corollary 7) we can get a "robust" version of this statement: the volume increase is in fact on the order of $(\log n)/n$ unless there is a highly influential transposition for $A$. Specifically, suppose the hypotheses above hold and that $\epsilon \le \mathrm{vol}(A) \le 1 - \epsilon$. Then for any $\delta > 0$ the improved volume bound holds, or else there exists $\tau \in \mathrm{Trans}(n)$ with $\mathbf{Inf}_\tau[A] \ge 1/n^{\delta}$. The analogous statement for upper shadows also holds.
As in [OW13a], we now give a conceptual improvement to the "or else" clause in Theorem 14. Let us work with upper shadows rather than lower shadows going forward. The natural examples of sets $A$ with upper-shadow expansion "only" $\Omega(1/n)$ are "dictator" sets such as $A = \{u : u_1 = \ell\}$. For such sets, all transpositions of the form $(1\ j)$ indeed have huge influence. However, it's not so natural to single out one such $(1\ j)$ as the "reason" for the small expansion; instead, we would prefer to say the reason is that $A$ is highly "correlated" with coordinate $1$. To this end, let us make a definition.
Definition 15. Let $A \subseteq \mathcal{U}_\kappa$, let $j \in [n]$, and let $c \prec d$ be colors in $[\ell]$. The correlation of $A$ with respect to coordinate $j$ and colors $c, d$ is then defined. For simplicity, we present the following theorem without stating the most general possible settings for the parameters. Suppose that Inequality (27) doesn't hold. Then there must exist $\tau \in \mathrm{Trans}(n)$ such that $\mathbf{Inf}_\tau[A] \ge 1/n^{.01}$. Without loss of generality, we can assume that $\tau = (1\ 2)$. We then deduce two consequences of this, relating the volume of $A$ and its upper shadows. Finally we will combine these to get that $A$ is correlated with a single color change on one coordinate. The proof here has similar ideas to [OW13a, Lemma A.6], but we reproduce it for completeness. We also introduce the notation $\mathrm{vol}_c(A) := \Pr_{u \sim \pi_\kappa}[u \in A \mid u_1 = c]$.
For every $c, d \in [\ell]$ with $c \prec d$, let $\kappa^{dc} \rhd \kappa$ be the upper neighbor with one more $d$ and one fewer $c$ than $\kappa$; then the volumes of the upper shadows satisfy the stated identity, whose right-hand side involves the colors $e \in \{c, d\}$.

Proof. Let $P_{c,d}$ be the probability that $\kappa' \sim \mathrm{upper}(\kappa)$ has one more $d$ and one fewer $c$ than $\kappa$. We use the convention that $u \sim \mathcal{U}_\kappa$ and $v$ is sampled from an upper neighbor of $u$. It remains to bound $P_{c,d}$: this is the probability that, for $u \in \mathcal{U}_\kappa$ and $i, j \in [n]$ chosen uniformly and independently, $\{u_i, u_j\} = \{c, d\}$, conditioned on $u_i \ne u_j$. We can calculate this probability explicitly: $P_{c,d} = \frac{2\kappa_c \kappa_d}{n^2 - \sum_e \kappa_e^2}$.

Then there exist $c, d \in [\ell]$ with $c \prec d$ such that the upper neighbor $\kappa^{dc} \rhd \kappa$ satisfies the stated bound for $i = 1$ or $2$.
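The explicit value of $P_{c,d}$ can be confirmed by enumeration. In the Python sketch below (ours), the closed form $P_{c,d} = 2\kappa_c\kappa_d/(n^2 - \sum_e \kappa_e^2)$ is our reconstruction of the lost display; the code verifies it by counting ordered coordinate pairs, and also that the probabilities over all pairs $c \prec d$ sum to $1$.

```python
def p_cd(kappa, c, d):
    """Conditional probability that two uniformly random positions i != j
    carry the color pair {c, d}, given that they carry different colors.
    Closed form (our reconstruction): 2*k_c*k_d / (n^2 - sum_e k_e^2)."""
    n = sum(kappa)
    return 2 * kappa[c] * kappa[d] / (n * n - sum(k * k for k in kappa))

kappa = (3, 2, 2)
u = [i for i, k in enumerate(kappa) for _ in range(k)]
n = len(u)
diff = [(i, j) for i in range(n) for j in range(n)
        if i != j and u[i] != u[j]]
for c in range(3):
    for d in range(c + 1, 3):
        hits = sum(1 for (i, j) in diff if {u[i], u[j]} == {c, d})
        assert abs(hits / len(diff) - p_cd(kappa, c, d)) < 1e-9
# The probabilities over all color pairs c < d sum to 1.
total = sum(p_cd(kappa, c, d) for c in range(3) for d in range(c + 1, 3))
assert abs(total - 1) < 1e-9
```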
Proof. Draw $u \sim \pi_\kappa$, and write $u = (u_1, u') = (u_1, u_2, w)$ with $w \in [\ell]^{n-2}$. Since there are $\ell^2$ choices of colors for $u_1$ and $u_2$, it must be true that the relevant influence bound holds for some choice of $c, d \in [\ell]$. We consider the case that $c \prec d$ and reach the case of $i = 1$ in the conclusion; the other case is similar. If $(d, c, w) \in A$ then $(d, d, w) \in \partial_{\kappa^{dc}} A$. Every $u$ in the event above also satisfies $u \notin A \wedge (d, u') \in \partial_{\kappa^{dc}} A$. Next, let $u' \sim \pi_{\kappa'}$, where $\kappa'$ is obtained from $\kappa$ by removing one $c$. If $(c, u') \in A$ then $(d, u') \in \partial_{\kappa^{dc}} A$. Meanwhile, $(d, u')$ is uniformly distributed on $\mathcal{U}_{\kappa^{dc}}$ conditioned on $u_1 = d$; combining the above yields the lemma.

Proof of Theorem 16. Apply Theorem 14 with $\delta = .02$, and assume that Inequality (27) does not hold. The theorem shows that $\mathbf{Inf}_\tau[A] \ge 1/n^{.02}$ for some transposition $\tau$, which without loss of generality is $\tau = (1\ 2)$. We can therefore apply Lemma 18 (with $\gamma = 1/(\ell^2 n^{.02})$), obtaining two colors $c \prec d$, which without loss of generality satisfy the conclusion of the lemma for $i = 1$. We will show that $A$ is correlated with the first coordinate and the colors $c$ and $d$.

Harmonic analysis on the symmetric group, and Friedgut on the multislice
In this section we recap some aspects of harmonic analysis on the symmetric group and on the multislice, paying particular attention to the notion of the "low-degree" components of a function. For more details, see e.g. [Dia88]. First, we briefly discuss partitions. A partition $\lambda$ of $n$ is a nonincreasing sequence of positive integers summing to $n$. (Equivalently, it is a sorted histogram $\kappa$; i.e., one with $\kappa_1 \ge \kappa_2 \ge \cdots \ge \kappa_\ell$.) We write $\lambda \vdash n$. Sometimes we extend $\lambda$ to an infinite sequence by padding it with infinitely many zeroes. We say that $\lambda$ dominates or majorizes $\mu$, written $\lambda \trianglerighteq \mu$, if for all $i \ge 1$ the inequality $\lambda_1 + \cdots + \lambda_i \ge \mu_1 + \cdots + \mu_i$ holds.
Though we will eventually be interested in functions on multislices, we begin by studying the larger vector space $V$ of functions $f : S_n \to \mathbb{R}$ on the symmetric group. Note that we can naturally extend the operators $K, L, H_t$ to this space $V$. The partitions $\lambda$ of $n$ index the irreducible representations of the symmetric group $S_n$. In particular this means that $V$ has an orthogonal decomposition $V = \bigoplus_{\lambda \vdash n} V^\lambda$, where the isotypic component $V^\lambda$ corresponds to the irreducible representation $\lambda$ (counted with multiplicity). In analogy with the level/degree decomposition on the Boolean cube, we denote the orthogonal projection of $f$ onto $V^\lambda$ by $f^{=\lambda}$. One utility of this decomposition is that $V^\lambda$ is an eigenspace for the operator $K$, with eigenvalue equal to $\widehat{\chi}_\lambda(\tau)$, the normalized character evaluated at a(ny) transposition $\tau \in \mathrm{Trans}(n)$. Frobenius [Fro00] determined an explicit formula for these character values:
$$\widehat{\chi}_\lambda(\tau) = \frac{1}{n(n-1)} \sum_{i \ge 1} \lambda_i(\lambda_i - 2i + 1).$$
See [DS81, Cor. 1 & Lem. 7] for an explicit proof. Immediate consequences of the above formula are recorded in Equations (32) and (33). From Equation (33) we can see that $H_t$ is an invertible operator for all $t \ge 0$, and that it is natural to write $H_t^{-1} = H_{-t}$. An important feature of the formula for $c_\lambda$ (and hence $d_\lambda$) is its relation to the majorization order. The following simple calculation was observed in, e.g., [DS81, Lem. 10]:

Lemma 19. If $\lambda \trianglerighteq \mu$ then $c_\lambda > c_\mu$, and hence $d_\lambda < d_\mu$.
From this we may immediately determine the spectral gap of the transposition chain on $S_n$, which is achieved at $\lambda = (n-1, 1)$.
Corollary 20. The minimal nontrivial eigenvalue of the operator $L$ on $V$ is $\frac{2}{n-1}$.

As we explain shortly, given $\lambda$ we will be particularly interested in the parameter $k = n - \lambda_1$. An immediate consequence of Lemma 19 is that we can determine the minimal and maximal value of $d_\lambda$ in terms of this parameter $k$. We skip the straightforward calculations (most of which appear in [Dia88, Ch. 3D, Lem. 2]):

Corollary 21. For $\lambda \vdash n$ with $\lambda_1 = n - k$, we have upper and lower bounds on $d_\lambda$. The upper bound holds with equality if $\lambda = (n-k, 1, \ldots, 1)$. Further, if $k \le n/2$, the lower bound holds with equality if $\lambda = (n-k, k)$.
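The spectral gap of Corollary 20 can be checked by exhaustive computation over partitions, using the standard Frobenius formula $\widehat{\chi}_\lambda(\tau) = \frac{1}{n(n-1)}\sum_{i \ge 1} \lambda_i(\lambda_i - 2i + 1)$ (our transcription of the formula attributed to [Fro00]; the eigenvalue of $L = \mathrm{Id} - K$ on $V^\lambda$ is then $1 - \widehat{\chi}_\lambda(\tau)$). The Python sketch below confirms, for small $n$, that the minimal nontrivial eigenvalue is $2/(n-1)$, achieved at $\lambda = (n-1, 1)$.

```python
from fractions import Fraction

def partitions(n, maxpart=None):
    """All partitions of n as nonincreasing tuples."""
    if maxpart is None:
        maxpart = n
    if n == 0:
        return [()]
    out = []
    for first in range(min(n, maxpart), 0, -1):
        for rest in partitions(n - first, first):
            out.append((first,) + rest)
    return out

def char_at_transposition(lam):
    """Frobenius formula: normalized character of the irrep lambda at a
    transposition, (1/(n(n-1))) * sum_i lam_i*(lam_i - 2i + 1), rows
    indexed from 1."""
    n = sum(lam)
    s = sum(l * (l - 2 * i + 1) for i, l in enumerate(lam, start=1))
    return Fraction(s, n * (n - 1))

for n in range(3, 9):
    # The trivial irrep (n) has normalized character 1: eigenvalue 0 for L.
    assert char_at_transposition((n,)) == 1
    nontrivial = [lam for lam in partitions(n) if lam != (n,)]
    gaps = {lam: 1 - char_at_transposition(lam) for lam in nontrivial}
    # The spectral gap is 2/(n-1), achieved at lambda = (n-1, 1).
    assert min(gaps.values()) == Fraction(2, n - 1)
    assert gaps[(n - 1, 1)] == Fraction(2, n - 1)
```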
Why consider the parameter $k = n - \lambda_1$? It turns out that this parameter is very much analogous to "Fourier degree" for functions on the Boolean cube, as the following result (proved in, e.g., [EFP11, Thm. 7]) shows:

Theorem 22. Let $f : S_n \to \mathbb{R}$ be a nonzero function. The degree of $f$ is the least $k \in \mathbb{N}$ such that $f$ can be represented as a linear combination of "$k$-juntas" (meaning functions $g$ such that $g(\pi)$ depends only on some $k$ values $\pi(j_1), \ldots, \pi(j_k)$). It is also equal to the least $k$ such that $f^{=\lambda} = 0$ for all $\lambda$ with $n - \lambda_1 > k$.
We now provide two simple applications of Corollary 21 concerning functions of bounded degree.

Lemma 23. If $f : S_n \to \mathbb{R}$ has degree at most $k$, then its energy satisfies the stated upper bound.

Proof. Immediate from Equation (32) and Corollary 21.

Lemma 24. If $f : S_n \to \mathbb{R}$ has degree at most $k$, then the stated contraction bound holds for all $t \ge 0$, and for all $t \le 0$ we have $\|H_t f\|_2 \le e^{-2kt/(n-1)} \|f\|_2$.
Proof. Corollary 21 yields the claim for $t \ge 0$; the bound for $t \le 0$ follows in a similar fashion.
Fix the base string $u_0 \in \mathcal{U}_\kappa$ whose coordinates are sorted in nondecreasing color order, and for $\pi \in S_n$ let $u_0^\pi$ denote $u_0$ with its coordinates permuted according to $\pi$. Note that as $\pi$ runs over all permutations in $S_n$, the string $u_0^\pi$ runs over all strings in $\mathcal{U}_\kappa$, with equal multiplicity $\kappa_1! \kappa_2! \cdots \kappa_\ell!$. In this way, each function $f$ in the permutation module $M^\kappa$ (i.e., the multislice $\mathcal{U}_\kappa$ considered as a representation of $S_n$) can be naturally identified with a "pullback" function $\overline{f} \in V$, via $\overline{f}(\pi) = f(u_0^\pi)$. Conversely, the functions $g \in V$ that correspond to functions on the multislice $\mathcal{U}_\kappa$ are precisely those that are invariant under the action of the Young subgroup $S_{\kappa_1} \times \cdots \times S_{\kappa_\ell}$. Classical results in the representation theory of the symmetric group show that this subspace has the isotypic decomposition given in Equation (34), where $V^\lambda_\kappa$ is (isomorphic to) a nonzero subspace of $V^\lambda$ (specifically, $V^\lambda_\kappa$ consists of $K_{\lambda\kappa}$ copies of the irrep associated to $\lambda$, where $K_{\lambda\kappa}$ is the Kostka number). Since this decomposition always includes $V^{(n-1,1)}_\kappa \le V^{(n-1,1)}$ (unless $\kappa = (n)$), we conclude:

Corollary 25. The minimal nontrivial eigenvalue of the operator $L$ on any $M^\kappa$ (for $\kappa \ne (n)$) is also $\frac{2}{n-1}$.
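The multiplicity claim above is easy to verify by enumeration for a small multislice. In the Python sketch below (ours), we adopt one natural convention for the action, $(u_0^\pi)_j = (u_0)_{\pi(j)}$; the multiplicity count is the same under the inverse convention.

```python
from itertools import permutations
from math import factorial
from collections import Counter

kappa = (2, 2, 1)
n = sum(kappa)
# Sorted base string, e.g. (0, 0, 1, 1, 2) for kappa = (2, 2, 1).
u0 = tuple(i for i, k in enumerate(kappa) for _ in range(k))

# Convention (ours) for the action of pi on strings: (u0^pi)_j = (u0)_{pi(j)}.
counts = Counter(tuple(u0[pi[j]] for j in range(n))
                 for pi in permutations(range(n)))

expected = 1
for k in kappa:
    expected *= factorial(k)
# Every string of the multislice appears with multiplicity kappa_1!...kappa_l!,
assert all(c == expected for c in counts.values())
# and the number of distinct strings is the multinomial n!/(kappa_1!...kappa_l!).
assert len(counts) == factorial(n) // expected
```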
We now define the notion of "degree" for functions on multislices.

Definition 26. Let $f : \mathcal{U}_\kappa \to \mathbb{R}$ be a nonzero function. The degree of $f$ is the least $k \in \mathbb{N}$ such that $f$ can be represented as a linear combination of "$k$-juntas" (functions $g$ such that $g(u)$ depends only on some $k$ values $u_{j_1}, \ldots, u_{j_k}$). It is also equal to the least $k$ such that $f^{=\lambda} = 0$ for all $\lambda$ with $n - \lambda_1 > k$ (in $f$'s decomposition as in Equation (34)).
Claim 27. The two definitions of "degree" above are indeed the same.
Proof. If $g \in M^\kappa$ is a $k$-junta, it's easy to see that its pullback $\overline{g} : S_n \to \mathbb{R}$ is a $k$-junta. Thus if $f \in M^\kappa$ is a linear combination of $k$-juntas, so too is its pullback $\overline{f} : S_n \to \mathbb{R}$. From Theorem 22 we get that $\overline{f}^{=\lambda} = 0$ for all $\lambda$ with $n - \lambda_1 > k$, and so the same is true of $f^{=\lambda}$.
In the other direction, if $f^{=\lambda} = 0$ for all $\lambda$ with $n - \lambda_1 > k$, the same is true of $\overline{f}^{=\lambda}$, and hence $\overline{f}$ is a linear combination of $k$-juntas (by Theorem 22 again). We need to show that $f$ is also a linear combination of $k$-juntas. By linearity, it suffices to assume that $\overline{f}$ is itself a $k$-junta; indeed, it further suffices to assume $\overline{f}$ is of the form $\overline{f}(\pi) = 1[\pi(i_1) = j_1, \ldots, \pi(i_k) = j_k]$ for some coordinates $i_1, \ldots, i_k, j_1, \ldots, j_k \in [n]$. By definition, $f(v)$ equals the average of $\overline{f}(\pi) = 1[\pi(i_1) = j_1, \ldots, \pi(i_k) = j_k]$ over all $\pi \in S_n$ such that $u_0^\pi = v$, and this average depends only on $k$ coordinates of $v$.
This means that f is indeed a k-junta on U κ .
An immediate consequence is the following: Corollary 28. Lemmas 23 and 24 hold equally well for functions f ∈ M κ of degree at most k.
Finally, we relate the main theorem of our paper to the comparison of norms for low-degree functions on the multislice:

Lemma 29. Fix a histogram $\kappa \in \mathbb{N}_+^\ell$ and let $p = \min_i \kappa_i/n$. Suppose that $f \in M^\kappa$ has degree $k$. Then for all finite $q \ge 2$:
$$\|f\|_q \le (q-1)^{\Theta(k \log(1/p))} \|f\|_2, \qquad \|f\|_2 \le (q-1)^{\Theta(k \log(1/p))} \|f\|_{q'},$$
where $q'$ is given by $1/q + 1/q' = 1$.
Applying this to $g = H_t^{-1} f$ (which has the same degree as $f$) and using Lemma 24 (and Corollary 28), we deduce
$$\|f\|_q \le \|H_{-t} f\|_2 \le e^{2tk/(n-1)} \|f\|_2 = (q-1)^{\Theta(k \log(1/p))} \|f\|_2,$$
and similarly for the second claimed inequality.
We end this section by providing an analogue of Friedgut's Junta Theorem [Fri98] for functions on multislices. The proof is essentially identical to Wimmer's proof [Wim14, Sec. VI] of the analogous theorem for functions on the Boolean slice (i.e., the $\ell = 2$ case of the above). After replacing Wimmer's pullback function (notated $f_g$ therein) with our generalization $\overline{f}$, it only remains to substitute in our main log-Sobolev inequality for the multislice $\mathcal{U}_\kappa$.

Nisan-Szegedy Theorem on the multislice
The Nisan-Szegedy Theorem says that a degree-$k$ Boolean-valued function on the Hamming cube is a $k 2^k$-junta. (We remark that the smallest quantity $\gamma_2(k)$ that can replace $k 2^k$ here is now known [CHS18] to satisfy $3 \cdot 2^{k-1} - 2 \le \gamma_2(k) < 22 \cdot 2^k$.) In [FI18a], an analogous result for functions on Hamming slices was shown, and the authors conjectured a similar result for functions on multislices. We resolve this conjecture, following the structure of their proof, which involves proving three successively stronger versions of the desired theorem.
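For intuition, the cube case can be checked exhaustively in its smallest nontrivial instance. The Python sketch below (ours; an illustration on the Boolean cube, not the multislice) enumerates all Boolean functions on $\{0,1\}^3$ and verifies the degree-$1$ case of the Nisan-Szegedy Theorem: every degree-$1$ Boolean-valued function is a $1$-junta, matching $\gamma_2(1) = 1$.

```python
from itertools import product

n = 3
points = list(product([0, 1], repeat=n))

def fourier_degree(f):
    """Degree of the multilinear expansion of f: {0,1}^n -> R, computed
    via Moebius inversion over subsets (S encoded as a 0/1 vector)."""
    deg = 0
    for S in product([0, 1], repeat=n):
        # Coefficient of the monomial prod_{i in S} x_i.
        coef = sum((-1) ** sum(si - xi for si, xi in zip(S, x)) * f[x]
                   for x in points
                   if all(xi <= si for xi, si in zip(x, S)))
        if coef != 0:
            deg = max(deg, sum(S))
    return deg

def junta_size(f):
    """Number of coordinates f actually depends on."""
    relevant = 0
    for i in range(n):
        if any(f[x] != f[x[:i] + (1 - x[i],) + x[i + 1:]] for x in points):
            relevant += 1
    return relevant

# Enumerate all 2^(2^n) Boolean functions and check: degree <= 1 implies
# that the function is a 1-junta.
for bits in product([0, 1], repeat=2 ** n):
    f = dict(zip(points, bits))
    if fourier_degree(f) <= 1:
        assert junta_size(f) <= 1
```

Indeed, the only degree-$\le 1$ Boolean-valued functions on the cube are the constants and the (possibly negated) dictators.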
To prove Theorem 31, we first use our hypercontractivity result to establish the following analogue.

Proof. Since $f$ has degree at most $k$, the same is true of $L_\tau f$. Thus Lemma 29 shows (taking, say, $q = 4$) that
$$\mathbf{Inf}_\tau[f] = \tfrac{1}{2}\|L_\tau f\|_2^2 \le \ell^{O(k)} \cdot \|L_\tau f\|_{4/3}^2,$$
which has degree $k$ since $d_1 + \cdots + d_\ell = k - r$.