On a complete and sufficient statistic for the correlated Bernoulli random graph model

Inference on vertex-aligned graphs is of wide theoretical and practical importance. There are, however, few flexible and tractable statistical models for correlated graphs, and even fewer comprehensive approaches to parametric inference on data arising from such graphs. In this paper, we consider the correlated Bernoulli random graph model (allowing different Bernoulli coefficients and edge correlations for different pairs of vertices), and we introduce a new variance-reducing technique---called \emph{balancing}---that can refine estimators for model parameters. Specifically, we construct a disagreement statistic and show that it is complete and sufficient; balancing can be interpreted as Rao-Blackwellization with this disagreement statistic. We show that for unbiased estimators of functions of model parameters, balancing generates uniformly minimum variance unbiased estimators (UMVUEs). However, even when unbiased estimators for model parameters do {\em not} exist---which, as we prove, is the case with both the heterogeneity correlation and the total correlation parameters---balancing is still useful, and lowers mean squared error. In particular, we demonstrate how balancing can improve the efficiency of the alignment strength estimator for the total correlation, a parameter that plays a critical role in graph matchability and graph matching runtime complexity.


Overview
Paired random graphs with a natural alignment between their vertex sets arise in a wide variety of application domains; for example, the interaction dynamics of the same set of users across two social media platforms, or a pair of connectomes (brain graphs) as imaged from two different subjects of the same species. Given a pair of such graphs, the problem of graph matching (that is, optimally aligning the two vertex sets in order to minimize edge disagreements, usually with the purpose of recovering the natural alignment) has a rich mathematical history, and graph matching now plays a fundamental role in algorithms for machine learning and pattern recognition; see the excellent surveys in [2,5,13].
The correlated Bernoulli random graph model, described in Section 2, is the focus of our work in this paper. It is a versatile model used to describe two graphs that are correlated with each other across a natural alignment between their vertex sets. The model allows for different probabilities of adjacency for different pairs of vertices, and allows for different edge correlations between different pairs of vertices across the natural alignment. This model is simple enough to be theoretically and computationally tractable, yet it is rich enough to successfully describe real data, and it has been profitably employed in this capacity; see, for example, [3,10,11].
The contributions of this paper fall into two groups, the second group utilizing the machinery of the first group.
Our first group of contributions: In the context of a correlated Bernoulli random graph model, we introduce a "smoothing" procedure, called balancing, which reduces the mean squared error of any estimator of a function of model parameters; specifically, for any estimator $S$ of a function $g(\theta)$ of the model parameters, the balanced estimator $\overline{S}$ has the same bias as $S$ and variance no greater than that of $S$. Indeed, under a mild nondegeneracy condition, we prove in Theorem 3 that if $S$ is an unbiased estimator of $g(\theta)$ then $\overline{S}$ is the UMVU estimator of $g(\theta)$; this is because $\overline{S}$ is a Rao-Blackwellization of $S$ via the disagreement statistic $H$, and we prove in our main result, Theorem 1, that $H$ is complete and sufficient under the nondegeneracy condition. We also prove in Theorem 2, under the same condition, that if $S$ is an unbiased estimator of $g(\theta)$ then a statistic $T$ is also an unbiased estimator of $g(\theta)$ if and only if $\overline{S} = \overline{T}$.
These results should not be taken for granted; in Example 4 we illustrate that even knowing, hence fixing, the mean of the adjacency probabilities creates a violation of the nondegeneracy condition, and the conclusions of the above theorems then will indeed fail, in general.
Our second group of contributions focuses on very recent advances in [4] regarding the correlated Bernoulli random graph model. Specifically, the paper [4] introduced and showed the importance of a novel model parameter called total correlation, which combines inter- and intra-graph contributions into a unified measure of the correlation between the pair of graphs. The authors empirically demonstrated, in broad families within the model, that graph matching complexity and matchability are each functions of total correlation. They also proved that the statistic called alignment strength is a strongly consistent estimator of total correlation.
Our second group of contributions: In the context of a correlated Bernoulli random graph model, the alignment strength statistic $\mathrm{str}$ was shown in [4] to be a strongly consistent estimator of the total correlation $\varrho_T$ between the pair of graphs; however, we point out here that $\mathrm{str}$ is not a balanced statistic, hence, as noted above, the mean squared error in estimating $\varrho_T$ is reduced by using the balanced statistic $\overline{\mathrm{str}}$ instead. We then prove (in Theorems 5, 6, 7) that there do not exist unbiased estimators for several correlation parameters, including $\varrho_T$. Empirical experiments in Section 5 demonstrate that balancing the numerator and denominator of $\mathrm{str}$ separately, which yields what we call the modified alignment strength $\mathrm{str}'$, often has less bias than $\mathrm{str}$ as an estimator of $\varrho_T$ and always has less variance than $\mathrm{str}$; we conjecture that $\mathrm{str}'$ always has less mean squared error than $\mathrm{str}$ in estimating $\varrho_T$.
The organization of this paper is as follows. The correlated Bernoulli random graph model, important functions of the parameters, and important statistics are described in Section 2. Our main results are stated in Section 3 and proved in Section 4. Empirical demonstrations are in Section 5.

Correlated Bernoulli Random Graphs
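We begin by describing the correlated Bernoulli random graph model. It consists of a pair of random graphs; without loss of generality these graphs are on the same vertex set. (Indeed, the natural alignment between their vertex sets is a bijection, and the associated one-to-one correspondence can be thought of as an identification.) For simplicity of further notation, let us suppose that the $N$ (= number-of-vertices-choose-two) pairs of vertices are arbitrarily ordered.
The parameters of the model are, for each $i = 1, 2, \ldots, N$, a Bernoulli parameter $p_i$ and an edge correlation $\varrho_i$; a parameter tuple is $\theta = (p_1, \ldots, p_N, \varrho_1, \ldots, \varrho_N)$, and $\Theta$ denotes the parameter space. For each $i$, the indicators $X_i$ and $Y_i$ of adjacency for the $i$th pair of vertices in the respective graphs are each Bernoulli($p_i$) with Pearson correlation $\varrho_i$, and the pairs $(X_i, Y_i)$ are independent across $i$. It is not hard to see that the choice of these parameters uniquely specifies the joint distribution of the two graphs (see Appendix A of [4]). Indeed, we can sample from the distribution in the following manner. For each $i = 1, 2, \ldots, N$ independently, sample $X_i \sim$ Bernoulli($p_i$), then, conditioned on the value of $X_i$, sample $Y_i \sim$ Bernoulli($p_i + \varrho_i(1-p_i)$) if $X_i = 1$ and $Y_i \sim$ Bernoulli($p_i(1-\varrho_i)$) if $X_i = 0$. The quantities $p_i^2 + \varrho_i p_i(1-p_i)$, $(1-p_i)^2 + \varrho_i p_i(1-p_i)$, and $(1-\varrho_i)p_i(1-p_i)$ are, respectively, the probability that $X_i = Y_i = 1$, the probability that $X_i = Y_i = 0$, and the probability that [$X_i = 1$ and $Y_i = 0$].
Let $X$ and $Y$ denote the random vectors whose $i$th components are respectively $X_i$ and $Y_i$, for all $i = 1, 2, \ldots, N$; thus, in effect, $X$ and $Y$ are like the adjacency matrices representing the respective graphs. Let $\mathcal{X} := \{(x, y) : x, y \in \{0, 1\}^N\}$ denote the sample space for the correlated Bernoulli random graph model; in particular, $x$ and $y$ respectively are possible realizations of the adjacency vectors $X$ and $Y$.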
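The sampling procedure described above can be sketched in code. This is a minimal sketch, assuming the parameterization of [4] in which $X_i$ and $Y_i$ are each marginally Bernoulli($p_i$) with Pearson correlation $\varrho_i$; the function names `joint_probabilities` and `sample_pair` are illustrative and do not come from the paper.

```python
import random


def joint_probabilities(p_i, rho_i):
    """Joint law of (X_i, Y_i), assuming Bernoulli(p_i) marginals and
    Pearson correlation rho_i (an assumption of this sketch)."""
    cov = rho_i * p_i * (1 - p_i)  # covariance implied by the correlation
    return {
        (1, 1): p_i * p_i + cov,
        (0, 0): (1 - p_i) * (1 - p_i) + cov,
        (1, 0): p_i * (1 - p_i) - cov,
        (0, 1): p_i * (1 - p_i) - cov,
    }


def sample_pair(p, rho, rng=None):
    """Draw adjacency vectors (x, y): X_i first, then Y_i given X_i."""
    rng = rng or random.Random()
    x, y = [], []
    for p_i, rho_i in zip(p, rho):
        x_i = 1 if rng.random() < p_i else 0
        # Conditional success probability of Y_i given the value of X_i.
        q = p_i + rho_i * (1 - p_i) if x_i == 1 else p_i * (1 - rho_i)
        y_i = 1 if rng.random() < q else 0
        x.append(x_i)
        y.append(y_i)
    return x, y
```

Note that the two disagreement outcomes (1, 0) and (0, 1) receive the same probability, which is the fact behind the equiprobability of sample points that share a disagreement pattern.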
Define $R_o := \{(p_1, p_2, \ldots, p_N, 0, 0, \ldots, 0) : p_1, p_2, \ldots, p_N \in R\}$. The parameter space $\Theta$ will be called nondegenerate if $\Theta \cap R_o$ has an interior point, relative to $R_o$; i.e. there exist $z \in \Theta \cap R_o$ and a real number $\epsilon > 0$ such that $\Theta \cap R_o$ contains all points in $R_o$ that are less than $\epsilon$ distant from $z$.
Nondegeneracy of $\Theta$ will play a critical role here; it is explicitly required for most of the theorems in this paper. Furthermore, the condition is not assumed merely for ease of exposition or analysis; indeed, we will demonstrate that when it fails, the conclusions of the theorems that assume it can fail as well.

Important statistics and functions of the parameters
The most important statistic in this paper, the disagreement vector statistic H, is defined first.
This statistic is foundational for the first group of our results; in Theorem 1 we will show, under the nondegeneracy condition, that H is complete and sufficient.
Definition 2. The (vector-valued) disagreement vector statistic $H : \mathcal{X} \to \{0, \star, 1\}^N$ is defined as follows: For all $(x, y) \in \mathcal{X}$, the vector $H(x, y) \in \{0, \star, 1\}^N$ is such that, for each $i = 1, 2, \ldots, N$, the $i$th component of $H(x, y)$ is equal to 1 if $x_i = y_i = 1$, is equal to 0 if $x_i = y_i = 0$, and is equal to $\star$ if $x_i \neq y_i$. For all $h \in \{0, \star, 1\}^N$, the preimage of $h$, which is the set $H^{-1}(h)$, is denoted $\mathcal{X}_h$ and is called a disagreement class. Note that $\mathcal{X}$ is partitioned into the disjoint union $\mathcal{X} = \bigcup_{h \in \{0, \star, 1\}^N} \mathcal{X}_h$.
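Definition 2 can be made concrete with a short sketch. The names `STAR`, `disagreement_vector`, and `disagreement_class` are illustrative choices, not notation from the paper; the string `"*"` stands in for the symbol $\star$.

```python
from itertools import product

STAR = "*"  # stand-in for the star symbol marking a disagreement


def disagreement_vector(x, y):
    """H(x, y): the common value where x and y agree, STAR where they differ."""
    return tuple(STAR if a != b else a for a, b in zip(x, y))


def disagreement_class(h):
    """Enumerate the disagreement class of h: all (x, y) with H(x, y) == h."""
    stars = [i for i, v in enumerate(h) if v == STAR]
    members = []
    for bits in product((0, 1), repeat=len(stars)):
        x = [0 if v == STAR else v for v in h]
        y = list(x)
        # At each starred position, x and y take opposite values.
        for i, b in zip(stars, bits):
            x[i], y[i] = b, 1 - b
        members.append((tuple(x), tuple(y)))
    return members
```

A class whose vector has $m$ stars contains $2^m$ sample points, so brute-force enumeration is only practical when the number of disagreements is small.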
The following definitions are key for the second group of our results.
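The Bernoulli parameter mean $\mu$ and the Bernoulli parameter variance $\sigma^2$ are defined as
\[ \mu := \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad \sigma^2 := \frac{1}{N}\sum_{i=1}^{N} (p_i - \mu)^2. \]
The empirical density of $X$, denoted $d_X$, the empirical density of $Y$, denoted $d_Y$, and the combined empirical density, denoted $d_{X,Y}$, are statistics $\mathcal{X} \to R$ that are respectively defined as
\[ d_X(x,y) := \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad d_Y(x,y) := \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad d_{X,Y}(x,y) := \frac{d_X(x,y) + d_Y(x,y)}{2}. \]
Clearly, all three of these statistics are unbiased estimators of the parameter $\mu$. Then, we define the statistics $d_{X\cap Y}, d_{X\cup Y} : \mathcal{X} \to R$ as
\[ d_{X\cap Y}(x,y) := \frac{1}{N}\sum_{i=1}^{N} x_i y_i, \qquad d_{X\cup Y}(x,y) := \frac{1}{N}\sum_{i=1}^{N} \max\{x_i, y_i\}. \]
Note that for all $(x, y) \in \mathcal{X}$, we have that $d_{X\cap Y}(x, y) = |\{i : x_i = 1 \text{ and } y_i = 1\}|/N$, and we also have that $d_{X\cup Y}(x, y) = |\{i : x_i = 1 \text{ or } y_i = 1\}|/N$.
Next, the disagreement enumeration statistic $\Delta : \mathcal{X} \to R$ is
\[ \Delta(x,y) := \sum_{i=1}^{N} |x_i - y_i|, \]
i.e. the number of components at which $x$ and $y$ disagree. Clearly, we have, for all $\theta \in \Theta$, that
\[ E(\Delta) = 2\sum_{i=1}^{N} (1-\varrho_i)\, p_i (1-p_i). \]
The heterogeneity correlation $\varrho_H$ is a parameter defined by
\[ \varrho_H := \frac{\sigma^2}{\mu(1-\mu)}. \]
In the case where $\mu$ is 0 or 1, any convention may be adopted for defining $\varrho_H$ (but it must be a value between 0 and 1). It is not hard to show that a) it holds that $0 \le \varrho_H \le 1$, and b) it holds that $\mu(1-\mu)(1-\varrho_H) = \frac{1}{N}\sum_{i=1}^{N} p_i(1-p_i)$.
Define the total correlation parameter $\varrho_T$ by
\[ 1 - \varrho_T := \frac{\sum_{i=1}^{N} (1-\varrho_i)\, p_i(1-p_i)}{N\mu(1-\mu)}. \]
In the case where $\mu$ is 0 or 1, any convention may be adopted for defining $\varrho_T$ (but it must be a value between 0 and 1). Note that in the case where all $\varrho_i$ are equal, say to the value $\varrho_E$, we have $1 - \varrho_T = (1-\varrho_E)(1-\varrho_H)$. It is always the case that $0 \le \varrho_T \le 1$. In [4] it was empirically demonstrated, for the correlated graphs in broad families within our model, that graph matching complexity as well as graph matchability are each functions of total correlation, hence the importance of total correlation.
The alignment strength $\mathrm{str} : \mathcal{X} \to R$ is an important statistic defined as
\[ \mathrm{str} := 1 - \frac{\Delta/N}{d_X(1-d_Y) + (1-d_X)\,d_Y}. \tag{1} \]
In the case that $x$ and $y$ are both all zeros or both all ones, any convention may be adopted for defining $\mathrm{str}$ (but it should be a value between 0 and 1). The definition of alignment strength arose in [4] in a natural way. Specifically, 1 minus the alignment strength is the number of disagreements between the two graphs through the natural alignment, divided by the average number of disagreements over all vertex bijections between the two graphs; see there for more details. In [4] it is proven under mild conditions that $\mathrm{str}$ is a strongly consistent estimator of $\varrho_T$.
An equivalent formula for alignment strength is
\[ \mathrm{str} = \frac{d_{X\cap Y} - d_X d_Y}{d_{X,Y} - d_X d_Y}; \]
it follows immediately using the easily-derived identities mentioned later in Equations (8) through (12).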
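The equivalence of the two expressions for alignment strength can be checked numerically. The sketch below assumes that alignment strength takes the form $1 - (\Delta/N)/(d_X(1-d_Y) + (1-d_X)d_Y)$, with equivalent form $(d_{X\cap Y} - d_X d_Y)/(d_{X,Y} - d_X d_Y)$; returning `None` in the degenerate all-zeros or all-ones case is a choice of this sketch, since the text allows any convention there. Function names are illustrative.

```python
from fractions import Fraction


def densities(x, y):
    """Return N, d_X, d_Y, d_{X cap Y}, and Delta as exact rationals."""
    n = len(x)
    d_x = Fraction(sum(x), n)
    d_y = Fraction(sum(y), n)
    d_and = Fraction(sum(a & b for a, b in zip(x, y)), n)
    delta = sum(a != b for a, b in zip(x, y))
    return n, d_x, d_y, d_and, delta


def alignment_strength(x, y):
    """First form: one minus observed over average disagreement density."""
    n, d_x, d_y, _, delta = densities(x, y)
    denom = d_x * (1 - d_y) + (1 - d_x) * d_y
    if denom == 0:
        return None  # both all zeros or both all ones: left to convention
    return 1 - Fraction(delta, n) / denom


def alignment_strength_alt(x, y):
    """Second form, written with the intersection and combined densities."""
    _, d_x, d_y, d_and, _ = densities(x, y)
    d_xy = (d_x + d_y) / 2
    denom = d_xy - d_x * d_y
    if denom == 0:
        return None
    return (d_and - d_x * d_y) / denom
```

Exact rational arithmetic is used so that the two forms can be compared with `==` rather than with a floating-point tolerance.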

The Results
As mentioned in Section 1, our main results, which will be listed in this section, can be divided into two groups.
The first group of our results: In the context of correlated Bernoulli random graphs, we begin with Theorem 1, which asserts, under a nondegeneracy condition, that the disagreement vector statistic H is complete and sufficient; using this, given any estimator of a function of model parameters, we describe a way to refine ("balance") the estimator, reducing the mean squared error. Indeed, under the nondegeneracy condition, given any unbiased estimator of a function of model parameters, we then characterize all unbiased estimators (Theorem 2) and the UMVU estimator (Theorem 3).
The second group of our results: Theorems 5, 6, and 7 show that there are no unbiased estimators of various graph correlation measures, including total correlation $\varrho_T$; however, not only does balancing alignment strength improve its mean squared error in estimating $\varrho_T$, but balancing numerator and denominator separately is seen empirically to be a further improvement.
Our first result is the following theorem.
Theorem 1. If the parameter space Θ is nondegenerate, then the disagreement vector H is a complete and sufficient statistic.
Theorem 1 is proved in Section 4.2.4.
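Given any statistic $S : \mathcal{X} \to R$, define the statistic $\overline{S} : \mathcal{X} \to R$ as follows. For all $(x, y) \in \mathcal{X}$, with $h := H(x, y)$,
\[ \overline{S}(x,y) := \frac{1}{|\mathcal{X}_h|} \sum_{(x',y') \in \mathcal{X}_h} S(x',y'), \]
that is, $\overline{S}(x,y)$ is the mean of the values of $S$ on the disagreement class containing $(x,y)$. A statistic is called balanced if it is constant on each disagreement class $\mathcal{X}_h$; thus $\overline{S}$ is balanced, and $S$ is balanced if and only if $S = \overline{S}$.
Theorem 2. Suppose the parameter space $\Theta$ is nondegenerate, and a statistic $S : \mathcal{X} \to R$ is an unbiased estimator of $g(\theta)$, where $g : \Theta \to R$. Then a statistic $T : \mathcal{X} \to R$ is an unbiased estimator of $g(\theta)$ if and only if $\overline{T} = \overline{S}$.
Theorem 3. Suppose the parameter space $\Theta$ is nondegenerate, and a statistic $S : \mathcal{X} \to R$ is an unbiased estimator of $g(\theta)$, where $g : \Theta \to R$. Then there exists a UMVU estimator of $g(\theta)$ and, in fact, the balanced statistic $\overline{S}$ is the UMVU estimator of $g(\theta)$.
Example 1. The disagreement enumeration statistic $\Delta$ is balanced, since $\Delta(x,y)$ is the number of $\star$ entries of $H(x,y)$ and is therefore constant on each $\mathcal{X}_h$.
Example 2. When $N > 1$, the statistic $d_X(1-d_Y) + (1-d_X)d_Y$ is not balanced; indeed, consider $h = [\star, \star, \ldots, \star]^T$, and consider $(x', y') \in \mathcal{X}_h$ such that $x'$ is all zeros and $y'$ is all ones, and consider $(x'', y'') \in \mathcal{X}_h$ such that the first $\lfloor N/2 \rfloor$ entries of $x''$ are all zeros and of $y''$ are all ones, and the last $\lceil N/2 \rceil$ entries of $x''$ are all ones and of $y''$ are all zeros. At $(x', y')$ the statistic has the value 1, and at $(x'', y'')$ it has a value approaching $\frac{1}{2}$ as $N$ grows; hence the statistic is not constant on $\mathcal{X}_h$, hence is not balanced.
Example 3. When $N > 1$, the alignment strength statistic $\mathrm{str}$ is NOT a balanced statistic. This is because in Example 1 we have that the numerator of $\mathrm{str}$ in Equation (1) is balanced, and in Example 2 we have that the denominator of $\mathrm{str}$ in Equation (1) is not balanced; hence $\mathrm{str}$ is not a constant function on all $\mathcal{X}_h$, and is thus not balanced.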
It is important to note that the assumption of nondegeneracy for the parameter space Θ is needed in Theorems 2 and 3, as highlighted in the next example. (In particular, this points to the non-triviality of Theorems 2 and 3.) Example 4. If the value of µ is known, hence fixed, then Θ is contained in a particular hyperplane intersecting R o , and Θ is degenerate; in this scenario, we will show in Section 4.8 that unbiasedness of estimators is not characterized as described in Theorem 2 and, often, there do not exist UMVU estimators for functions of the model parameters, even when there exist unbiased estimators.
A practical takeaway of Theorems 2 and 3 is that when you have a statistic estimating any function of the model parameters, it is a good idea to balance this statistic (if doing so is computationally tractable); indeed, the expectation, and thus the bias, is not affected, and the variance can be lowered; thus, balancing can lower the mean squared error associated with the estimation.
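Balancing can be sketched by brute force: average the statistic over every sample point that shares the same disagreement pattern. The sketch below assumes balancing means exactly this class average; the names `balance`, `delta`, and `mixed_density` are illustrative. It is exponential in the number of disagreements and is meant only for small examples.

```python
from fractions import Fraction
from itertools import product


def disagreement_class(x, y):
    """All (x', y') that agree with (x, y) wherever x and y agree
    and disagree wherever x and y disagree."""
    stars = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]
    members = []
    for bits in product((0, 1), repeat=len(stars)):
        xs, ys = list(x), list(y)
        for i, b in zip(stars, bits):
            xs[i], ys[i] = b, 1 - b
        members.append((tuple(xs), tuple(ys)))
    return members


def balance(stat):
    """Return the balanced version of stat: its mean over each class."""
    def balanced(x, y):
        cls = disagreement_class(x, y)
        return sum(Fraction(stat(a, b)) for a, b in cls) / len(cls)
    return balanced


def delta(x, y):
    """Number of components at which x and y disagree."""
    return sum(a != b for a, b in zip(x, y))


def mixed_density(x, y):
    """d_X (1 - d_Y) + (1 - d_X) d_Y, the statistic of Example 2."""
    n = len(x)
    d_x, d_y = Fraction(sum(x), n), Fraction(sum(y), n)
    return d_x * (1 - d_y) + (1 - d_x) * d_y
```

For $x$ all zeros and $y$ all ones with $N = 4$, `mixed_density` equals 1 while its balanced version equals 5/8, illustrating that this statistic is not balanced, whereas `delta` is unchanged by balancing.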
The following corollary is an immediate consequence of Theorem 3 and the fact that sums and products of constant functions (on respective X h ) are constant (on respective X h ).
Corollary 4. Suppose the parameter space Θ is nondegenerate, and a statistic S : X → R is an unbiased estimator of g(θ), where g : Θ → R, and a statistic S ′ : X → R is an unbiased estimator
The next set of theorems are applications of the above theorems, and of the methodologies of their proofs, to unbiasedness and efficiency of statistics for estimating various graph correlation parameters, particularly total correlation. We prove Theorems 5 and 6 in Section 4.5.
In the following negative result on estimating edge correlation, besides the assumption of a nondegenerate parameter space, we have the additional assumption that all pairs of vertices share the same edge correlation parameter (i.e. $\Theta$ is restricted so that, for all $\theta \in \Theta$, it holds that $\varrho_1 = \varrho_2 = \cdots = \varrho_N$), and we also assume that this edge correlation parameter is not always zero. Specifically:
Theorem 7. Suppose that the following three conditions hold:
a) The parameter space $\Theta$ is nondegenerate.
b) The parameter space $\Theta$ is such that edge correlations are component-uniform, meaning that there exists a function $\varrho_E : \Theta \to R$ such that, for all $\theta \in \Theta$ and all $i = 1, 2, \ldots, N$, it holds that $\varrho_i = \varrho_E(\theta)$.
c) The parameter space $\Theta$ is not a subset of $R_o$, i.e. there exists $\theta \in \Theta$ such that $\varrho_E(\theta) \neq 0$.
When these three conditions hold then there does not exist an unbiased estimator of $\varrho_E$.
We prove Theorem 7 in Section 4.4.
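Remark 8. Suppose the parameter space $\Theta$ is nondegenerate. As mentioned in Example 3, alignment strength $\mathrm{str}$ is not balanced; balancing it gives the statistic $\overline{\mathrm{str}}$, which has the same expected value as $\mathrm{str}$. (So, the bias in estimating $\varrho_T$ is the same for $\overline{\mathrm{str}}$ as for $\mathrm{str}$, but the variance of $\overline{\mathrm{str}}$ is less than the variance of $\mathrm{str}$.) Next, define the modified alignment strength
\[ \mathrm{str}' := \frac{d_{X\cap Y} - \overline{d_X d_Y}}{d_{X,Y} - \overline{d_X d_Y}}; \]
by Corollary 4, $\mathrm{str}'$ is balanced. We will empirically show in Section 5 that $\mathrm{str}'$ is often superior to $\overline{\mathrm{str}}$ as an estimator of $\varrho_T$. Also, in Section 4.6 we will prove clean formulas for the balanced statistics comprising $\mathrm{str}'$ (see Equation (3)). When $x$ and $y$ are both all zeros or all ones then any convention for defining $\mathrm{str}'$ is acceptable, provided that it is between 0 and 1. Note that $\mathrm{str}'$ can have less variance than $\overline{\mathrm{str}}$ (and in general it does) without violating Theorem 3, since the expected values of $\mathrm{str}'$ and $\overline{\mathrm{str}}$ can be different.
Sometimes balancing a statistic, even at one sample space point, requires averaging an exponential number of values. Remark 8 is notable for its simple expressions for the balanced statistics comprising $\mathrm{str}'$, and in Section 5 a linear time algorithm is given for computing $\overline{\mathrm{str}}$.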
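To illustrate why balancing need not cost exponential time, the sketch below compares a brute-force class average of $d_X d_Y$ with a closed form. The closed form $\overline{d_X d_Y} = d_{X,Y}^2 - \Delta/(4N^2)$ is derived here, not quoted from the paper: within a disagreement class, the number of disagreeing positions at which $x$ has a one is Binomial($\Delta$, 1/2). It may differ in appearance from the paper's own formula. The modified alignment strength is assumed to be $(d_{X\cap Y} - \overline{d_X d_Y})/(d_{X,Y} - \overline{d_X d_Y})$, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def balanced_dxdy_bruteforce(x, y):
    """Average d_X * d_Y over the whole disagreement class of (x, y)."""
    n = len(x)
    stars = [i for i in range(n) if x[i] != y[i]]
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(stars)):
        xs, ys = list(x), list(y)
        for i, b in zip(stars, bits):
            xs[i], ys[i] = b, 1 - b
        total += Fraction(sum(xs), n) * Fraction(sum(ys), n)
    return total / 2 ** len(stars)


def balanced_dxdy_closed_form(x, y):
    """Closed form derived for this sketch: d_{X,Y}^2 - Delta / (4 N^2)."""
    n = len(x)
    d_xy = Fraction(sum(x) + sum(y), 2 * n)
    delta = sum(a != b for a, b in zip(x, y))
    return d_xy ** 2 - Fraction(delta, 4 * n * n)


def modified_alignment_strength(x, y):
    """Linear-time evaluation of the assumed form of str'."""
    n = len(x)
    d_and = Fraction(sum(a & b for a, b in zip(x, y)), n)
    d_xy = Fraction(sum(x) + sum(y), 2 * n)
    bal = balanced_dxdy_closed_form(x, y)
    denom = d_xy - bal
    if denom == 0:
        return None  # both all zeros or both all ones: left to convention
    return (d_and - bal) / denom
```

The closed form needs only the counts of ones and of disagreements, so it runs in time linear in $N$, whereas the brute-force average visits $2^{\Delta}$ sample points.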
The following result is proved in Section 4.7; it follows from Corollary 4 and Remark 8.
Corollary 9. Suppose the parameter space Θ is nondegenerate and also suppose that Θ ⊆ R o .
These are the results in this paper, and they will be proven in the next section, Section 4.

Proof of the Results in Section 3
We begin by proving Theorem 2 and Theorem 3; the proofs of Theorem 1 and Theorem 7 will then be built on the methodology of the forward direction of the proof of Theorem 2. The rest of the results in Section 3 will also be shown.

Proof of the reverse direction of Theorem 2
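The reverse direction of Theorem 2 can be equivalently formulated in the following way. Suppose the parameter space $\Theta$ is nondegenerate, and a statistic $S : \mathcal{X} \to R$ is an unbiased estimator of $g(\theta)$, where $g : \Theta \to R$. If the statistic $T : \mathcal{X} \to R$ satisfies the condition that for all $h \in \{0, \star, 1\}^N$ it holds that $\sum_{(x,y)\in\mathcal{X}_h} T(x,y) = \sum_{(x,y)\in\mathcal{X}_h} S(x,y)$, then $T$ is an unbiased estimator of $g(\theta)$.
Proving the reverse direction of Theorem 2 is quite straightforward. For each $h \in \{0, \star, 1\}^N$, the elements of $\mathcal{X}_h$ are equiprobable; write $P_\theta(h)$ for the common probability of each element of $\mathcal{X}_h$. In particular, for all $\theta \in \Theta$ it holds that
\[ E(T) = \sum_{h} P_\theta(h) \sum_{(x,y)\in\mathcal{X}_h} T(x,y) = \sum_{h} P_\theta(h) \sum_{(x,y)\in\mathcal{X}_h} S(x,y) = E(S) = g(\theta). \]
Thus $T$ is an unbiased estimator of $g(\theta)$.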

Proof of the forward direction of Theorem 2 and of Theorem 1
The proof of the forward direction of Theorem 2 involves notation that is complex at first glance, and the core ideas may be challenging to follow when presented all at once in full generality. Our expositional strategy is as follows. After proving the basic preliminary Lemma 10 in Section 4.2.1, we proceed to first prove the forward direction of Theorem 2 in the special cases where N = 1, 2 in Section 4.2.2, so that the notation, reasoning, and strategy are crystal clear, and then in

Preliminaries
We begin with a technical lemma, Lemma 10. Two polynomials in a single variable that are equal as functions at infinitely many points are, by interpolation theory, equal algebraically (meaning that the two polynomials have the same coefficients as each other). However, for two polynomials in more than one variable, this may fail. For example, consider the polynomial $p_1^2 - p_2$ and the zero polynomial, in the two variables $p_1$ and $p_2$. These two polynomials agree as functions on a parabola, but they are not equal algebraically. However, if two polynomials of any degree agree as functions on an open neighborhood then they are equal algebraically. Formally:
Lemma 10. Suppose that $\Theta$ is nondegenerate, and $g, \tilde{g} : \Theta \cap R_o \to R$ are two polynomials in the variables $p_1, p_2, \ldots, p_N$ such that for all $\theta \in \Theta \cap R_o$ it holds that $g(\theta) = \tilde{g}(\theta)$. Then the respective coefficients of the polynomial $g$ are identical to the respective coefficients of the polynomial $\tilde{g}$.
The proof of Lemma 10 is a straightforward induction on the maximum degree of the polynomials $g$ and $\tilde{g}$, considering sequential partial derivatives. An equivalent formulation of Lemma 10 can be found in the classical textbook Algebra of Serge Lang [7], Chapter IV, Corollary 1.6.
To best illustrate, we begin with a proof for the case where $N = 1$. Focussing on $\Theta \cap R_o$, we see that
\[ E(S) = (1-p_1)^2\, S(0,0) + p_1(1-p_1)\bigl[S(0,1) + S(1,0)\bigr] + p_1^2\, S(1,1). \]
In particular, $g$ needs to be a quadratic polynomial in the single variable $p_1$ on $\Theta \cap R_o$, say $g(p_1) := g^{(0)} p_1^0 + g^{(1)} p_1^1 + g^{(2)} p_1^2$, where $g^{(0)}$, $g^{(1)}$, and $g^{(2)}$ are fixed coefficients. By the nondegeneracy of $\Theta$ and Lemma 10, we can uniquely represent polynomials as vectors with respective entries being the coefficients of $p_1^0$, $p_1^1$, and $p_1^2$, respectively. Thus $(1-p_1)^2$ is represented as $[1, -2, 1]^T$, $p_1(1-p_1)$ is represented as $[0, 1, -1]^T$, $p_1^2$ is represented as $[0, 0, 1]^T$, and $g$ is represented as $[g^{(0)}, g^{(1)}, g^{(2)}]^T$. In particular, $S$ being an unbiased estimator for $g$ on $\Theta \cap R_o$ means that
\[ \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} S(0,0) \\ S(0,1) + S(1,0) \\ S(1,1) \end{bmatrix} = \begin{bmatrix} g^{(0)} \\ g^{(1)} \\ g^{(2)} \end{bmatrix}. \]
Denote the left hand side matrix as $A$; since $A$ is invertible, and $T$ has to satisfy the above equation as well, we therefore have that
\[ \begin{bmatrix} T(0,0) \\ T(0,1) + T(1,0) \\ T(1,1) \end{bmatrix} = A^{-1} \begin{bmatrix} g^{(0)} \\ g^{(1)} \\ g^{(2)} \end{bmatrix} = \begin{bmatrix} S(0,0) \\ S(0,1) + S(1,0) \\ S(1,1) \end{bmatrix}, \]
which precisely says that for all $h \in \{0, \star, 1\}^N$ it holds that $\sum_{(x,y)\in\mathcal{X}_h} T(x,y) = \sum_{(x,y)\in\mathcal{X}_h} S(x,y)$, and the case where $N = 1$ is proven.
By further way of illustration, we next prove the case where $N = 2$. (Although 2 is not $\binom{n}{2}$ for any $n$, so there isn't a graph with exactly two pairs of vertices, nonetheless the random vectors $X$ and $Y$ and all other aspects of the model are still well-defined.) Focussing on $\Theta \cap R_o$, we have
\[ E(S) = \sum_{h \in \{0,\star,1\}^2} \pi_h(p_1, p_2) \sum_{(x,y)\in\mathcal{X}_h} S(x,y), \]
where $\pi_h(p_1, p_2)$ denotes the common probability of each $(x,y) \in \mathcal{X}_h$, namely the product over $j = 1, 2$ of $(1-p_j)^2$, $p_j(1-p_j)$, or $p_j^2$ according to whether $h_j$ is 0, $\star$, or 1.
Note in particular that $g$ would have to be a polynomial in the two variables $p_1, p_2$, with its monomials each consisting of a constant, denoted $g^{(k_1,k_2)}$, times $p_1^{k_1} p_2^{k_2}$, where $k_1, k_2 \in \{0, 1, 2\}$. By the nondegeneracy of $\Theta$ and Lemma 10, $g$ can be uniquely represented by the vector of coefficients ordered lexicographically (i.e. dictionary order) by superscript:
\[ [g^{(0,0)}, g^{(0,1)}, g^{(0,2)}, g^{(1,0)}, g^{(1,1)}, g^{(1,2)}, g^{(2,0)}, g^{(2,1)}, g^{(2,2)}]^T. \]
Indeed, all other polynomials with monomials each consisting of a constant times $p_1^{k_1} p_2^{k_2}$, where $k_1, k_2 \in \{0, 1, 2\}$, will also be similarly represented (uniquely) by the vector of coefficients ordered lexicographically by superscript. For example, in the matrix on the left hand side below, the columns respectively are the vectors (of lexicographically ordered monomial coefficients) representing the respective polynomials $(1-p_1)^2(1-p_2)^2$, $(1-p_1)^2 p_2(1-p_2)$, $(1-p_1)^2 p_2^2$, $p_1(1-p_1)(1-p_2)^2$, $p_1(1-p_1)p_2(1-p_2)$, $p_1(1-p_1)p_2^2$, $p_1^2(1-p_2)^2$, $p_1^2 p_2(1-p_2)$, $p_1^2 p_2^2$, which are the respective probabilities of $(x, y) \in \mathcal{X}_h$ for the $h$'s lexicographically ordered ("$\star$" has the value $\frac{1}{2}$) as: $[0,0]$, $[0,\star]$, $[0,1]$, $[\star,0]$, $[\star,\star]$, $[\star,1]$, $[1,0]$, $[1,\star]$, $[1,1]$. Now, $S$ being an unbiased estimator of $g$ on $\Theta \cap R_o$ means precisely that the following linear system holds:
\[ (A \otimes A) \Bigl[\, \textstyle\sum_{(x,y)\in\mathcal{X}_h} S(x,y) \Bigr]_{h \in \{0,\star,1\}^2} = \bigl[\, g^{(k_1,k_2)} \bigr]_{k_1,k_2 \in \{0,1,2\}}, \tag{4} \]
with both vectors ordered lexicographically. Observe that the left-hand-side matrix above is the Kronecker product $A \otimes A$, where $A$ is the lower triangular matrix with diagonals all ones mentioned in the proof of the case where $N = 1$. Note that $A \otimes A$ is thus lower triangular with diagonals all ones, thus has nonzero determinant and is invertible. Since $T$ solves the same linear system (above, Equation 4) as $S$ does, we conclude, by multiplying both sides of the equation above by the inverse of $A \otimes A$, that for all $h \in \{0, \star, 1\}^N$ it holds that $\sum_{(x,y)\in\mathcal{X}_h} T(x,y) = \sum_{(x,y)\in\mathcal{X}_h} S(x,y)$, and the case where $N = 2$ is now also proven.
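The structure of the $N = 2$ linear system can be checked with a short sketch. It assumes that $A$ is the 3-by-3 matrix whose columns hold the coefficients of $(1-p)^2$, $p(1-p)$, and $p^2$ in the powers $p^0, p^1, p^2$; the helper names are illustrative, and the Kronecker product is written out by hand to keep the sketch dependency-free.

```python
from fractions import Fraction

# Columns: coefficients of (1-p)^2, p(1-p), p^2 in powers p^0, p^1, p^2.
A = [[1, 0, 0],
     [-2, 1, 0],
     [1, -1, 1]]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def is_unit_lower_triangular(m):
    n = len(m)
    ones_on_diagonal = all(m[i][i] == 1 for i in range(n))
    zeros_above = all(m[i][j] == 0 for i in range(n) for j in range(i + 1, n))
    return ones_on_diagonal and zeros_above


def class_probability(h_index, p1, p2):
    """Probability of one sample point in the class indexed by (i1, i2),
    where each index codes 0 -> '0', 1 -> star, 2 -> '1' (zero correlation)."""
    def q(i, p):
        return [(1 - p) ** 2, p * (1 - p), p ** 2][i]
    return q(h_index[0], p1) * q(h_index[1], p2)


def column_value(m, col, p1, p2):
    """Evaluate the polynomial whose lexicographically ordered
    coefficients of p1^k1 * p2^k2 form column `col` of m."""
    return sum(m[3 * k1 + k2][col] * p1 ** k1 * p2 ** k2
               for k1 in range(3) for k2 in range(3))


AA = kron(A, A)
```

Each column of `AA` reproduces the probability polynomial of the corresponding disagreement class, and the unit lower triangular shape is what makes the system uniquely solvable.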

Proof of the forward direction of Theorem 2, the general case
With the proofs of the cases N = 1, 2 as illustration, we now prove the result for arbitrary N.
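Let $\otimes^N A$ denote the $N$-fold Kronecker product $A \otimes A \otimes \cdots \otimes A$. Next, let $\overrightarrow{S/H}$ denote the vector whose components are respectively $\sum_{(x,y)\in\mathcal{X}_h} S(x,y)$ for each of the $h \in \{0, \star, 1\}^N$, ordered lexicographically according to $h$. We will be focused on $\Theta \cap R_o$; so here, $g$ is a polynomial in the variables $p_1, p_2, \ldots, p_N$, with its monomials each consisting of a constant, denoted $g^{(k_1,k_2,\ldots,k_N)}$, times $p_1^{k_1} p_2^{k_2} \cdots p_N^{k_N}$, where $k_1, k_2, \ldots, k_N \in \{0, 1, 2\}$. By the nondegeneracy of $\Theta$ and Lemma 10, we have that $g$ can be uniquely represented by the column vector of monomial coefficients ordered lexicographically by the powers of $p_1, p_2, \ldots, p_N$; denote this vector $\vec{g}$.
We claim that $S$ being an unbiased estimator for $g(\theta)$ on $\Theta \cap R_o$ means precisely that $S$ satisfies the linear system
\[ (\otimes^N A)\, \overrightarrow{S/H} = \vec{g}. \tag{5} \]
This can be verified directly by noting that for each $h \in \{0, \star, 1\}^N$ and $(x, y) \in \mathcal{X}_h$, the probability of $(x, y)$ is given by (and is simplified with elementary algebra)
\[ \prod_{j=1}^{N} \sum_{k_j=0}^{2} A_{k_j+1,\; 2 h_j+1}\, p_j^{k_j} \;=\; \sum_{k_1,\ldots,k_N \in \{0,1,2\}} \Bigl( \prod_{j=1}^{N} A_{k_j+1,\; 2 h_j+1} \Bigr)\, p_1^{k_1} p_2^{k_2} \cdots p_N^{k_N}, \tag{6} \]
where, in the subscript of $A_{k_j+1,\; 2 h_j+1}$, "$\star$" has the value $\frac{1}{2}$, meaning that when $h_j$ is $\star$ then $2 \cdot \star + 1$ is defined to be 2. With nondegeneracy of $\Theta$ and Lemma 10, we have uniqueness of polynomial coefficients, and Equation (6) directly yields Equation (5).
Since $\otimes^N A$ is a lower triangular matrix with all diagonal entries equal to one, it is an invertible matrix. Now, $T$ has to satisfy Equation (5) as well; multiplying both sides by the inverse of $\otimes^N A$, we conclude that $\overrightarrow{T/H} = \overrightarrow{S/H}$, i.e. for all $h \in \{0, \star, 1\}^N$ it holds that $\sum_{(x,y)\in\mathcal{X}_h} T(x,y) = \sum_{(x,y)\in\mathcal{X}_h} S(x,y)$.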

Proof of Theorem 3
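Theorem 3 states that if the parameter space $\Theta$ is nondegenerate, and a statistic $S : \mathcal{X} \to R$ is an unbiased estimator of $g(\theta)$, for $g : \Theta \to R$, then there exists a UMVU estimator of $g(\theta)$ and, in fact, the balanced statistic $\overline{S}$ is the UMVU estimator of $g(\theta)$.
We prove this as follows, under the assumption that $\Theta$ is nondegenerate. Recall first that $\overline{S}$ is an unbiased estimator of $g(\theta)$ by Theorem 2. Now, note that $\overline{S}$ is a constant function on each disagreement class $\mathcal{X}_h$, i.e. $\overline{S}$ is a function of $H$. Since $H$ is complete and sufficient by Theorem 1, the Lehmann-Scheffe Theorem gives that $\overline{S}$ is the UMVU estimator of $g(\theta)$. In Rao-Blackwell terms, because the elements of each $\mathcal{X}_h$ are equiprobable, the conditional expectation of $S$ given $H = h$ is precisely the mean of the values of $S$ on $\mathcal{X}_h$, which is precisely the statistic $\overline{S}$.
(Excellent references for Rao-Blackwell theory are the original papers [12,1], and excellent references for the Lehmann-Scheffe Theorem are the original papers [8,9].)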

Another proof of Theorem 3, by first-principles
In this section, we mention a "first-principles" proof of Theorem 3, besides the Lehmann-Scheffe proof methodology in the previous Section 4.3.
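For any statistic $T : \mathcal{X} \to R$ which is an unbiased estimator of $g(\theta)$, we compute the variance of $T$, for any particular $\theta \in \Theta$, as:
\[ \mathrm{Var}(T) = E(T^2) - g(\theta)^2 = \sum_{h \in \{0,\star,1\}^N} P_\theta(h) \sum_{(x,y)\in\mathcal{X}_h} T(x,y)^2 \;-\; g(\theta)^2, \]
where $P_\theta(h)$ denotes the common probability of each element of $\mathcal{X}_h$.
In particular, $\mathrm{Var}(T)$ can be minimized over such unbiased $T$ by, for all $h \in \{0, \star, 1\}^N$, minimizing $\sum_{(x,y)\in\mathcal{X}_h} T(x,y)^2$ subject to $\sum_{(x,y)\in\mathcal{X}_h} T(x,y) = \sum_{(x,y)\in\mathcal{X}_h} S(x,y)$; the constraint is because of Theorem 2. Treating the $T(x, y)$ as variables, this convex optimization problem has a global minimizer exactly when the components of the objective gradient are all equal (by the KKT conditions), i.e. when $T$ is constant on $\mathcal{X}_h$; hence the minimum variance is achieved when $T = \overline{S}$, independent of $\theta \in \Theta$, and Theorem 3 is shown.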
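The KKT step can be written out for a single disagreement class. The following is a short worked derivation under the setup described above (minimize the sum of squares of the values of $T$ on one class, subject to the class sum being fixed by Theorem 2); the multiplier symbol $\lambda_h$ and the constant $c_h$ are notation introduced only for this illustration.

```latex
% Fix h and write c_h := \sum_{(x,y) \in \mathcal{X}_h} S(x,y).
\[
  \min_{T|_{\mathcal{X}_h}} \; \sum_{(x,y)\in\mathcal{X}_h} T(x,y)^2
  \quad \text{subject to} \quad
  \sum_{(x,y)\in\mathcal{X}_h} T(x,y) = c_h .
\]
% Lagrangian stationarity in each variable T(x,y):
\[
  2\,T(x,y) = \lambda_h \quad \text{for every } (x,y) \in \mathcal{X}_h ,
\]
% so T is constant on the class, and the constraint fixes the constant:
\[
  T(x,y) = \frac{c_h}{|\mathcal{X}_h|}
         = \frac{1}{|\mathcal{X}_h|} \sum_{(x',y')\in\mathcal{X}_h} S(x',y')
         = \overline{S}(x,y).
\]
```

Because the objective is strictly convex and the constraint is linear, this stationary point is the unique minimizer on each class, and it does not depend on $\theta$.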

Proof of Theorem 7
In this Section we prove Theorem 7.
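Theorem 7 states that if the following three conditions hold: a) The parameter space $\Theta$ is nondegenerate.
b) The parameter space $\Theta$ is such that edge correlations are component-uniform, meaning that there exists a function $\varrho_E : \Theta \to R$ such that, for all $\theta \in \Theta$ and all $i = 1, 2, \ldots, N$, it holds that $\varrho_i = \varrho_E(\theta)$.
c) The parameter space $\Theta$ is not a subset of $R_o$, i.e. there exists $\theta \in \Theta$ such that $\varrho_E(\theta) \neq 0$.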
Then, under these three conditions, there does not exist an unbiased estimator of ̺ E (θ).
Suppose, by way of contradiction, that statistic S : X → R is an unbiased estimator of ̺ E (θ).
Focussing our attention on Θ∩R o , where ̺ E ≡ 0, we have by Equation (5)

Proof of Theorems 5 and 6
Theorems 5 and 6 state that if the parameter space Θ is nondegenerate and N > 1 then there does not exist an unbiased estimator of the heterogeneity correlation ̺ H nor of the total correlation ̺ T .
We will just focus on $\Theta \cap R_o$; on this set it is easy to see that $\varrho_T = \varrho_H$. Thus, by the development in Section 4.2 and the nondegeneracy of $\Theta$, we will have proved Theorems 5 and 6 if we show that, on $\Theta \cap R_o$, $\varrho_H := \frac{\sigma^2}{\mu(1-\mu)}$ is not a polynomial in the variables $p_1, p_2, \ldots, p_N$. By way of contradiction, suppose that, on $\Theta \cap R_o$, $\varrho_H$ is a polynomial in the variables $p_1, p_2, \ldots, p_N$. Let $(\tilde{p}_1, \tilde{p}_2, \ldots, \tilde{p}_N, 0, 0, \ldots, 0)$ be an interior point of $\Theta \cap R_o$, relative to $R_o$; such a point exists by the nondegeneracy of $\Theta$. Let $\varrho_H(t)$, $\sigma^2(t)$, and $\mu(t)$ respectively denote the polynomials in the single variable $t$ obtained from $\varrho_H$, $\sigma^2$, and $\mu$ when $t$ is substituted in place of $p_1$ and, for each of $i = 2, 3, \ldots, N$, the value of $p_i$ is fixed to be $\tilde{p}_i$. Let $I$ be a real, open interval containing $\tilde{p}_1$, such that for all $t \in I$ we have $(t, \tilde{p}_2, \ldots, \tilde{p}_N, 0, 0, \ldots, 0) \in \Theta \cap R_o$; a nontrivial such $I$ exists by the nondegeneracy of $\Theta$.
Using basic algebra, $\sigma^2(t)$ is quadratic in $t$, and the coefficient of $t^2$ in $\sigma^2(t)$ is $\frac{1}{N} - \frac{1}{N^2}$, which is positive since $N > 1$; likewise $\mu(t)(1 - \mu(t))$ is quadratic in $t$, and the coefficient of $t^2$ in it is $-\frac{1}{N^2}$, which is negative. For all $t \in I$ we have $\sigma^2(t) = \mu(t)(1 - \mu(t))\,\varrho_H(t)$, and the coefficients of the respective powers of $t$ on the left hand side are respectively equal to the coefficients of the powers of $t$ on the right hand side, since $I$ is an interval (and invoking polynomial interpolation theory). However, this implies that the polynomial $\varrho_H(t)$ can't have positive degree, and thus is constant. However, this constant is nonnegative (indeed, it has been pointed out in Section 2.1 that $0 \le \varrho_H \le 1$), which means that the coefficient of $t^2$ in (the left hand side) $\sigma^2(t)$ is positive, but the coefficient of $t^2$ in (the right hand side) $\mu(t)(1 - \mu(t))\varrho_H(t)$ is nonnegative times negative, which is nonpositive. By the contradiction, we have thus proved Theorems 5 and 6.

Proof of Equation (3) in Remark 8
Recall the definition of alignment strength $\mathrm{str}$, and also recall the definition of the modified alignment strength $\mathrm{str}' := (d_{X\cap Y} - \overline{d_X d_Y})/(d_{X,Y} - \overline{d_X d_Y})$. The main goal of this section is to prove Equation (3) in Remark 8, restated here as Equation (7). In order to do this, we will appeal to the identities in Equations (8) through (12). It is trivial to see that $d_{X\cap Y}$ and $d_{X,Y}$ are balanced, so we need only compute $\overline{d_X d_Y}$. Indeed, for any $h \in \{0, \star, 1\}^N$ and any $(x, y) \in \mathcal{X}_h$, the value of $\overline{d_X d_Y}(x,y)$ is obtained in Equation (13), using the identities in Equations (8) through (12), combinatorial symmetry, and well-known identities involving binomial coefficients. Thus, by Equation (13) and the definition of $d_{X,Y}$, Equation (7), i.e. Equation (3), is proven, as desired.

Proof of Corollary 9
Corollary 9 states that if the parameter space Θ is nondegenerate and also Θ ⊆ R o then the We prove this now. Because Θ is nondegenerate, we pointed out in Example 1 that ∆ : X → R is the UMVU , which is equal to µ(1 − µ) since by hypothesis d X and d Y are independent.
Finally, by Corollary 4, we have that d X, , and the result is shown.

Necessity of the Θ nondegeneracy assumption in Theorems 2, 3
In the statement of Theorems 2 and 3 we assume that the parameter space Θ is nondegenerate.
In this section we demonstrate that this assumption is required for these theorems to hold.
Specifically, we will focus on a scenario in which the value of µ is known, in which case the parameter space is reduced to parameter tuples that have the prescribed value µ. This restricts the parameter space Θ to a particular hyperplane, which makes Θ degenerate; we will show that the claims of Theorems 2 and 3 then fail, in general.
For simplicity, in this entire section, let us take N = 2, (although 2 is not n 2 for any n, i.e. there isn't a graph with exactly two pairs of vertices, nonetheless the random vectors X and Y and all other aspects of the model are still well-defined), set ̺ 1 = ̺ 2 = 0, and suppose that µ : 0 < µ < 1 is known; other than this, we allow 0 < p 1 < 1 and 0 < p 2 < 1. Here, p 1 +p 2 2 = µ yields p 2 = 2µ − p 1 . Denote δ := min{µ, 1 − µ}, and denote p := p 1 ; the parameter space is reduced to single variable p on the interval (µ − δ, µ + δ). There are 2 4 = 16 points in X ; for each (x, y) ∈ X , the probability of (x, y) is given by which is a polynomial of degree 4.
In Section 4.2, consider the linear system in Equation (4); that 9-by-9 linear system-describing statistic S being an unbiased estimator of g-now becomes a 5-by-9 linear system over here. This is because the columns of the left hand side matrix A ⊗ A and also the right hand side of the linear system, which there were each 9-vectors consisting of the coefficients of particular polynomials in two variables, can each now-in the reduced parameter space-be expressed as polynomials of degree 4 in a single variable, thus with five coefficients instead of 9 coefficients. As a 5-by-9 linear system, there is a nontrivial nullspace, and linear system solutions describing unbiasedness are no longer unique, which implies that there will exist a statistic T also an unbiased estimator of g such that it does not hold for all h ∈ {0, ⋆, 1} N that (x,y)∈X h T (x, y) = (x,y)∈X h S(x, y). This completes our demonstration that Theorem 2 requires the assumption that Θ is nondegenerate.
Next, we illustrate the need for nondegeneracy of Θ in Theorem 3. The difference statistic ∆ is, by definition, unbiased for its expected value when µ is fixed in the setting described above. Indeed, for a specific example among many (for N = 2, and parameter space parameterized by p exactly as above in this section), let µ = .25; we considered ALL unbiased estimators of E(∆) and then we considered which of them had the least variance-among all unbiased estimators of E(∆)-for different fixed values of p. We discovered that none of the unbiased estimators of E(∆) had variance less than or equal to the other unbiased estimators of E(∆), uniformly for all values of p. Thus there isn't a UMVU estimator for E(∆) here. From this we conclude that Theorem 3 needs the condition that Θ is nondegenerate.
In the scenario of this section, we computed the unbiased estimator (unbiased over all $p$) with least variance (for a fixed value of $p$) as follows. Let the points in $\mathcal{X}$ be ordered in any specified way, say $z_1, z_2, \ldots, z_{16}$. Define the matrix $M \in \mathbb{R}^{5 \times 16}$ wherein, for all $i, j$, the entry $M_{ij}$ is the coefficient of $p^{i-1}$ in the polynomial $\phi_{z_j}(p)$ (where $\phi_{z_j}(p)$ is as defined earlier in this section). For any statistic $S : \mathcal{X} \to \mathbb{R}$, let $S$ be expressed as a vector $S \in \mathbb{R}^{16}$ wherein, for all $i = 1, 2, \ldots, 16$, we define $S_i := S(z_i)$. A function on the reduced parameter space $g : (\mu - \delta, \mu + \delta) \to \mathbb{R}$ can only have an unbiased estimator if $g$ is a polynomial in the variable $p$ of degree at most 4; this is because the expected value of $S$ is $\sum_{i=1}^{16} S_i \phi_{z_i}(p)$, so $g$ would need to be a linear combination of the $\phi$'s. Say that $g \in \mathbb{R}^5$ is the vector wherein, for all $i = 1, 2, \ldots, 5$, we define $g_i$ to be the coefficient of $p^{i-1}$ in $g$. Because $(\mu - \delta, \mu + \delta)$ is a nontrivial interval (indeed, we just need at least 5 points), and by the uniqueness of interpolating polynomials, the unbiased estimators $S$ of any particular $g$ are precisely the solutions $S$ of the linear system $M S = g$.
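The construction of $M$ and the non-uniqueness of solutions of $M S = g$ can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that with zero edge correlations the four Bernoulli variables are independent, and that $\Delta$ counts the vertex pairs on which $x$ and $y$ disagree. The names `POINTS`, `build_M` and `expectation` are hypothetical helpers introduced only for this illustration.

```python
import itertools
import numpy as np
from numpy.polynomial import Polynomial

# Sample points z = (x1, x2, y1, y2) for N = 2, in a fixed order.
POINTS = list(itertools.product([0, 1], repeat=4))

def build_M(mu):
    """5 x 16 matrix; column j holds the coefficients (ascending powers
    of p) of the probability polynomial of the j-th sample point."""
    p = Polynomial([0.0, 1.0])
    q = 2 * mu - p                           # p2 expressed through p1 = p
    bern = {1: (1 - p, p), 2: (1 - q, q)}    # bern[pair][bit]
    cols = [(bern[1][x1] * bern[1][y1] * bern[2][x2] * bern[2][y2]).coef
            for x1, x2, y1, y2 in POINTS]
    return np.column_stack(cols)

def expectation(S, mu, p):
    """E_p[S], computed directly from products of Bernoulli probabilities."""
    q = 2 * mu - p
    total = 0.0
    for s, (x1, x2, y1, y2) in zip(S, POINTS):
        prob = 1.0
        for bit, par in ((x1, p), (y1, p), (x2, q), (y2, q)):
            prob *= par if bit else 1 - par
        total += s * prob
    return total

mu = 0.25
M = build_M(mu)

# Assumed form of the difference statistic: number of disagreeing pairs.
delta = np.array([abs(x1 - y1) + abs(x2 - y2) for x1, x2, y1, y2 in POINTS],
                 dtype=float)
g = M @ delta                                # coefficient vector of E_p[Delta]

# M is 5 x 16, so it has a nontrivial null space; adding a null-space
# vector to delta gives a different statistic with the same expectation.
_, sing, Vt = np.linalg.svd(M)
rank = int(np.sum(sing > 1e-10))
T = delta + Vt[-1]                           # Vt[-1] lies in null(M)
```

Here `T` and `delta` are distinct vectors satisfying the same system `M @ S = g`, which is the non-uniqueness that the degenerate parameter space permits.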
Suppose that there exists an unbiased estimator of $g$. Among unbiased estimators of $g$, to find one of minimum variance for any specific value of $p \in (\mu - \delta, \mu + \delta)$, we proceed as follows.
Let the vector of sample point probabilities for the respective 16 sample space points be denoted $\varpi \in \mathbb{R}^{16}$; we have $\varpi^T = [p^0, p^1, p^2, p^3, p^4] \cdot M$, which is a positive vector since $p \in (\mu - \delta, \mu + \delta)$. Every unbiased estimator of $g$ has the same expected value $g(p)$, so minimizing the variance amounts to minimizing the second moment; hence finding a (globally) unbiased estimator with (specifically for $p$) minimum variance is equivalent to the convex quadratic optimization problem $\min \sum_{i=1}^{16} \varpi_i S_i^2$ such that $S$ satisfies $M S = g$. Define a bijective change of variables where the new variables $S' \in \mathbb{R}^{16}$ are such that, for all $i = 1, 2, \ldots, 16$,
$$S'_i := \sqrt{\varpi_i}\, S_i,$$
and let $M' \in \mathbb{R}^{5 \times 16}$ have entries $M'_{ij} := M_{ij} / \sqrt{\varpi_j}$, so that $M' S' = M S$. Now this minimum variance problem is equivalent to $\min \|S'\|^2$ such that $S'$ satisfies $M' S' = g$. Classical generalized inverse theory guarantees a unique solution $S' = M'^{\dagger} g$ (the symbol $\dagger$ denotes the Moore-Penrose generalized inverse of the matrix), which corresponds to the statistic $S$ wherein, for each $i = 1, 2, \ldots, 16$,
$$S_i = S'_i / \sqrt{\varpi_i},$$
which is unique as having minimum variance (for the particular value of $p$) among the (globally) unbiased estimators.
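The pseudoinverse computation described above might be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used for the paper: the probabilities assume independence (zero edge correlations), $\Delta$ is assumed to count disagreeing vertex pairs, and `build_M`, `local_mvue` and `variance` are hypothetical names. The target $g$ is taken to be the coefficient vector of $E(\Delta)$ with $\mu = 0.25$, and the values $p = 0.10$ and $p = 0.30$ are arbitrary choices.

```python
import itertools
import numpy as np
from numpy.polynomial import Polynomial

# Sample points z = (x1, x2, y1, y2) for N = 2, in a fixed order.
POINTS = list(itertools.product([0, 1], repeat=4))

def build_M(mu):
    """Column j holds the coefficients of the j-th probability polynomial."""
    p = Polynomial([0.0, 1.0])
    q = 2 * mu - p
    bern = {1: (1 - p, p), 2: (1 - q, q)}
    cols = [(bern[1][x1] * bern[1][y1] * bern[2][x2] * bern[2][y2]).coef
            for x1, x2, y1, y2 in POINTS]
    return np.column_stack(cols)

def local_mvue(M, g, p):
    """Unbiased estimator of g (for every p) with least variance at this p."""
    varpi = (p ** np.arange(5)) @ M          # sample point probabilities at p
    root = np.sqrt(varpi)
    M_prime = M / root                       # column j scaled by 1/sqrt(varpi_j)
    S_prime = np.linalg.pinv(M_prime) @ g    # minimum-norm solution of M' S' = g
    return S_prime / root, varpi             # undo the change of variables

def variance(S, varpi):
    """Variance of the statistic S under the point probabilities varpi."""
    mean = varpi @ S
    return varpi @ S**2 - mean**2

mu = 0.25
M = build_M(mu)
delta = np.array([abs(x1 - y1) + abs(x2 - y2) for x1, x2, y1, y2 in POINTS],
                 dtype=float)
g = M @ delta                                # coefficients of E_p[Delta]

S_a, varpi_a = local_mvue(M, g, 0.10)
S_b, varpi_b = local_mvue(M, g, 0.30)

print("variance at p=0.10:", variance(S_a, varpi_a), variance(S_b, varpi_a))
print("variance at p=0.30:", variance(S_a, varpi_b), variance(S_b, varpi_b))
```

Because the minimizer at each fixed $p$ is unique, a UMVU estimator, if one existed, would have to coincide with the locally optimal estimator at every $p$; comparing `S_a` and `S_b` for several values of $p$ is therefore one way to probe whether such an estimator can exist.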
This concludes the description of the way we computed, in the scenario of this section, the unbiased estimator (unbiased over all $p$) with least variance (for a fixed value of $p$). (An excellent reference for matrix analysis in general, including the theory of generalized inverses, is [6].)

Simulation experiments: comparing $\mathrm{str}$, $\overline{\mathrm{str}}$, and $\mathrm{str}'$
As we mentioned earlier, it was empirically demonstrated in [4], for correlated graphs in broad families within our model, that graph matching complexity and graph matchability are each functions of the total correlation, and it was also proved in [4] that the alignment strength $\mathrm{str}$ is a strongly consistent estimator of $\varrho_T$. The specific formulation of alignment strength $\mathrm{str}$ arose in a very natural way; see [4]. Nonetheless, $\mathrm{str}$ suffers from a deficiency: in Example 3 we pointed out that $\mathrm{str}$ is not balanced. The balanced statistic $\overline{\mathrm{str}}$ reduces the variance while keeping the expected value unchanged. In this section we will empirically demonstrate that another balanced statistic, denoted $\mathrm{str}'$, is often superior to $\overline{\mathrm{str}}$ in estimating $\varrho_T$. Note that there is no contradiction with Theorem 3, which asserts that, assuming the parameter space is nondegenerate, $\overline{\mathrm{str}}$ is the UMVUE for $E(\mathrm{str})$; indeed, $\mathrm{str}'$ can be biased as an estimator of $E(\mathrm{str})$. This can be a good thing: we will see that $\mathrm{str}'$ frequently has less bias than $\overline{\mathrm{str}}$ in the estimation of $\varrho_T$, and in all of the experiments here $\mathrm{str}'$ has less variance than $\overline{\mathrm{str}}$.
But we first make a computationally helpful observation about computing the value of the balanced statistic $\overline{\mathrm{str}}$.
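For orientation, balancing an arbitrary statistic by direct enumeration might be sketched as below. This is only a sketch under assumptions not spelled out in this section: the disagreement vector $h$ is taken to record the common value where $x_i = y_i$ and $\star$ where $x_i \neq y_i$, and the balanced value is taken to be the plain average of $S$ over all sample points sharing that $h$. The function names are hypothetical.

```python
import itertools
import numpy as np

def disagreement_vector(x, y):
    """h_i is the common value where x and y agree, and '*' where they differ."""
    return tuple(a if a == b else "*" for a, b in zip(x, y))

def fiber(x, y):
    """All sample points (x', y') with the same disagreement vector as (x, y).

    Each disagreeing coordinate can be (0, 1) or (1, 0), so the fiber has
    2**Delta points, where Delta is the number of disagreements.
    """
    free = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]
    points = []
    for bits in itertools.product([0, 1], repeat=len(free)):
        xs, ys = list(x), list(y)
        for i, b in zip(free, bits):
            xs[i], ys[i] = b, 1 - b
        points.append((tuple(xs), tuple(ys)))
    return points

def balance(S, x, y):
    """Brute-force balanced value: average S over the fiber through (x, y)."""
    return float(np.mean([S(xs, ys) for xs, ys in fiber(x, y)]))

# Example statistic (illustrative only): number of edges in the first graph.
def edges_in_x(x, y):
    return sum(x)

x, y = (1, 0, 1), (0, 0, 1)
h = disagreement_vector(x, y)
balanced_value = balance(edges_in_x, x, y)
```

The loop in `fiber` visits every point with the given disagreement vector, so the running time of this direct approach grows exponentially in the number of disagreements.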
In general, when given an arbitrary statistic $S : \mathcal{X} \to \mathbb{R}$, the computation of the value of the balanced statistic $\overline{S}$, even for just one particular sample space point $(x, y) \in \mathcal{X}$, can require exponential time; indeed, there are $2^{\Delta(x,y)}$ values to average. In the case of $\overline{\mathrm{str}}$, this computation can be greatly simplified. Given any particular $h \in \{0, \star, 1\}^N$ and any particular $(x, y) \in \mathcal{X}_h$ (such that not both $x$ and $y$ are all zeros, and not both $x$ and $y$ are all ones), Equation (2) and Equation (10) yield an explicit expression for $\overline{\mathrm{str}}(x, y)$. This provides a linear-time algorithm for computing $\overline{\mathrm{str}}$ for any $(x, y) \in \mathcal{X}$, although this computation is much more involved than the very simple formula for $\mathrm{str}'$ given in Remark 8.
In the first set of experiments, we did 200 independent replicates of the following experiment.
We realized $p_1, p_2, p_3, p_4, p_5, p_6, \varrho_1, \varrho_2, \varrho_3, \varrho_4, \varrho_5, \varrho_6$ (which correspond to the six pairs of vertices in a vertex set with four vertices) independently from a Uniform(0, 1) distribution; the first five such experiments' values were:
Remarkably, in all of the many tens of thousands of experiments that we conducted, for many different parameter values, we always found that the mean squared error in estimating $\varrho_T$, denoted $\mathrm{MSE}(\cdot, \varrho_T)$, was lower for the modified alignment strength $\mathrm{str}'$ than for the balanced alignment strength $\overline{\mathrm{str}}$. Based on these computations, we conjecture the following.

Summary and future directions
Our setting is the correlated Bernoulli random graph model for the production of a pair of correlated random graphs, wherein different pairs of vertices are allowed different probabilities of adjacency, and inter-graph edge correlations are allowed to be different for different pairs of vertices. This is a broad and useful model. Our main results come in two groups.
The first group of results: We introduce a balancing procedure to lower the mean squared error for any statistic used to estimate any function of the model parameters; it is essentially a Rao-Blackwellization procedure utilizing the disagreement vector statistic, which we prove is complete and sufficient. Indeed, given any unbiased estimator of any function of the model parameters, we neatly characterize all unbiased estimators, as well as the UMVUE for this function of the model parameters. With these tools we obtain the second group of results, which involve estimating the total correlation parameter; this parameter is of current interest in the theory of graph matching, and has recently been shown to play a critical role in matchability and in runtime complexity.
Future steps would be to extend our results in this paper to broader random graph models and to settle Conjecture 11.