Testing marginal symmetry of digital noise images through the perimeter of excursion sets

: In this paper we consider digital images for which the pixels values are given by a sequence of independent and identically distributed variables within an observation window. We proceed to the construction of an unbiased estimator for the perimeter without border eﬀects. The study of the ﬁrst and second moments of the perimeter allows to prove auto-normalised asymptotic normality results with an explicit covariance matrix consistently estimated. Theses Central Limit Theorems permit to built a consistent and empirical accessible test statistic to test the symmetry of the marginal distribution. Finally the asymptotic perimeter behaviour in large threshold limit regime is also explored. Several numerical studies are provided to illustrate the proposed testing procedures.


Introduction
The stochastic modeling of images by spatial random fields allows to set up a convenient statistical framework for different issues in image processing for image denoising, pattern detection, segmentation or classification [28,21,11].A particular interest directed towards the study of geometric attributes and features of objects has grown these last years.Roughly speaking the considered quantities are the surface area, perimeter and Euler characteristic, which is a topological invariant that is equal to the number of connected components minus the number of holes, in a black and white binary image obtained by thresholding a gray image at some fixed level.These quantities are related to intrinsic volumes and Minkowski functionals particularly studied in the integral and stochastic geometry [7,31] or Lipschitz-Killing (LK) curvatures of excursion sets (see [32] for a formal introduction to the subject).Those functionals are robust and efficient shape descriptors that have been largely applied to a variety of domains: cosmology (e.g. the morphological analysis of cosmic microwave background [30,26]), astrophysics (e.g.modeling galaxy formation [17]), military (e.g.mine field detection, [25]) medicine (e.g.brain imaging, [16], study of synthesized 2D digital mammograms, [4,12]).
Very important results have been obtained in the framework of smooth stationary random fields, especially for Gaussian related fields (see [1]).In this framework, the theoretical means of LK curvatures are explicitly computed with respect to the parameters of the fields [34,33,2] and can be efficiently estimated from images [4].The consistent estimation of related variances had been recently studied in [12].Moreover, Central Limit Theorems (CLT) have been proven for Gaussian fields [15,22] that paved the road for statistical hypothesis tests.Several extensions for non-Gaussian fields, namely shot noise random fields are also available for means of LK curvatures [23] and associated CLT's [24].However the framework of smooth fields, indexed by continuous space variables, is not appropriate with the discrete framework induced by digital images.Let us remark that links with discrete setting have been recently explored in [3].
In this paper we consider this discrete framework for digital images for which white noise is well defined.More precisely, we assume that pixels values are given by (X i,j ) i,j≥1 a sequence of independent and identically distributed (i.i.d.) variables within an observation window S. We have in mind the residuals of a denoising procedure or a linear regression, or the difference between two images as for brain activities study.Therefore, we are interested to test the natural symmetry hypothesis, that is to know whether X i,j is drawn from a symmetrical distribution.This assumption of marginal distribution symmetry is called here the null hypothesis H 0 and includes Gaussian or Student distributions.Being able to formally test for symmetry hypotheses is an important topic in many fields since this basic assumption contains important information regarding the underlying model, that would allow to validate it before further investigation.This problem has been largely considered in the literature for i.i.d.sample, especially due to its importance for time series analysis.Different approaches to construct a symmetry test include the use of empirical distribution for known center of symmetry [6], the characteristic symmetry function and study of its behavior to know if the distribution is symmetric or not [8] with unknown center.Other tests rely on the study of the skewness of the distribution function as initiated by [18]; see [27] for a free-distribution test based on Bonferroni's measure.For an overview and comparison of the existing methods we refer to [29] and [20].
In our original approach we attempt to use our digital image setting and the geometrical features of its excursion sets.More precisely, for a given threshold t ∈ R we consider the binary image given by the i.i.d.Bernoulli variables (1 {Xi,j ≥t} ) i,j≥1 , coloring black pixels for values equal to one and considering the area of black components as well as its perimeter inside a given domain.This allows us to build an unbiased estimator of the perimeter, without border effects.This framework is close from the one used by [14] to set up a goodness of fit test for complete spatial randomness.In the aforementioned paper the authors consider the counting process obtained from a homogeneous Poisson point process such that the observed values (X i,j ) are i.i.d.Poisson random variables.They obtain explicit expectation and covariance matrix for the three Lipschitz-Killing curvatures at a fixed level as well as CLT to build their test.In the present paper, we focus on the perimeter and obtain a multivariate CLT with respect to several threshold levels with an explicit covariance matrix, under a general i.i.d.assumption, in a dense tiling framework.The choice of our unbiased perimeter estimator permits us to preserve the symmetry of the distribution, in contrast with perimeter with border.This lead us to construct the statistical test of symmetry, by considering the behavior of the ratio between two specific thresholds.We also explore the asymptotic in large threshold limit regime and obtain the equivalence between the asymptotic behavior of the expectation value and the variance establishing a joint auto-normalized Central Limit Theorem for dense tiling and large threshold limit regime in a white noise framework, similarly to [10] for crossings in dimension 1.
The paper is organized as follows.In Section 2 we introduce our mathematical framework for binary images and we proceed to the construction of an unbiased estimator for the associated perimeter.Then, in Section 3 we study the first and second moments of the perimeter, which allows us to prove asymptotic normality results with an explicit covariance matrix, for which we consider several consistent estimators.The construction of our consistent symmetry test via empirical accessible test statistics is proposed in Section 4. Numerical evaluations are presented in Section 5.The case of large threshold values is investigated in Section 6, that yields to a new a central limit theorem and test statistics illustrated with numerical comparisons.Finally, we postpone the technical proofs to Section 7, and additional numerical evaluations to the Appendix.

Construction of the binary image
Square tiling Let m be an integer with m ≥ 2, without loss of generality, we consider our observation window as the unit square S = [0, 1] 2 and we divide our window into m 2 pairwise disjoint squares.This provides a regular tiling of S with squares of "size" (side length) 1/m, i.e., m ], for i, j ∈ {1, . . ., m}.

The C
(m) i,j will be referred to as cells.We denote by E m the set of edges in S = (0, 1) 2 , each w ∈ E m is a segment of length 1/m.C m i,j .Let (X i,j ) 1≤i,j≤m be a sequence of random variables defined over the same probability space (Ω, A, P), that are independent and identically distributed.Each pixel value X i,j is associated to the cell C (m) i,j .In Figure 2, X i,j ∼ U(0, 1) and m = 20.Let us consider a threshold parameter t ∈ R. In order to create the associated binary image, we introduce a random m × m-matrix Z(t) = (Z i,j (t)) 1≤i,j≤m , where i,j (t) := 1 {Xi,j ≥t} , for i, j ∈ {1, . . ., m}.

Each cell C
(m) i,j is associated to black or white according to whether Z i,j (t) = 0 or Z i,j (t) = 1.Then, Z i,j (t) follows a binomial distribution of parameter (1, p t ), where with F the cumulative distribution function associated to X i,j .

Perimeter of a binary image
Let Z = Z(t) be the binary image Z at that given threshold t ∈ R. Following the approach presented in [3], for each edge w ∈ E m , we aim to know if w contributes to the perimeter of the black component of Z.Let w be a horizontal edge of the form w = l−1 m , l m × { k−1 m }, w belongs to both cells C l,k−1 and C l,k , which means that w is a contribution to the perimeter if Z l−1,k = Z l,k .Following this consideration, one can consider to count the horizontal contributions and to count the vertical ones.Definition 2.1 (Perimeter and scaled perimeter of a binary image).We denote by P (1)  m (t) = m l=1 m k=2 f (t) 1 (l, k) the sum of all horizontal contributions and P (2) 2 (k, l) the sum of the vertical ones, then, the perimeter and scaled perimeter are given by P m (t) := P (1)  m (t) + P (2)  m (t), Pm (t) := 1 m 2 (P (1)  m (t) + P (2)  m (t)) .
An equivalent way to compute the perimeter is by considering, for each edge w, the maximal and minimal values on the two sides of w.Then, the perimeter is equivalently given by where f (t) + (w) = max(Z l,k−1 (t), Z l,k (t)) and f (t) − (w) = min(Z l,k−1 (t), Z l,k (t)) for w the common edge between C l,k−1 and C l,k .The interested reader is referred to [3].The only edges w ∈ E m that contribute to the computation are those for which f (t)  + (w) = 1 and f (t) − (w) = 0 which are exactly the edges belonging to the perimeter intersected with the interior of our observation window.

Algorithm 1 Perimeter of a binary image computation using Equation (2)
Initialization Binary image Z of shape (M, N ), Remark 1.Notice that the proposed method in Definition 2.1 (see Equation (2)) does not take into account the contribution of the edges that belongs to the frontier of the observation window S. Indeed, in Algorithm 1, the border cells can only contribute to the perimeter with one or two edges contrary to the other cells that can have up to four edges contributing to the perimeter.In this sense this method is unbiased in comparison with [14]'s technique.In the aforementioned paper, the authors introduce a white artificial frontier around the image and then, compute the perimeter of the extended new image (white frontier included), making all cells contributing to the perimeter have a four edge contribution.

Moments and asymptotic normality
In this section we investigate the first and second moments of the perimeter.This preliminary study will be useful to state our multivariate Central Limit Theorem (see Theorem 3.3 below).Proposition 3.1 (First moment of the perimeter).The expected value of the scaled perimeter in (1) is given by The proof of Proposition 3.1 is postponed to Section 7.
Remark 2. Assume that X i,j is drawn from a symmetrical continuous distribution of axis θ ∈ R, i.e p θ−t = 1 − p θ+t , ∀t ∈ R.Then, µ P (p θ−t , m) = µ P (p θ+t , m) and therefore, t → µ P (p t , m) has an axis of symmetry on θ.This implies that the ratio ( Pm (θ−t))/( Pm (θ+t)) would be distributed around 1 in the case of symmetry.This consideration will be crucial for the proposed symmetry testing procedure in Section 4.
We now study the second moment of the perimeter of the considered binary image.
Proposition 3.2 (Covariance between perimeters at two thresholds).Let t, s ∈ R be two given thresholds.The covariance between the scaled perimeter taken at levels t and s is given by Taking t = s, we obtain the following variance formula The proof of Proposition 3.2 is postponed to Section 7. If we add in Proposition 3.2 the bias induced by the boundary of the window S to the variance computation, we obtain the same results as those provided in [14] (see Theorem 4.1).Beyond Propositions 3.1 and 3.2, we can prove the multivariate asymptotic normality as m → ∞ of our geometrical feature for a given vector of thresholds.
where d − → holds for the convergence in distribution and N (0, Σ r ) for the rdimensional centered Gaussian distribution with covariance matrix Σ r given by The proof of Theorem 3.3 is postponed to Section 7. Furthermore, some numerical illustrations of Theorem 3.3 in the case r = 2 can be found in Appendix A.
Finally, as can be seen in ( 5), the last theoretical result that will be needed for the construction of an empirical accessible test statistic is an estimation of p t .This will be discussed in the next section.
In the following, we will present two estimators and study their consistency properties.In this section, for the sake of clarity, we denote p(t) := p t , ∀ t ∈ R.
One way to estimate p(t) is by considering the area of the excursion set.Another idea for the estimation of p(t) is built on the relationship that exists between the expectation value of the perimeter at a level t and p(t) (see Equation ( 3)): Definition 3.4 (Area and perimeter based estimator for p(t)).Given a threshold t ∈ R, we define Besides, we partition the image S in m 2 /4 sub-images S i 2 of size (2 × 2).We denote by Pi 2 (t) the value of the scaled perimeter for each sub-image.Let S m (t) := 4 Pi 2 (t) and the continuous mapping g : [0, 1] → R defined by Then, for t > θ with θ the median value of the distribution, we define the perimeter based estimator of p(t) as

Proposition 3.5 (Asymptotic normality for the proposed estimators for p(t)).
Let p A m (t) and p P m (t) as in Definition 3.4.Then, it holds that Similarly, it holds and The proof of Proposition 3.5 is postponed to Section 7.
Remark 3. It is also possible to estimate p(t) for t < θ by considering the con- else, and applying the same procedure.

Perimeter based symmetry test
As previously discussed in Remark 2, we now aim to build a test using the specific structure of the mean perimeter in the case of symmetry.
A test statistic Following the idea in [4] We postpone the proof of Proposition 4.1 in Section 7.
Let t ∈ I θ and H 1 (t) be the alternative hypothesis Proposition 4.2 (Consistency of the proposed symmetry test).For t ∈ I θ , it holds that P H1(t) (φ P m (σ) = 1) → 1 for m → ∞, with φ P m (σ) as in Equation (8).We postpone the proof of Proposition 4.2 in Section 7.

Empirical accessible test statistic
Notice that, if we take a consistent empirical estimator σ m (θ + t) of the variance of the considered ratio at level θ+t, then, we can consider the empirical accessible test statistic Using the estimators of p(t) presented in Proposition 3.5, we can now construct estimators of the variance of the ratio to built φ P m ( σ) in ( 9).Proposition 4.3 (Plug-in variance estimators).Let p m (θ + t) be a consistent estimator of p(θ+t) (i.e p m (θ + t) → p(θ + t), a.s.) and σ(θ+t) as in Proposition 4.1.We define The proof of Proposition 4.3 is postponed to Section 7.
Corollary 4.4.Using the same framework as in Proposition 4.3, we denote by σ P m (θ + t) and σ A m (θ + t) the plug-in estimators based on Equation (10) using respectively p P m (θ + t) and p A m (θ + t) as in Proposition 3.5.Then, σ P m (θ + t) and σ A m (θ + t) are two consistent estimators of σ(θ + t) and asymptotic normality in (11) holds for these two estimators.
Thus, from Corollary 4.4 and using Equation ( 9), we get two empirical accessible test statistics with (σ k-s ) 2 (θ + t) = 2 p(θ+t) .The proof of Proposition 4.5 is postponed to Section 7. Let us denote One can provide an alternative Kolmogorov-Smirnov based test statistic with asymptotic level α , where σ 2 (s) is given by ( 7) and σ 2 k-s (s) by ( 14), as shown in Figure 9 in Appendix B. Then, the perimeter based test has a smaller variance than the alternative Kolmogorov-Smirnov one.

Some numerical studies for the proposed symmetry test
In this section, we report simulation results for our symmetry test, for samples of 500 images of size m × m, for m = 100, 512, 1024 and different choices of threshold level.Firstly, we display the results for three symmetrical distributions (Gaussian, Student and Uniform distributions, see Figure 4) and two asymmetrical ones (Exponential and Skew-normal distributions, see Figure 5).
For a better approximation of θ, we use the mean estimator provided by the NumPy library, making use of the fact that for a symmetrical distribution it is equivalent to use the mean or median to characterize the symmetry of a distribu-tion [i.e. a random variable X is symmetrically distributed around θ, the center of symmetry, As we can appreciate in Figure 4, the test is able to successfully accept the H 0 hypothesis.However, the choice of the threshold is crucial in regard to the quality of the test.Unsurprisingly, for extreme thresholds, the test is less precise.We will focus on this important point in Section 6 below.Notice that the performances of our test statistic φ P m with the two proposed estimators of the variance σ A and σ P (in red crosses and blue diamonds, respectively) seem to be globally similar.Furthermore, these satisfactory results are comparable to those obtained with the Kolmogorov-Smirnov based estimator in (15) (black stars).m = 1) for different thresholds t, for 500 Montecarlo simulations for m = 512 with a skew-normal marginal distribution with different choices of parameter a = 0.1, 0.5, 1, 3 (from first to fourth panel) and Exp(1) distribution (fifth panel).We consider different test statistics: φ P m ( σ P ) as in (12) (blue diamonds), φ P m ( σ A ) as in (13) (red crosses) and φ k-s m ( σ k-s ) as in (15) (black stars).

Skew-normal and Exponential marginal distributions
Figure 5 numerically describes the good performance of our test φ P m to distinguish H 1 (t) from H 0 (t), for t ∈ R + , in two asymmetric distributional cases: Skew-normal distribution with probability density function given by f (x) = 2φ(x)Φ(ax), for x ∈ R, with a ≥ 0, φ(•) (resp.Φ(•)) the standard normal probability density function (resp.standard normal cumulative distribution function) and Exponential distribution with parameter 1. Unsurprising, in the skew-normal case, we observe that when a is larger the power of the test increases, indeed the smaller the parameter a, the less asymmetric the distribution will be.This behavior can be easily observed for the four chosen levels of skewness parameter a = 0.1, 0.5, 1, 3 in Figure 5 (from first to fourth panel).

Perimeter based symmetry test for large threshold and dense tiling
Following the considerations in Section 5, we aim to improve the quality of the proposed test for large thresholds.As in [10], we firstly study the convergence of the ratio between the first two moments of the unscaled perimeter P m (see Equations ( 3) and ( 4)).In Lemma 6.1 below, the expectation and the variance are proven to have the same order of magnitude for large threshold and dense tiling.
Lemma 6.1 (First two moments ratio for large threshold).Let P m (t) as in (1) and µ P (p t , m) and σ 2 P (p t , m) the associated mean and variance.Let (t m ) be a sequence of positive real numbers such that p tm → 0 as m → ∞, then, and furthermore, for m large enough, The proof of Lemma 6.1 is postponed to Section 7. By using Lemma 6.1 we can now formulate a bivariate Central Limit Theorem for our geometrical feature for m → ∞ and large thresholds.
The proof of Proposition 6.2 is postponed to Section 7. Note that we have written E Pm (t (m,γ) ) and E Pm (s (m,γ) ) in ( 16) for visual symmetry purposes although those two quantities are equal (see Equation ( 3)).Finally, we can easily derive the following result.
Using Corollary 6.3, one can build a modified version of the empirical accessible test statistic φ P m in ( 9) with asymptotic level α, adapted to large thresholds.Under H 0 (t) hypothesis, let θ be the median value of the distribution and u (m,γ) ∈ R + so that θ +u (m,γ) Then, from Corollary 6.3, it holds that This asymptotic allows us to define a new symmetry test statistic by prescribing Conversely to Equation ( 9), the main statistical interest of the test in ( 19) is to avoid the estimation of the variance.Furthermore, as one can remark for instance in the numerical study in Figure 6, global performances of the test in (19) are similar to those obtained in the same distributional framework in Figure 4.This means that we can obtain similar p−values with less effort in terms of moment estimations.Furthermore, notice that, since γ = − ln(pu m,γ ) ln(m) and γ ∈ (0, 2), the range of admissible values for u m,γ from Proposition 6.2 is growing with m.

Proofs
Proof of Proposition 3.1.One can start by observing that both f (t) 1 (l, k) and f (t)  2 (k, l) follow a Bernoulli distribution with parameter 2p t (1 − p t ).Thus, using Equation ( 1), one can write Thus we obtain the result for E Pm dividing by m 2 .
Proof of Proposition 3.2.Without loss of generality, one can assume that t ≤ s.
Now, let us compute the inter-level covariance between the horizontal contributions to the perimeter and the vertical ones: Cov (P (1)  m (s), 2 (j, i) .
To this aim, notice that the covariance between cells is non equal to zero only if the cells are neighbors, that implies that Thus, Cov(f Due to the i.i.d setting (see Figure 7), one can remark that the four types of covariance in the above sum are equal and are equal to Cov(f Thus, Putting all elements together, we get the covariance function of the unscaled perimeter: Hence, the result.
Proof of Theorem 3.3.In order to prove this result, we will use the well known Cramèr-Wold method.For the sake of completeness we recall it below.
Theorem 7.1 (Cramèr-Wold method, see, e.g., [5] for each (a 1 , • • • , a r ) ∈ R r , i.e., if every fixed linear combination of the coordinates of X m converges in distribution to the correspondent linear combination of coordinates of X.
Proof of Proposition 3.5.Let θ be the median value of the underlying distribution.We first start with solving the quadratic Equation ( 6) for m = 2 under the condition that t > θ (i.e p(t) < 1  2 ).We get that p(t) = with σ 2 P (t, 2) = Var P2 (t) = p t (1 − p t ) (1 − 3p t (1 − p t )).We apply the Delta method to Equation (21) using the function g as defined in Proposition 3.4.Note that g is differentiable on (0, 1  2 ) and that for t Hence, the result for p P m .For p A m , the result is a classical consequence of the Central Limit Theorem.

Let define
Then, we get hence, we get the result.
Proof of Corollary 4.3.Considering the continuous mapping and the fact that p m (θ +t) is a robust estimator of p(θ + t), we get that ψ( p m (θ + t)) a.s. where . Thus, applying the Slutsky Theorem to (7), we get Hence the result.
Proof of Proposition 4.5.For the sake of simplicity, let us note θ − t = u and 1 {Xi,j <x} .
Furthermore, from the bivariate Central Limit Theorem, Applying the Delta method using the function g : Hence the result.
Proof of Lemma 6.1.We recall from Equation (3) that we have, µ P (p t , m) = 4p t (1 − p t )m(m − 1), and from Equation ( 4) Then, the ratio is equal to Hence, uniformly in m, σ Thus, as m → ∞, σ where the second part of the inequality is due to (23) and (24).
Proof of Proposition 6.2.Let us first notice that E (P(t m,γ )) = E(P(s m,γ )), since p tm,γ = 1 − 1 m γ = 1 − p sm,γ .In order to prove the result, we will establish that ∀ a 1 , a 2 ∈ R To prove this statement, we will follow the same idea as in the proof of Theorem 3.3, introducing the following notations, similar to (20), 2 ), we will check that the family { W m z , z ∈ V m } satisfies the three conditions of Proposition 7.2.
To show that { W m z , z ∈ V m } satisfies (i), we proceed as in the proof of Theorem 3.3.For > 0 and choosing p >   Let us consider the following optimization problem : By doing an analytical study, we get the maximum value of the variance is reached on α 1 and α 3 and the minimum value for α 2 (see Figure 10 for an illustration in the Uniform white noise model).

Fig 2 .
Fig 2. Left and center panels: Image of size (20 × 20) realization of a Uniform white noise model.Right panel: Obtained binary image for t = 0.5.

Fig 3 .
Fig 3. Computation of the perimeter of a binary image with m = 3.Here P 3 = 5.

Fig 10 .
Fig 10.Curve of the variance function in Equation (4) for a Uniform white noise model for m = 1024, the extrema are reached at pt ∈ {0.172, 0.5, 0.82}.
we propose a method to test the symmetry of the marginal distribution marginal of the field at equidistant levels from the median.Let us consider the null hypothesis for t ∈ I θ H 0 (t) : p θ−t = 1 − p θ+t , where θ is the median value of the distribution.Let us first present a normality asymptotic result in the general case and a second result under the null hypothesis.Proposition 4.1 (Ratio between perimeters at two thresholds).Let t 1 ∈ R and t 2 ∈ I with t 1 < t 2 , and Pm (•) be the scaled perimeter in Definition 2.1 computed for each threshold.Then, it holds that, m Pm (t 1 ) Pm (t 2 ) − µ Pm (t 1 ) µ Pm (t 2 ) )4.3.Comparison with Kolmogorov-Smirnov based estimatorRemark that under H 0 (t), the cumulative distribution function F satisfies F (θ − t) = 1 − F (θ + t).Then, considering the empirical cumulative distribution function t → F Proposition 4.5.Let t ∈ (0, ∞).Under H 0 (t), it holds that m (t) := 1 m 2 i,j 1 Xi,j ≤t , which is a robust estimator of F , one can build an alternative symmetry test.Let us note that F m (t − ) = 1 − p A m (t) with p A m (t) as in Definition 3.4.