Robust detection of abnormality in highly corrupted medical images

: Medical imaging helps to detect and monitor internal irregu- larities in the human body. We leverage a block median ﬁltering technique to model pixel-to-pixel diﬀerences between two images to develop auto- mated detection of abnormalities in noisy medical images. We propose two robust detection methods, with the test statistic being the conventional maxima and the scale-invariant ratio of the medians from partitioned image grids. Theoretically, we investigate the asymptotic behaviors of two proposed tests. Numerically, we carry out simulation studies to investigate the type I error rate and the power of two tests. In addition, a real appli- cation in medical images with gastrointestinal bleeding demonstrates the outperformance and eﬃciency of the ratio test method. Besides, the devel- oped tests can also be applied to problems in other scientiﬁc ﬁelds, e.g., air pollution detection using collected remote sensing hyperspectral images.


Introduction
Medical imaging, collected using imaging technology such as radiology (e.g. ultrasound, CT, X-ray and MRI), nuclear medicine (e.g. PET and SPECT) and optical imaging (e.g. OCT), etc., helps to monitor disorders in the human body. For example, nuclear medicine can diagnose the severity of various diseases such as heart disease, gastrointestinal, endocrine, or neurological disorders, and cancer. Gastrointestinal bleeding is a major cause of death in the United States, with mortality rates ranging from 10% to 30% [18,14]. Fig. 1 plots a sample of medical images of gastrointestinal bleeding originating from a branch of the superior mesenteric artery. Note that these figures are from Figure 4 of [14]. It can be seen in Fig. 1 that (a) these images, taken in chronological order, are noise-contaminated, (b) the bleeding starts in the 6th image and appears more severe after that (7th-11th images). Fig. 2 shows their histograms of the pixel-to-pixel differences of two consecutive images exhibited in Fig. 1. The histograms in subplots from Hist-1 to Hist-4 appear symmetric about zero, while the histograms in other subplots exhibit asymmetry. The asymmetric histograms correspond to instances in which the two consecutive images are different.
We can vastly improve the patient's prognosis if we can detect the bleeding by the 6th image in Fig. 1. The problem is how to detect abnormality in  Figure 4 in [14] with removal of the 1st, 3rd, 8th, and 11th images that have additional marks not belonging to the original images. They were published in J. Nucl. Med. (2016), 57, 252-259, c by the Society of Nuclear Medicine and Molecular Imaging, Inc. noisy images timely and accurately. To tackle the problem as above, we leverage advanced statistical techniques to develop a fast and efficient robust detection method to test whether there is a statistical difference between two consecutive images. We propose to consider the pixel-to-pixel differences between the two images. Under such a framework, abnormality detection is equivalent to testing the symmetry/asymmetry about zero of the distribution for pixel-to-pixel differences. One should note that the images that we study herein do not have systematic change along the time. For example, the image that will change periodically based on the heart rate in ultrasound imaging and brain fMRI is not considered in this study.
As is shown in Fig. 2, the images are fairly noisy, and thus, Normal, t, or Laplace distribution may not be appropriate for modeling the pixel values. A possible way to the problem is to apply the censored weighted likelihood to down-weight the effect of observations with large variances [8,2]. However, it might be difficult to obtain the true likelihood of the observations since a common error distribution, say, Normal, t or Laplace, can not be assumed. One can alternatively consider nonparametric approaches. Two classical nonparametric methods, i.e., the sign test [13] and the Wald-Wolfowitz runs test (also known simply as the runs test) [21], are designed to test for symmetry of a distribution about zero. However, the type I error rates for both tests will not be small if the dataset contains a high level of noise. There exists a vast body of other nonparametric methods to test the hypothesis of symmetry of distribution, such as the bootstrap tests [10,20,26,31,25], the distribution-free tests [11,22,28], the kernel-based nonparametric testing methods [15], and also some other methods [9,2,4,5]. For nonparametric tests, some recent literature also establishes the minimax property of the power [19,30].
Besides the above nonparametric test methods, some researchers have applied a block median filtering technique to avoid specifying the noise distribution. This technique may be suitable for data analysis with positive/negative light/heavytailed distributed noise, likely due to sharp and sudden jumps in the image signals. For example, [6] first divided the observations into small bins and took the median values of data in each bin to form a new dataset, and then fit a Gaussian distribution to the new dataset. In light of [6], we propose a new test for testing whether the distribution of pixel values is symmetric about zero using maxima statistic. Beyond that, we propose another test using a ratio statistic, which is shown to have satisfactory performance in both simulation and a real example.
The major contributions of this study are as follows. Firstly, we leverage the block median filtering technique to model the pixel-to-pixel differences between two consecutive images. To the best of our knowledge, this technique is initially introduced into the problem of image abnormality detection. Secondly, we investigate the consistency of the proposed tests under mild conditions. Besides, The finite sample properties of the tests are enhanced via simulations, including the type I error rate and the power of the test. Finally, although this study is motivated by and developed for Gastrointestinal bleeding detection, it can also be potentially used to problems in quality control, satellite fire and smoke detection, and others.
The remaining paper is organized as follows. In Section 2, we introduce the median filtering method to address this abnormality detection problem and the model setup. In Section 3, we propose two nonparametric tests and the detection procedure. In Section 4, we approximate the p-value of the proposed tests and investigate their powers under mild conditions. In Section 5, we investigate the proposed tests by simulations and real data analysis. Section 6 concludes this paper.

Problem description
An image can be represented as a three-way tensor (R, G, B) ∈ R d1×d2×3 , where R, G, B ∈ R d1×d2 are two-dimensional pixel matrices. For simplity, we consider grayscale images, the pixel values of which are expressed as a matrix X ∈ R d1×d2 . The elements of X can be accessed using two indices as in x i1,i2 , varying i j from one to d j , j = 1, 2. Suppose that there are two images X (1) and X (2) , and let x (1) i1,i2 and x (2) i1,i2 denote the respective pixel values in the grid location After the proper alignment, we consider that the differences y i1,i2 are independently distributed with the same distribution. We write y i1,i2 as where μ i1,i2 are the means, and i1,i2 are independently identically distributed (i.i.d.) random errors with median zero and standard deviation σ that may potentially be large due to limitations in current imaging technology; see Assumption A1 in Section 4. Under this assumption, the distribution of y i1,i2 is symmetric about zero if μ i1,i2 is 0, otherwise skewed to the right or left if μ i1,i2 is either positive or negative, respectively. Let G q , q ∈ {1, . . . , r} be disjoint sets representing signals of abnormalities, and δ q , q ∈ {1, . . . , r} be their respectively unknown nonzero constants. Then μ i1,i2 = 0 if (i 1 , i 2 ) ∈ G q for q ∈ {1, . . . , r}, implying the existence of abnormalities at region {(i 1 , i 2 ) ∈ G q , q ∈ {1, . . . , r}} in the second image X (2) . This model describes the phenomenon that some abnormal parts are hidden in the pixel-to-pixel image differences. The problem is to identify the existence of these abnormalities.

Remark 1.
The medical images taken within a short period may be considered aligned. Hence, the differences in pixel values of the two images could be assumed to be i.i.d. under the null hypothesis. For example, the images of the artificial flower in [29] are aligned well since the wind has a rare impact on the camera and the artificial flower. In some cases, the images may be spatially localized. For example, if the artificial flower is replaced by a real one, its location in all taken images can no longer be considered fixed. Such a problem may also arise in the medical images. Ignoring these types of problems may presumably lead to a loss of power of the test. To ensure that the two images are properly aligned, we may rotate or relocate the two images by minimizing the sum of the two images' absolute differences.

Remark 2.
Note that one may consider grayscale images (single channel: twoway tensor) or color images (multichannel: three-way tensor). One can also consider a p-way tensor, X ∈ R d1×···×dp , where d j is the dimension of mode-j of X , j = 1, . . . , p. It is thus an extension of the one-dimensional case presented in [6] to a p-dimensional one. However, for simple presentation, we only consider the grayscale images in this paper.

Block median filtering
The major problem is that the distribution of noise is hard to specify. To tackle such a problem, we plan to use the block median filtering technique. Let n be the number of pixel values of the image, that is, n = d 1 × d 2 . Suppose that the image pixels can be equally divided by κ n disjoint partitions, which we label by A ∈ R m1×m2 , = 1, 2, . . . , κ n . Denote the number of pixels in A by m n , i.e., |A | = m n = m 1 m 2 for = 1, 2, . . . , κ n . Then, compute a sequence of minimizers x , = 1, 2, . . . , κ n , satisfying that, where ρ is a given positive and symmetric loss function. Hereafter we suppress the subscript n in both κ n and m n for simplicity of notation. Note that if ρ in (2) is the absolute value function, x is a median of the partition A , i.e., The block median transformations have been applied in the analysis of microarray data for data normalization in [27,6]. In this study, we employ the median filtering as it is robust against a large collection of error distributions, including heavy-tailed, which is not unusual in literature.

Model setup
In light of [6], the sequence of block medians, {x 1 , x 2 , . . . , x κ }, is modelled as . . , r}, being disjoint sets of image pixels with convention that θ = 0 if r = 0. One should note that δ * q is another unknown constant that may differ from δ q for q ∈ {1, . . . , r}.
Our aim is to test i.e., to test if there exists one or more disjoint sets of image pixels G q , q = 1, 2, . . . , r, (r ≥ 1), such that θ = δ q = 0 for A ⊂ G q , against θ = 0 for all ∈ {1, . . . , κ}. We remark that our proposed two tests will be more powerful for large r.

Nonparametric hypothesis tests
As displayed in the first four histograms (where there are no abnormalities) in Fig. 2, the null distribution of {y i1,i2 } appears to be symmetric about 0. Further examinations reveal that when there exists an abnormality, the corresponding histogram is skewed. These observations lead us to test whether or not the null distribution of {y i1,i2 } is symmetric about zero to detect images with abnormalities. Since such images are usually fairly noisy, we consider the medians {x } and thus propose tests based on the maxima and ratio statistics.

Maxima test
As explained above, in light of [6], we propose the following maxima as the test statistic for testing H 0 against H 1 , where x is defined in (3) andσ is an estimate of the median absolute deviation (MAD), i.e.,σ We reject H 0 when M κ is larger than the corresponding critical value under a given significant level α. For convenience, we hereafter refer to this test as the "maxima test". It is noted that there exists a nuisance parameter,σ, in the maxima statistic. One can use the sample standard deviation in place of MAD forσ in (6). An inappropriate choice of the nuisance parameter may result in poor performance, say, a higher type I error rate or lower power of the test. This motivates us to consider another ratio statistic that does not involve the estimation of a nuisance parameter.

Ratio test
The estimate of the nuisance parameter in the maxima test may negatively affect the performance of the test, which may explain less optimal results in empirical examples. With the purpose of the practical applicability of the test, we may restrict our attention to a closed interval of block medians denoted as [a, b], in view that the null distribution is assumed to be symmetric about zero. Assume that the observed pixel values of images are between 0 and 1. Under H 0 , the interval may satisfy that |a| = |b|, with a < 0, b > 0. In contrast, under H 1 , the interval may satisfy that |a| = |b| with a < 0, b > 0. Next, we will show that how to construct the ratio-type statistic.
It is noted that the only information we have about the common error distribution is whether or not its probability density function is symmetric about zero in our problem. We assume that the common distribution of i1,i2 , F , is continuous and has median 0, i.e., F (0) = 1/2. Denote its probability density function by f . Let m 0 = (m − 1)/2. It is easy to show that the common density function of the block medians . Fig. 2 visually shows that the histograms from Hist-1 and Hist-4 appear to be symmetric about zero, while others exhibit some asymmetry. Thus, we consider the following null and alternative hypotheses for testing symmetry of the distribution function F : The proposed test statistic described in (7) makes use of a first-order approximation of F (x) or f (x).

Lemma 1. Given the first-order approximation to the density function f (x)
over the interval [a, b], the following statistic is proportional to the likelihood ratio statistic for testingH 0 againstH 1 , where The smaller the test statistic (7), the more evidence we have that supports the rejection of H 0 .

The proof of Lemma 1 is given in Appendix
It can be seen that the smaller the likelihood ratio, the smaller the value (1) } which, from above, has been shown to be equal to ( It is obvious then that the smaller the test statistic (7), the larger the following statistic: Therefore, we reject the null hypothesisH 0 for large values of T κ . For convenience, we call the test based on T κ the "ratio test". The ratio statistic takes advantage of the median of block medians, which can be decomposed into the sum of a Gaussian random variable and a random error with median zero under some mild assumptions made in [6]. It is noted that the numerator of T κ in (8) captures the range of the data, while the denominator of T κ is neutral to the impact of the largest (unsigned) data point. In addition, T κ is scale-invariant. Therefore, T κ is the desired test statistic for testing H 0 against H 1 .

Remark 4.
One can easily show that T k ≥ 2 as follows, T κ is expected to be close to 2 under H 0 , but to be far away from 2 under H 1 . For demonstration purpose, one can see the following example. Suppose that we have block medians min{1,−(−4)} = 5 (again, larger than 2). These results are in line with our expectations.

Detection procedure
The above two proposed tests can detect abnormalities of medical images under the following detection procedures. Suppose that we have a sequence of medical images, say, X (1) , . . . , X (K) (K ≥ 2), and the abnormal images with significant differences are X (s) , s ∈ D (0) , where D (0) is a subset of {1, . . . , K}. LetD be an estimate of D (0) . We present the following detection procedure to obtainD. Set α = 0.05.
Step 2. Compare the images X (s) and X (k) . If the approximate p-value of the maxima test or the ratio test is less than α, then keep the kth image, i.e., s = k; otherwise, drop the kth image and s = s. AndD =D ∪ {s}.
Step 3. If k = K, stop. Otherwise, let k = k + 1 and return to Step 2.
Users can choose their preferred significant level α. The approximate p-values can be computed via Eq. (11) or Eq. (14) in Section 4. The proposed detection procedure is not intended to replace medical experts. Instead, the detection procedure aims to serve as an initial screening technique and alert medical experts timely when an abnormality occurs. Generally, the detection procedure can be applied in the following scenarios.

Remark 5.
The detection procedure can determine whether the most recent image is abnormal or not. For example, suppose that there is a patient who is seeing a doctor. A medical image of the patient shows that nothing is wrong. However, the patient continues to complain of discomfort. Thus, the doctor is likely to monitor the patient's condition and thus request more (say K − 1 with K ≥ 2) images to be taken at scheduled times. Images are analyzed as they come in until perhaps, a significant difference is identified. Under such a scenario, the first image should be removed fromD, that is,D =D/{1}. If the purpose of the detection is to alert medical experts timely, we can stop the procedure when |D| = 1.

Remark 6.
The detection method enables us to keep significantly different images for further examination. For example, suppose that a person has a history of a specific disease in his/her family, currently showing no signs of the inherited disease; however, he/she may develop the disease during some period of his/her life at high risk. Suppose that radiological imaging can be applied to diagnose the disease. A doctor would ask him/her to take several radiological images, say K, in a certain period. In this scenario, the detection procedure can be used to perform an initial screening of the images and delete these images that are not statistically different from ones taken just before them.
Intuitively, an ideal estimate should contain important information, and exclude these informationless ones, i.e., D (0) ⊂D and the size ofD (denoted as |D|) is smallest. Thus, in order to measure the estimation efficiency, we define the criteria ω, i.e., where S 1 / S 2 denotes the intersection of S 1 and the complement of S 2 , and 0 ≤ a ≤ 1. The smaller the value of ω, the better the estimation. |D (0) /D| > 0 means that some important images are deleted, while |D/ D (0) | > 0 implies that some informationless images are kept. Since |D (0) /D| > 0 is more serious than |D/ D (0) | > 0, we thus assign a large weight on |D (0) /D|, say, a = 0.95.
As described in [6], Assumption A1 is satisfied by the Cauchy distribution, the Laplace distribution, the t-distribution, the Normal distribution, and other distributions. Under this assumption, we can approximate the distribution of the median of errors as Normal. Assumption A2 describes how m grows relative to κ. Simulation studies suggest that m = 100 would allow for a satisfactory Normal approximation.

Asymptotic properties of the maxima test
We have the following result on the approximation of the p-value of the maxima test.
where M κ is given in (6), The following Proposition is needed for proving Theorem 4.1. (see Lemma 1 in [6]). The proof of Theorem 4.1. is given in the Appendix A.2. Assume that α is the significant level of the hypothesis test. Theorem 4.1. implies that the type I error rate goes to α under mild conditions.
The Assumption A3 implies that there exists at least one set G q such that it includes A s . Theorem 4.2.. Suppose that H 1 given in (5) holds true. Under the Assumptions A1-A3, we have M κ / log κ → ∞ in probability as n → ∞, where M κ is given in (6). Furthermore, for any v ∈ (−∞, ∞), we have that The proof of Theorem 4.2. is provided in the Appendix A.3. This theorem guarantees that the power of the maxima test converges to one as the number of pixels goes to infinity.

Asymptotic properties of the ratio test
We have the following result on the approximation of the p-value of the ratio test.
where T κ is given in (8), Φ(x) and φ(x) denote respectively the standard normal distribution and its density function.
We ask the readers to turn to Appendix A.4 for details on the proof. Similar to Theorem 4.1., Theorem 4.3. also controls the type I error rate of the ratio test. Next, we consider the power of the ratio test. Since the main purpose of this paper is to detect abnormalities in medical images, all δ q , q ∈ {1, . . . , r} should have the same sign if there are abnormalities in such images. Thus, we make the following assumption.
Assumption A4. δ 1 , . . . , δ r with r > 1 are either all greater than zero or all less than zero.

Theorem 4.4.. Suppose that H 1 given in (5) holds true. Under the Assumptions A1-A4, we have T κ → ∞ in probability as n → ∞, where T κ is given in
One can refer to Appendix A.5 for the proof of Theorem 4.4.. Theorem 4.4. implies that the power of the ratio test converges to one under mild conditions.

Simulations
In the following, we present simulation studies to examine the performance of the proposed tests.  6N (0, 0.25) + 0.2N (0.01, 0.09), a contaminated-normal distribution that is similar to αN (0, σ 2 1 ) + (1 − α)N (0, σ 2 2 ). The simulated type I errors for the proposed two tests are reported based on 10,000 repetitions with the p-value approximated via Theorem 4.1. or Theorem 4.3. (depending on the test). We set α = 0.05. It can be seen from Table 1 that both tests perform well with the normal distributed random errors (Case 1), reflected by the type I errors abound 0.05 for the whole candidate m. When the random errors are distributed with the Laplace (Case 3) or contaminatednormal (Cases 4-5) distribution, the type I errors tends to be stable around 0.05 as m ≥ 10 2 . A special case one can see is Case 2, where the type I errors are larger than 0.05, and the values obtained via the ratio test are smaller than those via the maxima test, but their values decrease as m increases. Overall, both the proposed methods perform well in most cases in terms of type I errors as the partition size m is larger than a threshold, say, 10 2 . In addition to the type I error rates, we investigate powers of both maxima test and ratio test when H 1 holds. 10,000 simulations are carried out with δ 1 = −0.15 or 0.15 in the first grid. Table 2 lists the powers of both tests. The powers of both tests are increasing with the partition size m either for negative or positive δ 1 , across all Cases (1)(2)(3)(4)(5). Specifically, when m = 25 2 , the minimum powers of the maxima test and ratio test increase to 0.999 and 0.994, respectively. This result agrees with the power consistency in Theorems.    In this subsection, we compare the power of both proposed tests when there is δ 1 shift in the first grid and a δ 2 shift in the second grid. Let { i1,i2 } be i.i.d. from the above five distributions given in Example 1. The critical values are computed by using Theorem 4.1. or Theorem 4.3. with α = 0.05. Table 3 Power comparison for δ 1 and δ 2

Case
Test  In this example, δ 1 and δ 2 are of the same sign as required by assumption A4. We carry out 10,000 simulations. The power comparisons are given in Table  3. Clearly, as m increases, the powers of both tests are increasing, which agrees with the results in Example 1. When m is large enough, say, m ≥ 20 2 , both tests show strong powers (≥ 0.997) for all the cases.

Example 3: K = 11 and r = 1
We examine the performance of both maxima test and ratio test for abnormal images detection. Let the pixel-to-pixel differences between the th and − 1th images be generated independently from δ 1 + 0.4t(γ) distribution with γ = + 1 for = 2, . . . , 11, where δ 1 = 0.25 for the first two grids when = 5, 10, and 0 otherwise.
We carried out 10,000 simulations. Fig. 3 displays the boxplots of ω (see Eq. (10)) obtained via the maxima test (a) and the ratio test (b) under different partition sizes. It can be seen that the performance of both tests becomes better as the partition size increases, reflected by the decreasing trends of ω. As m increases from 10 2 to 20 2 , the medians of ωs drop significantly for both test methods. This result demonstrates the efficiency of the proposed tests in terms of abnormal image detection.

Real data example
Now we return to analyze the medical images with gastrointestinal bleeding described in Section 1. As is displayed in Fig. 1, from the 6th image onwards, bleeding intensifies gradually. And it is difficult to distinguish the images, X 1 , . . . , X 5  (before bleeding) and X 8 , . . . , X 11 (after bleeding) by eyeballing. This study aims to detect the initial point when the bleeding begins and the timepoints when the bleeding gets worse than that at the initial time. We consider the pixel-to-pixel differences between images by applying the detection procedure provided in Section 3.3 to solve this problem. The size of each partition is given equally, such as m = 2 2 , 5 2 , 10 2 . We report the p-values for the -th image ( = 2, . . . , 11) and the finalD obtained via both tests in Table 4. Unfortunately, the maxima test fails to detect the abnormalities in these medical images. It can be seen from Table 4 thatD via the maxima test contains all the images for different m; that is, there exists a significant difference between any two images. These results are contrary to the fact that the initial bleeding starts from the 6th image. The failure of the maxima test in real data may result from the poor estimate ofσ in Eq. (6). When it comes to the ratio test, however, it outperforms the maxima test. An interesting finding is that the size ofD is increasing as the partition size m increases from 2 2 to 10 2 , resulting in the increased power of the test, which is consistent with the Theorem 4.4.. Specifically, the intersection ofD is {6, 7} for different m. The ratio test accurately detects the initial time point when it starts bleeding since {6} ⊂ D for different m. Intuitively, there exists a slight difference between the 6th and 7th images. The results of {7} ⊂D for different m also demonstrate the outperformance of the ratio test in detecting a slight difference between two abnormal images.

Conclusion and discussion
This paper proposes two robust detection methods, the maxima test and the ratio test, by leveraging the block median technique. Theoretically, we show that the p-values for both tests can be approximated with satisfactory accuracy, and the consistency of both tests under mild assumptions. We develop the detection procedure and discuss its applications in detecting abnormalities in noisy medical images under two scenarios. Numerically, simulation studies and real data analysis are performed to examine the performance of the proposed tests. The maxima test performs well in simulations but falls short in the real data application. This phenomenon may be due to the presence of the nuisance parameter, i.e., the standard deviation of the median. Although the MAD estimator in (6) to improve the performance of the maxima test, the resultant test still does not perform satisfactorily in the real data application. However, the ratio test exhibits satisfactory performance and is generally powerful in both simulations and real data applications.
Our detection methods can figure out whether the most recent image is abnormal and enables us to keep significantly different images for further examination while removing large but informationless images regularly to save image storage space. In addition to detecting abnormal medical images, the methods may be used in air pollution detection. For example, some wildfires are human-caused due to open fire in the permitted parks. We could collect remote sensing images, especially hyperspectral images, acquired from Earth-orbiting satellites, which hold great promise in monitoring and categorizing wildfires. Hyperspectral images are characterized by having multiple layers or image planes at a specific wavelength or range of wavelengths. We may focus on some layers with fixed ranges of wavelengths that are sensitive to wildfires. The detection procedure can be applied for monitoring some special areas which may have a high risk of fires.
Although the proposed detection methods have wide applications, they have some limitations. The proposed pixel-based image comparison methods might not be reasonable when there is a systematic change along the time. For example, the image will change periodically based on the heart rate in ultrasound imaging and brain fMRI. Honestly, this limitation would hinder the application of the proposed methods in some medical imaging data sets.
The likelihood functions underH 0 andH 1 respectively given by Hence, a likelihood ratio statistic is

A.3. Proof of Theorem 4.2.
Our aim is to show that the lower bound of M κ diverges to infinity in probability.