An up-to-date review of scan statistics

Scan statistics have been a very important and active area of statistical research in the past three decades. Detecting areas with a significant concentration of points is an important task in understanding the underlying phenomena in many fields such as: epidemiology, politics, crime analysis, zoology, etc. This study reviews how scan statistics have developed in the last three decades, the main concerns of researchers in scan statistics, and how researchers have approached these concerns.

Scan statistics are the most important tool for detecting clusters. A scan statistic has the objective of detecting and evaluating the statistical significance of clusters that cannot be explained by the assumption of randomness. This is done by moving a window over the study region and identifying a region, if there is one, with a higher concentration of points than should occur by chance. Detection of such regions is traditionally performed by maximizing a likelihood ratio as will be shown.
Researchers encounter at least nine types of datasets where they might wish to identify spatial clusters using scan statistics: 1) point data; 2) case-control data; 3) aggregated data; 4) spatio-temporal data; 5) spatial survival data; 6) event data; 7) multinomial data; 8) ordinal data; and 9) time series data.
When Choynowski (1959) studied brain tumors, he considered statistical assumptions for a map and explored whether there was any city on the map with a statistically significant cluster. Since the occurrence of brain tumors is rare, he used Poisson distributions to calculate the probability of the number of sick people in each city. However, at that time, he tested the cities separately, so multiple testing problems occurred. Moreover, in his method, the border of clusters must be the same as the border of the cities. To deal with these drawbacks, Openshaw et al. (1987) proposed a graphical method called Geographical Analysis Machine (GAM). In their method, they first considered a grid on the map and then put circles around the grid points such that their radii increased step by step. Minimum and maximum radii were pre-selected. By Monte Carlo methods, one could determine which circle contained significantly more sick people in comparison to those outside the circle. Any significant circle was plotted on the map. In the end, the algorithm provided a map with some circles on it. These areas were labelled as spatial clusters. The GAM method raises at least four concerns: 1) a large number of circles must be checked to find clusters (the size of the candidate class is too large); 2) it only detects circular clusters, but in practice, clusters can have irregular shapes; 3) Monte Carlo hypothesis testing is time-consuming; and 4) due to multiple testing problems, GAM indicates some areas as clusters even if the points are distributed completely at random. Hence, researchers should choose a very small α, for example 0.001.
The above-mentioned methods were presented in previous works to detect spatial clusters. In the next section, we present the well-known spatial scan statistics. Section 2.2 is devoted to regularly and irregularly shaped spatial clusters, while Section 2.3 contains an overview of Bayesian scan statistics. In Section 3, the method of spatial cluster detection without Monte Carlo hypothesis testing is explored. Section 4 covers spatial clustering for event data. In Section 5, spatial scan statistics for general graphs are presented. Section 6 describes spatial scan statistics for continuous data. In Section 7, scan statistics for zero-inflated count data are discussed. Section 8 discusses nonparametric 114 A. Abolhassani and M. O. Prates scan statistic that allows the detection of non-circular clusters. Although more flexible, this scan statistic cannot detect arbitrary irregular clusters either.
It is important to mention that scan statistics can be extended to deal with different kinds of data, for example event data (Rosychuk, Huston and Prasad, 2006), ordinal data (Jung, Kulldorff and Klassen, 2007), continuous data (Kulldorff, Huang and Konty, 2009;Zhang, Zhang and Lin, 2012), multivariate data (Cucala et al., 2017), and others. Zhang and Lin (2014) noticed that the likelihood ratio statistic is a special case in the family of power divergence (PD) goodness-of-fit statistics, so the classical spatial scan test can be extended to the family of PD spatial scan tests. Their method is convenient because it can be combined with generalized linear models (GLMs). Besides this approach, the Wald-based spatial statistic (Liu, Liu and Zhang, 2018) is another alternative that can be combined with GLMs to detect spatial clusters.
The main advantage of scan statistics is specificity. In this context, specificity means identifying a single cluster responsible for the rejection of the null hypothesis (Gangnon and Clayton, 2004). However, scan statistics can be biased towards finding clusters in areas with greater spatial resolution. In other words, they tend to cherry-pick clusters in areas with a large number of geographically small cells (Gangnon and Clayton, 2004).
The weighted average likelihood ratio (WALR) test (Gangnon and Clayton, 2001) is an alternative method to detect spatial clusters instead of the traditional maximum likelihood approach. The test has a natural interpretation as the marginal likelihood ratio of a one cluster model to the no clustering model and with choice of the correct weights, it identifies clusters with higher power and less bias. The test is defined as unbiased, if, under the null hypothesis, each cell in the study region has an equal chance of belonging to the detected cluster. Furthermore, Gangnon and Clayton (2004) proposed two other scan statistics: a scan statistic based on a penalized likelihood ratio and a localized version of the WALR test. In these methods, the authors took advantage of the specificity of the scan statistic and the unbiasedness of the WALR test.
Based on the studies of Gangnon and Clayton (2001) and Gangnon and Clayton (2004), the spatial scan statistics method has high power in areas with fine geographic resolution and low power in areas with coarse geographic resolution. According to Gangnon (2010a), in real applications, there are many overlapping clusters in urban areas, while rural areas have few. The spatial scan statistic does not account for these local variations in the multiplicity problem. Hence, Gangnon (2010a) proposed two new spatially varying multiplicity adjustments for spatial cluster detection, one based on a nested Bonferroni adjustment and one based on local averaging. In fact, they proposed the local average likelihood ratio scan (LALRS) statistic and an unweighted version of the WALRS statistic, which are applicable in any setting.
Up to now, there are very few methodologies that quantify the uncertainty of a detected cluster. Along these lines,  develop a new method for the quantification and visualization of uncertainty associated with a detected cluster.

Minimum Spanning Trees in Spatial Cluster Detection
To find an irregularly shaped cluster, Assunção et al. (2006) proposed the dynamic minimum spanning tree (dMST) method. This method can quickly find irregularly shaped clusters. A spanning tree is a sub-graph of a connected graph. It is a tree that contains all vertices of the graph. The minimum spanning tree (MST) for a weighted graph is a spanning tree that has minimum weight. The dMST method not only detects irregularly shaped clusters but also decreases the cardinality of the candidate class i.e., Z, from many circles to just I candidates where I is the number of cells in the study region.
In this method, centroids v i and v j of cells i and j are connected to each other using an edge if these two cells are neighbors. Hence, corresponding to a map, we have an undirected graph. Under the Poisson assumption, the Kullback-Leibler divergence is calculated and used as weight w(i, j) that is allocated to edge (v i , v j ). The weights reflect the dissimilarity in the density of the number of "points" between two cells. Heavy weight means large dissimilarity between densities of the cells. Using the algorithm of Prim (1957), the MST can be found. The elimination of an edge from the MST separates it into two subtrees. Assunção et al. (2006) considered the smaller sub-tree as a candidate and calculated the likelihood ratio for this candidate using the LRT of Kulldorff and Nagarwalla (1995). Then, they returned the eliminated edge to its place and removed another edge. Again, the MST is separated into two sub-trees. The smaller sub-tree is considered the second candidate and the LRT is calculated for this sub-tree. This procedure goes on for all sub-graphs to find the MLC. Then, using the Monte Carlo procedure, one decides about the significance of the MLC as a spatial cluster.
Although the dMST method improves the scan statistic in two aspects (the flexible shape and reduced number of total candidates), it still has some deficiencies: 1) it detects just one irregularly shaped cluster; 2) it commonly detects a cluster bigger than the true cluster (overestimation or octopus effect), and 3) the Monte Carlo procedure is time-consuming.
As mentioned, one of the disadvantages of the dMST is the overestimation or octopus effect of the detected cluster. To control the overestimation, Costa, Assunção and Kulldorff (2012) proposed three spatial scan statistics to find irregularly shaped clusters. Their methods impose ad-hoc constraints on the cluster shape using three criteria: edMST (early stopping dMST), double connection, and mlink (maximum linkage).
The main difference between the edMST and the unrestricted dMST methods is that the edMST approach stops when the neighbors of a candidate do not increase the likelihood function. In the double connected spatial scan statistic, to be considered as a valid area for inclusion in the cluster, the candidate neighbor must be connected to at least two units of the current candidate and is aggregated to it if it increases the LRT. There is an exception in the first step when the current candidate has only one unit. If no neighbor satisfies the con-nection condition, then the algorithm stops and starts again from another unit. The results obtained by this method are more compact clusters than with the edMST method. Finally, the mlink spatial scan statistic is an alternative that searches only among neighbors with maximum connections to the current candidate and evaluates whether or not it must be included. For all these methods, Monte Carlo simulation is applied to determine the significance.
Recently, Zhou, Shu and Su (2015) presented another alternative named Adaptive minimum spanning tree (AMST), which is detailed in the next subsection.

Adaptive Minimum Spanning Trees in Detection of Irregularly Shaped Spatial Clusters
The adaptive minimum spanning tree (AMST) method for cluster detection was introduced by Goura, Rao and Reddy (2011). In this method, a validity index is introduced. It helps researchers to find the best partition of a graph. This index is defined as: The numerator measures the distance inside partitions of the graph and the denominator measures the distance between partitions. According to Jain and Dubes (1988), this criterion is a measure to reflect the graph separation. Formally, Note that λ C i is the rate of the points or the cases in the sub-partition C i (i.e., the ratio between the total number of cases in sub-partition C i and its population) and λ C ij is the point rate of cell j in the sub-partition C i (i.e., the ratio among the number of cases in cell j and the population of sub-partition C i ). These parameters can be estimated using the maximum likelihood method where K is the number of sub-graphs (sub-partitions) in the best partition of the original graph. According to Zhou, Shu and Su (2015), the minimum value for the validity index corresponds to the best partition of the graph. Thus, Algorithm 1 describes how to use the AMST to find spatial clusters.
Algorithm 1 Adaptive Minimum Spanning Tree Algorithm 1: find a MST using the Prim algorithm, 2: order the edge weights of the MST, 3: the second lightest weight is considered as a threshold. Remove this edge and all edges with heavier weight (in this step only two centroids are connected, and they are those with the lightest edge of the graph), 4: compute the validity index for the obtained partition, 5: add the next lightest edge to the previous partition and go back to step 4 until getting the initial MST.
According to Zhou, Shu and Su (2015), a partition with minimum validity index corresponds to the best partition of the MST, because the smallest validity index corresponds to the partition in which sub-graphs are more compact and have larger separation (gap) between sub-graphs.
Algorithm 1 was designed by Zhou, Shu and Su (2015) to control the overestimation problem presented in Assunção et al. (2006). Also, using the linear time subset scan (LTSS) property (Neill, 2012), the cardinality of Z (Zhou, Shu and Su, 2015) is reduced drastically. Neill (2012) proved that the scan statistic and some of its extensions satisfy the LTSS condition. This property makes it possible to find an exact and efficient optimization over the subsets. LTSS can be used in problems related to scan statistics. Instead of considering all scan windows to find the MLC, one just needs to consider ordered windows. Suppose K is the number of sub-partitions in the best partition of the MST. Also, suppose that P F (i) is a priority function of sub-partition i = 1, . . . , K (for example, the incidence rate of the ith subpartition). It is necessary to calculate P F (·) for all sub-partitions. Let o (j) be the index of a sub-partition with j-th greatest priority function P F (·) such that LTSS proves that, instead of checking all 2 K candidates to find the MLC, one can just consider K subsets i.e., {o 1 , . . . , o K }.
Similar to the other spatial cluster detection methods, after finding the MLC between ordered windows, Monte Carlo helps researchers to do hypothesis testing in cluster detection. This method is faster than the MST method and tackles the overestimation problem. Therefore, Zhou, Shu and Su (2015) improved the scan method to find irregularly shaped clusters in two ways: 1) by using AMST they obtained a solution for the overestimation problem of Assunção et al. (2006); and 2) by the LTSS property, they drastically decreased the cardinality of candidate classes. Furthermore, Yin and Mu (2018), by combining the MST and restricted likelihood ratio method (Tango, 2008), presented a hybrid method to detect irregularly shaped clusters more quickly.
It is important to mention the adaptive procedure proposed by Zhang and Zhu (2012) for irregularly shaped clusters in space. Their method, named spatial multiresolution cluster detection (MCD), is more effective in the detection of irregularly shaped clusters without the requirement of heavy computation. Hence, it is suitable for cluster detection for large spatial datasets, for example, images and fMRI data. Another interesting work is the particle swarm optimization method to optimize the scanning window for detecting irregular spatial clusters (Izakian and Pedrycz, 2012). As a side note, for data that can be hierarchically represented in trees, Prates, Assunção and Costa (2012) proposed a flexible scan statistic test to detect clusters using minimum description length penalization, which helps to prevent the detection of oddly shaped clusters.

Bayesian Scan Statistics
Traditionally, the scan statistics method is based on hypothesis testing and does not produce useful estimates of disease rates or cluster risks. Gangnon and Clayton (2000) developed a Bayesian inferential procedure to fit specific models for spatial clustering using cell count data. Their strategy incorporates ideas from image analysis, Bayesian model averaging, and model selection. They proposed a model for clustering in which the study region is divided into several components: a large background area and a relatively small number of clusters. A common rate (or covariate-adjusted risk) within each component is assumed.
As an advantage, with their method, it is possible to obtain estimates for the disease rates.
Later, Gangnon and Clayton (2003) proposed a hierarchical model for spatially clustered disease rates. They developed a Bayesian approach capable of estimating the parameters of the hierarchical spatial clustering model. To be able to perform inference in the model, they implemented and used a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm (Green, 1995). Their method allows for multiple clusters and produces posterior estimates of cellspecific and cluster-specific relative risk as well as cell-specific probabilities of cluster membership. Also, posterior inference about the number of clusters in the data is possible. Yan and Clayton (2006) extended the spatial cluster method of Gangnon and Clayton (2003) to accommodate spatio-temporal cluster data. Again, the inference is performed using the RJMCMC algorithm. Like Gangnon and Clayton (2003), their method produces important information about the cluster detection, such as the number of clusters, the probability of each cell belonging to a cluster, and cell-specific relative risks. The authors argued that the method is parsimonious relative to fitting several spatial cluster models over time and is sensitive in detecting spatio-temporal clusters. Also, they mentioned that the method is more appropriate for datasets having a cluster structure in comparison with some Gaussian Markov random field-based models (e.g., Waller et al., 1997). Yan and Clayton (2006) suggested extending their method to find irregularly shaped clusters in big maps. Gangnon and Clayton (2007) revisited Gangnon and Clayton (2003), including both spatial clustering and non-spatial random effects. In the prior work, the number of clusters was treated as a parameter to be estimated, requiring the use of the RJMCMC algorithm for inference. As an alternative, they considered models with a fixed, but overly large, number of clusters. Using the fixed values for the number of clusters, they were able to estimate the disease risks. The Bayes factor (the ratio of the posterior odds to the prior odds) was used to identify the local clusters. Fixing the number of clusters and not estimating them provides computational advantages since Bayesian inference avoids the RJMCMC procedure. Moreover, by not using the RJMCMC, it is easier to monitor convergence of the Markov Chain. Neill, Moore and Cooper (2005) introduced a natural Bayesian extension of scan statistics. Their method is based on a Poisson distribution with a conjugate Gamma-Poisson model. Assume that we have I cells in the map. For cell i, i.e. s i , the number of cases is C i and the baseline (for example, population at risk) is b i . The goal is to find if there is any spatial region S (set of locations s i ) for which the counts of cases are significantly higher than expected, given the baselines.
In other words, assume C i ∼ Poisson(qb i ), where q is the (unknown) underlying disease rate and the aim is to compare the null hypotheses H 0 of a uniform disease rate q = q all to the set of alternative hypothesis H 1 (S), which is q = q in for all s i ∈ S, and q = q out for all s i ∈ G − S for some q in > q out . A hierarchical Bayesian model where the disease rates q in , q out , and q all are themselves drawn from Gamma distributions is assumed. Thus, under the null hypothesis H 0 , q = q all for all s i ∈ G, where q all ∼ Ga(α all , β all ). Under the alternative hypothesis H 1 (S), q = q in for all s i ∈ S and q = q out for all s i ∈ G − S, where q in is drawn from Ga(α in , β in ) and q out ∼ Ga(α out , β out ). It is also necessary to choose the priors for α and β, which is the most challenging task in any Bayesian analysis. To set appropriate priors, the authors assume that they have access to a large historical data set, and set α and β by matching the moments of each Gamma distribution to these historical values; for more detail see Neill, Moore and Cooper (2005). On the other hand, computation of the posterior probabilities P (H 1 (S)|D) of an outbreak in each region S, and the probability P (H 0 |D) that no outbreak has occurred, given dataset D, can be achieved by: and P (H 1 (S)|D) = P (D|H 1 (S))P (H 1 (S)) P (D) where P (D) = P (D|H 0 )P (H 0 ) + S P (D|H 1 (S))P (H 1 (S)). The prior choice of P (H 0 ) and P (H 1 (S)) and the calculation of P (D|H 1 (S)) and P (D|H 1 (S)) are discussed in detail by Neill, Moore and Cooper (2005). Therefore, one can return all regions with their posterior probability of belonging to the cluster and the overall probability of an outbreak.
This Poisson-Gamma Bayesian scan was extended to a multivariate version, coined the multivariate Bayesian scan statistic (MBSS) by Neill and Cooper (2010). As shown by the authors, the MBSS presents advantages over previous event detection approaches, including improved accuracy of detection, easy interpretation and visualization of results, and the ability to model and accurately differentiate between multiple event types. Next, Neill (2011) extended the MBSS to detect irregularly shaped clusters in multivariate data. Cançado, Fernandes and da Silva (2017) introduced a Bayesian zero-inflated beta-binomial scan statistic to handle zero-inflated data (for more details see Section 7.3). Overall, Bayesian scan statistics have advantages compared to frequentest approaches: 1) they usually have higher detection power; and 2) are faster (since there is no need to perform randomization testing). Turnbull et al. (1989) introduced the cluster evaluation permutation procedure (CEPP). To detect spatial clusters, they assumed that the map has I cities (cells)

120
A. Abolhassani and M. O. Prates with N = I i=1 n i and X = I i=1 X i representing the total population and the total cases, respectively. They built a two-dimensional window for each cell such that this window contains neighbors of that cell and the total population in that window is equal to a predefined value R. The procedure for constructing the window is as follows. For each cell i = 1, 2, . . . , I, if n i < R, then the closest cell (based on Euclidean distance) is added to cell i. Suppose the closest cell to cell i is cell j. If n i + n j = R, then it is said that the window is built. If n i + n j < R, then cell j is completely absorbed in the window. If n i + n j > R, just a fraction of cell j, i.e.,

R−nj nj
will be added to build the window. This process continues until the final construction of the window. Therefore, the i-th window contains cell i and its nearest neighbors, and there are I windows such that each of them contains exactly R people. The null hypothesis, H 0 , is the case that people are distributed completely at random on the map. With this assumption, Turnbull et al. (1989) used M R = max(X 1R , . . . , X IR ) as the test statistic for hypothesis testing such that X iR , i = 1, 2, . . . , I (the number of cases in i-th window) are identical but not independent random variables. With this definition they were able to find the distribution of M R under H 0 .
Therefore, the CEPP method has at least three advantages in comparison with the GAM method: 1) there is no need to use Monte Carlo methods to compute the p-value; 2) it is not necessary to test many circles (the candidate class is not big); and 3) it is not necessary to apply it to small α values. However, it depends on specifying the size of R for the window and this is highly controversial.

Method of Besag and Newell
The CEPP method (Section 3.1) has some challenges in its computational aspects. Thus, Besag and Newell (1991) presented a method that is computationally more efficient. Consider n i , X i and H 0 as before. Besag and Newell (1991) tried to find a statistic with a known distribution to detect spatial clusters. They constructed this statistic for hypothesis testing by answering the following question: "Starting from an arbitrary cell, what is the minimum number of cells necessary to add to this cell to achieve a predefined number of sick people?". A small value for this statistic means there is a concentration of sick people in the neighborhood of the starting cell. To formulate this idea mathematically, they selected a sick person and denoted the cell of this person as A 0 . Other cells in the map are called A 1 , A 2 , . . . based on the Euclidean distance of their centroids to the center of A 0 (that is, the nearest cell to A 0 is A 1 and so on). They defined Since small values of M mean there is a cluster around A 0 , the test's significance level is: P r(M ≤ observed value|H 0 ). In turn, M > m means there are fewer than k sick people among the population with size U m . The probability of having fewer than k sick people among population with the size U m is the summation of hypergeometric probabilities. To find the P r(M < m + 1), Besag and Newell (1991) used a Poisson approximation to the hypergeometric distribution. Using this probability, significant clusters were determined and plotted on the map.
In this method, no Monte Carlo procedure is needed. However, there are two criticisms: 1) Euclidean distance may not be a good criterion to order cellsother distances may be more appropriate to reflect the similarities between cells; and 2) distributions for the number of cases in a cell are not considered. Soltani and Aboukhamseen (2015) introduced an alternative way to find spatial clusters using scan statistics. Consider the hypothesis test in (2.1) and suppose that G = Z 1 Z 2 . . . Z I is the study region. Also assume that X z,+ and n + (G) are respectively the number of points (cases) in zone z and the total number of points (cases) in the study region. For an individual in study region G, suppose A z denotes that this individual is in zone z and B + denotes that a person is labeled as a case. Soltani and Aboukhamseen (2015) defined the count measure μ on (G, F) such that μ(z) and μ(G) are respectively the number of individuals in z and G, and F is a sigma field of subsets of G. Both μ(z i ), i = 1, ..., I and μ(G) are known and the probability of the event A z is given by

Method of Soltani and Aboukhamseen
. With these assumptions, they proved that the hypothesis test in (2.1) is equivalent to the hypothesis test where P z|+ is the probability of belonging to zone z given that the label of the individual is +. Also, they proved that the exact and asymptotic distribution of points in zone z under H 0 are: where D −→ means convergence in distribution. Hence, using (3.1) and (3.2), zone z is a significant spatial cluster of level α if The main advantage of this method is the elimination of the Monte Carlo procedure to detect significant spatial clusters. Like most other scanning methods, the method requires the maximum size of the scanning window. When the appropriate size of the cluster is unknown and many cluster areas are expected to happen, the scanning procedure is repeated by varying the window size. This practice induces a multiple testing problem that is not considered by the authors.

Method of Aboukhamseen
Aboukhamseen, Soltani and Najafi (2016) extended the spatial scan statistic to the situation where the total population of the study in region G is unknown. They supposed that n + (G) is a random variable with a Poisson(λ(G)) distribution where λ(G) is unknown and that X z,+ |n + (G) ∼Bin(n + (G), ν(z)). They proved that X z,+ , X +,z c are independent under null hypothesis with marginal distribution Poisson(λ(G)ν(z)) and Poisson(λ(G)(1 − ν(z))) respectively. Using these facts one can prove that the scan window Since n + (G) has a Poisson distribution, it is possible to find a one-sided or two-sided confidence interval for λ(G). Also, using the marginal distribution of X z,+ , it is possible to find confidence intervals for λ(G)ν(z) (Garwood, 1936). Dividing the above-mentioned confidence intervals, one can find the confidence interval for ν(z). By considering the hypothesis testing (3.1) and the confidence interval for ν(z), it is possible to determine the significance of the MLC.

Spatial Clustering for Event Data
In most classical problems of scan statistics, researchers consider case-control data. However, in some problems, the consideration of disease-related events helps to perform a more adequate analysis. For example, suppose researchers want to know which hospitals in the study region have a heavy burden. If the researchers just consider the number of cases, their analysis will be biased because it is likely there are some cases who visit the hospital once while other cases may visit more than once. Hence the second group of cases places a higher burden to the hospital than to the first group. With this in mind, Rosychuk, Huston and Prasad (2006) considered maps for event data to determine spatial clusters. They proposed a compound Poisson model to detect spatial clusters for event data. Their strategy is based on the method of Besag and Newell (1991) which does not need Monte Carlo simulation. To find event spatial clusters, they assumed that the number of events in a study region is a random sum of random individual events and has a compound Poisson distribution. Suppose that X ia is the number of individuals in cell i with exactly a events and x ia is the observed value of this random variable. Let X i = a X ia be the number of cases in cell i while V i is the random variable corresponding to the total number of events in cell i.
are respectively the total number of cases and the total number of events in a study region with I cells. The test statistic is similar to the one presented by Besag and Newell (1991), i.e., to construct a scan window, they combine cells to include at least k * events. The test statistic for cell i is the number of cells combined with cell i to construct the scan window: Suppose that l i is the observed value for L i * . A small value of l i means that one finds more events in small windows and is a sign that those windows form a spatial cluster. Suppose that n i:l and X i:l are respectively the total population and the total number of cases for l i nearest neighbors of cell i. Since the number of events in the window corresponding to cell i follows a Poisson distribution under the null hypothesis, the estimate of the parameter for this distribution is λ i:li = n i:li X/n where n is the population size of the study region. Let Y j be the number of events for individual j, j = 1, 2, 3, . . . , X i:li ; hence; V i:li , the total number of events in . Using the following recursive relation (Ross, 2014) one can determine the p-value by (4.1). Notice that in practice Q(a) is unknown but can be treated as Q(a) = x a /x. Now it is possible to discuss the hypothesis testing to detect spatial clusters as before.
The method of Rosychuk, Huston and Prasad (2006) has some benefits: 1) it is suitable for event data, 2) it can apply to case-control data (assuming that each case only has one event), and 3) it does not need Monte Carlo simulation. However, there are serious drawbacks in the strategy of Rosychuk, Huston and Prasad (2006): 1) computing the given recursive relation is time-consuming even for a small value of b; 2) to apply this method, one needs a predefined cluster size which is not known in practice; and 3) the significance level depends on this size. Castellares, Prates and Abolhassani (2019) noticed that with the relationship between the compound Poisson distribution and the Neyman type A, there is no need to use the recursive relation (4.1) to achieve the LRT. This strategy makes the cluster detection process computationally feasible.
Furthermore, an approximation to the probability of the events using the negative binomial distribution to detect spatial clusters for events was proposed by Chang and Rosychuk (2015). Because of the combinatorial coefficients in the negative binomial probability mass function, calculation of the likelihood is time-consuming for large datasets. The proposal of Castellares, Prates and Abolhassani (2019) shows that the execution time for calculating the likelihood function is faster than the recursive formula (Rosychuk, Huston and Prasad, 2006) and the negative binomial method (Chang and Rosychuk, 2015) in spatial cluster detection.

Scan Statistics for General Graphs
One possible way to explain relationships between nodes in networks is to use graphs. Sometimes researchers are interested in finding which nodes have com-mon features or play a similar role in networks. These kinds of nodes can be treated as clusters. One of the earliest studies about scan statistics in graphs is the work of Priebe et al. (2005) on Enron graphs. They applied scan statistics to detect anomalies in time series of Enron Email graphs. Marchette (2012) extended the idea of scan statistics in graphs to detect anomalies in time series of graphs. An anomaly is defined as a small region of vertices with unusually high connectivity between themselves in comparison with other regions. Consider a graph G = (V (G), E(G)), where V and E are respectively vertices and edges. The idea of a graph invariant property is the main key in the method of Marchette (2012); this is a function ψ : G −→ R that does not depend on how the graph is presented.
Let the set of k-neighbors of vertex v, be N k (v); this is the set of all vertices u in the graph such that the minimum number of edges needed to reach u from v is A time aspect can also be added to a graph. Let {G t } be a collection of graphs with time index t. In this collection of graphs, only the edges change in time, so the vertices are fixed for all times. Adding a time index to graphs not only helps researchers find clusters by comparing sub-graphs but also makes it possible to find anomalies by comparing regions to their past history. As before the locality statistic is Ψ t k (v) which is the cardinality of the sub-graph induced by N k (v).
Large values of and σ t k (v) 2 are respectively the mean and the variance of ψ t−w k (v), . . . , ψ t−1 k (v). Hence, the maximum of Ψ t k (v) can be used as scan statistic at time t.
To detect the clustering pattern, Wang and Phoa (2016) defined a scan statistic for three different features: 1) Structure (S); 2) Attribute (A); and 3) both Structure and Attribute (SA), in social networks. These features are defined in Zhou, Cheng and Yu (2009). To become familiar with them, consider the example of "coauthor network" of Zhou, Cheng and Yu (2009). In this network, each node represents an author and vertex connectivity shows the relationship between authors. In the structural based cluster concept, in the "coauthor network", nodes with close connectivity form a cluster (they could have different topics), but from the standpoint of attribute-based clustering, topics are considered. Thus, authors in a cluster work on the same topics.
For a network, consider a graph G as before; suppose that k = {k 1 , . . . , k |V | } are the degrees of the vertices. Let k G be the sum of all degrees and |E(G)| be the total number of edges. Based on Erdos and Rényi (1960), the expected number of edges between any two vertices v i and v j is Wang et al. (2008) introduced the following scan statistic for detecting spatial clusters based on the "structure" in graphs: Wang and Phoa (2016) considered the "attribute" network. To find spatial clusters of attributes they considered a graph G in which there is an attribute X i associated with each vertex. Thus now G = (V, E, X) with (X = x 1 , ..., x |V | ).  considered four possible distributions for each X i : 1) binomial; 2) Poisson; 3) normal; or 4) multinomial.
1) binomial distribution: where N G and n G are respectively population size inside G and number of population with the particular attribute under study. The ratio of people with a particular attribute inside sub-graph z (i.e., inside scanning window) is p 10 . Also, p 11 is the ratio of that attribute in sub-graphz (i.e., outside scanning window). Finally, p 0 is the ratio of that attribute in G.
2) Poisson distribution: where the values for p 11 , p 10 and p 0 are similar to previous paragraph.
3) normal distribution: where n is the total number of nodes,σ 2 is the variance of all the x i 's and where k is the number of categories, n is the total number of the nodes, n z is the number of nodes in sub-graph z, n zk is the total number of nodes in z whose attribute is of category k and finally, n k is the total number of nodes in the whole graph in category k. If "attribute" and "structure" are independent, then the scan statistic for both "structure and attribute" is λ SA (Z) = λ S (Z) + λ A (Z). The sub-graph z which maximizes the scan statistic is the MLC. To test the significance of the MLC from the attribute point of view, a randomized permutation procedure is applied as follows: for situations in which the underlying distribution is normal, first, assign observed values to nodes randomly and calculate the scan statistic. The process is repeated many times (e.g., 999) and the scan statistic is calculated for each repetition. To calculate the p-value, one needs to sort these 999 simulated values with the observed value of the scan statistic from the real dataset. The p-value is given by (M + 1)/1000, where M is the number of simulated values greater than observed value of the scan statistic. In the case of the binomial and Poisson distributions, the process is similar. Instead of assigning the observed values to nodes, the values for nodes are generated under the null hypothesis using a multinomial distribution.
In the case of detecting a spatial cluster of the graph structure, randomly assigning degrees to vertices is impossible, because if the degree is randomly assigned to nodes, it is likely that the nodes and edges will not construct a valid graph (Sierksma and Hoogeveen, 1991). Hence, to obtain the significance of MLC, it is suggested to apply a probabilistic method and use the expected degree based on the random graph model (Erdos and Rényi, 1960). Let (k i , k j ) be the degree of nodes i and j respectively. Then the expected number of edges is e ij = kikj 2|E(G)| and all random graphs are generated with the same expected degrees. Therefore, it is possible to perform a Monte Carlo hypothesis testing procedure. Fortunato (2010); Woodall et al. (2017), studied cluster detection in networks in more detail.

Spatial Scan Statistics for Continuous Data
The spatial scan statistics method is commonly applied to count datasets. However, some researchers are interested in applying methods to find spatial clusters in the case of continuous spatial datasets where the measurements might have distributions such as, for example, normal, exponential, Weibull, etc. This section discusses how to construct spatial scan statistics for continuous spatial datasets.

Normal and Multivariate Gaussian Scan Statistics
To find spatial clusters of low weight infants in New York City, Kulldorff, Huang and Konty (2009) proposed a normal scan statistic. As before, suppose there are I cells in a map and the total population in the study region G is N individuals i.e., N = m k=1 N k where N k is the population of cell k. Suppose each cell has one or more individuals with spatial location i, i = 1, . . . , I and x s = i∈s x i , where x s is the summation of the weights in sub-region s.
Detecting spatial clusters is equivalent to performing the following hypothesis test: H 0 : all observations come from the same normal distribution vs. H 1 : there is at least one sub-region where the mean of observations is more (less) than outside of this sub-region. To perform this hypothesis test, Kulldorff, Huang and Konty (2009) constructed a scan statistic based on the likelihood ratio test, i.e., max z (ln L z )/(ln L 0 ), where L z and L 0 are respectively the likelihoods under H 1 and H 0 . The likelihood ratio depends on sub-region z via the mean of the sum of deviations from the mean inside and outside sub-region z, i.e., the likelihood ratio will be a maximum when σ z is maximized where σ z is the above-mentioned mean of deviations. Hence, the MLC is a sub-region z that maximizes σ z . After finding the MLC the significance of the MLC of the spatial cluster can be obtained using randomization.
Sometimes it is interesting to consider the correlation between variables in spatial clustering problems. Cucala et al. (2017) applied a multivariate Gaussian scan statistic to find spatial clusters. This method is more powerful than its independent version. Huang et al. (2009) introduced a weighted normal scan statistic to detect spatial clusters for continuous measures. The weight variable δ z is considered to reflect the uncertainty of the regional measures in zone z. Let δ z be a known measure proportional to the inverse of the uncertainty in zone z. It is possible to measure different values for studied data from different sources. For example, it is possible to record different values for pollution data from different locations in a cell and report their average value as w z . The weight δ z can be considered as the inverse of the variance of the w z s. In some situations it is not possible to calculate the variance of w z so one can use population size or the number of cases as a proxy for the inverse variance.

Weighted Normal Spatial Scan Statistics
To construct a spatial scan statistic Huang et al. (2009) first assume that for that μ z is the mean of measurements inside zone z, and σ G 2 δ z = σ 2 wz is the variance of w z . It is assumed that given δ z the w z s are independent normal with the same mean and different variances. In view of these assumptions, Hence, the MLEs for the parameters can be found, sayμ z ,μ z c andσ 2 G . Using these, the estimateσ 2 wz is obtained. To construct the spatial scan statistic, it is necessary to find the MLC by maximizing the likelihood ratio. Therefore, it is necessary to know for which sub-region z the log likelihood function is maximized under H 1 , i.e., by maximizing ln L(μ z , μ z c , σ G 2 ), the MLC is determined. Huang et al. (2009) proved that maximizing the log likelihood function is equiv- with respect to z in the candidate class. They used Monte Carlo hypothesis testing to determine the significance of the MLC.

Spatial Scan Statistics for Survival Data
To deal with continuous spatial data in spatial clustering problems Huang, Kulldorff and Gregorio (2007) presented the exponential spatial scan statistic for censored and uncensored continuous survival data. This model is suitable for finding spatial clusters of lifetimes. The assumptions are as follows: T i is the exponentially distributed survival time of individual i, θ in is the mean of T i inside zone z, and θ out is the mean of T i outside zone z. The goal is to test if there is at least one sub-region in study region G (with total population N ) such that θ in > θ out or not. They assumed that for a fixed censoring time L i , T i is observed if and only if T i ≤ L i , otherwise right censoring is present. Hence the observed time for individual i is t i = min(T i , L i ). To determine which lifetime observation is censored, they used an N dimensional vector γ such that i-th element is 1 when censoring happens and 0 otherwise. With this vector, one can count the number of non-censored individuals in any zone z, i.e., r in = i∈z γ i . Using this notation the likelihood of a zone z is: To detect a spatial cluster, as in the other methods mentioned above, the likelihood ratio test is used to find the most likely cluster. Any sub-region z which maximizes is the MLC. In the presence of censored data, after estimating parameters, the likelihood ratio is: Since the denominator is independent of z, to find the MLC it is only necessary to determine for which candidate z the numerator is maximized. After determining the MLC, it is necessary to generate simulated data under H 0 to determine the significance of the MLC. Since the distribution of survival times is unknown, it is impossible to generate datasets under the null hypothesis. One way to handle this limitation is permutation of observed pairs {(t i , γ i ), i = 1, . . . , n} between individuals. The position of individuals is fixed in the map since they are in the real dataset. To find the exact distribution of λ, one needs to calculate λ for N ! permutations, which is time-consuming even for small N . Therefore, Huang, Kulldorff and Gregorio (2007) proposed to use random selection of 9999 permutations instead of all N ! permutations. Finally, the MLC is significant at level α if the p-value is less than α (i.e., RK/(1 + 9999) < α, RK is the rank of the scan statistic for the real dataset among the original and 9999 permuted datasets). Similarly, for non-censored data, the scan statistic is constructed to find spatial clusters. Because the exponential distribution has just one parameter, it is sensitive to modest variations (Bhatt and Tiwari, 2014). Therefore, Bhatt and Tiwari (2014) proposed a more robust alternative for the exponential distribution using the Weibull distribution with two parameters. The method constructing the spatial scan statistic for this distribution is similar to the previous one.

Zero-inflated Poisson Scan Statistics
Excess of zeros can be observed in a given dataset. In spatial cluster detection, the extra zeros can lead to biased inferences (Gómez-Rubio and López-Quílez, 2010; Cançado, da Silva and da Silva, 2014). Cançado, da Silva and da Silva (2014) considered the zero-inflated Poisson distribution to detect spatial clusters for these types of datasets. Suppose that the number of cases in sub-region z has a zero-inflated Poisson distribution, i.e., X z ∼ ZIP (p, n z θ z ) such that n z and p are respectively the population size in zone z and the probability of being structurally zero for a cell. The hypothesis test is used to detect spatial clusters. Following Kulldorff (1997), the likelihood ratio test is used to perform hypothesis testing. Hence, such that f i 's and f j 's are respectively the probability mass functions (pmf) of ZIP(p, n i θ z ) and ZIP(p, n j θ 0 ). Using the pmf of the zero-inflated Poisson distribution, it is impossible to find the MLE for the parameters. By applying the method of Lambert (1992), they constructed a new likelihood, based on the knowledge of the different kinds of zeros, i.e., structural or sampling zeros. Their method is as follows.
Suppose that δ = (δ 1 , . . . , δ m ) is a vector where δ i indicates if the zero in cell i is structural (δ i =1). Hence, δ i is a binary variable, δ i ∼ Ber(p). To find the MLE of the parameters, Cançado, da Silva and da Silva (2014) worked with bivariate data (x i , δ i ), i = 1, . . . , m and found the likelihood using the pmf of (x i , δ i ), namely, Therefore, one can find a closed form for the MLE of the parameters. However, the vector δ is commonly unknown and needs to be estimated. An EM algorithm is proposed to estimate δ, so the MLE of the parameters can be found. Now, one can find the sub-region z which maximizes the likelihood ratio λ(z). This sub-region is the MLC. Since the distribution of max z λ(z) is un-known, Monte Carlo simulation is used for hypothesis testing to find spatial clusters.

Zero-inflated Binomial Scan Statistics
Cançado, Fernandes and da Silva (2017) introduced a zero-inflated binomial scan statistic. Suppose that m, n i , x i , N , C are respectively, the total number of cells, the population and cases in cell i, the total population of the study region and the total cases indicated on the map. The aim is to find sub-region z for which the probability of being a case in z, i.e., θ z , is greater than the probability of being a case outside z, say θ z c , when the dataset has extra zeros. Let X z be the number of cases in zone z such that X z ∼ Bin(n z , θ z ) where n z is the population in zone z. To work with datasets with extra zeros, it is necessary to use a mixture model, i.e., a degenerate distribution at zero and a binomial model for nonzero counts. But using this mixture, the MLEs do not have closed-form expressions. Therefore, constructing a scan statistic using the mixed distribution is not trivial. Then, Cançado, Fernandes and da Silva (2017) proposed to use the method in Lambert (1992) to construct their scan statistic. They defined the vector δ = (δ 1 , . . . , δ m ) similar to the previous subsection. Hence, δ i ∼ Ber(p). To obtain the scan statistic, one needs to maximize the likelihood ratio function (Kulldorff and Nagarwalla, 1995). So, Cançado, Fernandes and da Silva (2017) calculated L 0 (δ, x) and L 1 (δ, x) to compute zero-inflated binomial scan statistics. i.e., λ(z). The zero-inflated binomial scan statistic λ(z) is a function of θ z , θz, p, θ 0 and vector δ. To find the MLC one needs to calculate the MLE for these unknown parameters and δ. The MLE for the parameters is obtained easily, but to estimate δ one must rely on the EM algorithm. After estimating the unknown parameters and vector δ, the MLC is obtained by maximizing λ(z). As in previous cases, the distribution of λ(z) is unknown; therefore these authors also used Monte Carlo for hypothesis testing.
Additionally, de Lima et al. (2015) suggested applying a zero-inflated double Poisson model to detect spatial clusters for zero-inflated and overdispersed spatial data. In turn, Zhang et al. (2017) theoretically discussed the asymptotic properties of spatial scan statistics and considered overdispersion of lung cancer in Texas.

Bayesian Beta-binomial Scan Statistics
Cançado, Fernandes and da Silva (2017) proposed the Bayesian zero-inflated binomial scan statistic to detect spatial clusters. To construct their beta-binomial scan statistic, they assumed (x i |δ i ) ∼ Bin(n i , θ i ) and considered Beta(α 0 , β 0 ) as a prior for θ 0 . The posterior is then Beta(C + α 0 , N − C + β 0 ) where C and N are total cases and total population respectively. To find the Bayesian spatial scan statistic it is necessary to find the marginal likelihood under H 0 and H 1 . After selecting adequate priors P (H z ) and P (H 0 ), which are respectively the zone prior probability and the prior probability of having no cluster, the Bayesian beta binomial spatial scan is presented in Algorithm 2.
Algorithm 2 Bayesian Beta Binomial Algorithm 1: For each candidate sub-region z, calculate Sz = P (x|Hz)P (Hz) and z Sz. 2: Compute S 0 = P (x|H 0 )P (H 0 ) and obtain P (x) using z Sz and S 0 . 3: Compute P (Hz|x) for each z in the candidate class.
Any sub-region z which maximizes P (H z |x) is a MLC. To determine P (H z ), α z , β z , P (H 0 ), α 0 , and β 0 , one can use historical information or non-informative priors.
The advantage of the Bayesian method in spatial cluster detection problems is the freedom of this method from Monte Carlo simulation in determining significance, but computing statistical power makes no sense for this method. Hence, the authors suggested the use of the Bayes factor (BF) as an alternative criterion for power which is defined as: A BF > 1 indicates that H z is more strongly supported by the observed data x. Thus, after finding the MLC, the BF can be used to decide about the significance of the MLC. BF > 1 indicates that the detected MLC is significant. There is at least one drawback of this method: for large values of N and/or C, the method leads to numerical instability. To mitigate this problem, the logarithm of the probabilities in the above-mentioned computations can be used.

Bayesian Zero-inflated Binomial Scan Statistics
In the same paper, Cançado, Fernandes and da Silva (2017) proposed an extension of the beta-binomial scan statistic presented in the previous subsection. The Bayesian zero-inflated binomial method is proposed as follows: suppose (X i |θ i , p) ∼ ZIB(n i , θ i , p), θ i ∼Beta(α i , β i ) and p ∼Beta(α p , β p ). The null hypothesis is H 0 : θ i = θ 0 , α i = α 0 and β i = β 0 . As in the previous subsection, the MLC is a sub-region z which maximizes P (H z |x). Like Algorithm 2, Algorithm 3 below has three steps.
Any sub region z which maximizes P (H z |x), is the MLC area. As before the BF is the criterion chosen to ascertain the significance of the MLC. But since in practice the vector δ is unknown, it must be estimated. A Gibbs sampler is used to estimate the δ parameter vector.

Distribution-free Scan Statistics Based on the Index of Concentration
Cucala (2014) extends scan statistics to point processes using a distribution free scan statistic. He proves that the distribution-free scan statistic is completely equivalent to the Gaussian-based scan statistic presented by Kulldorff, Huang and Konty (2009). The distribution-free scan statistic considered by Cucala (2014) has homoscedastic and heteroscedastic versions. It is developed based on the assumption that {(x i , s i ), i = 1, . . . , m} are realizations such that s i and x i are respectively the coordinate of the centroid and the associated measure for cell i.
To construct the distribution-free scan statistic in the homoscedastic version, it is assumed that X 1 , . . . , X m are independent and identically distributed (i.i.d.) random variables related to the measure in cell i with E(X i ) = ν, V ar(X i ) = σ 2 , for all i, and Cov(X i , X j ) = 0 if i = j. Consider that the mean and variance are unknown. As before any connected sub-region z is a candidate for being a spatial cluster. If the mean of measures in a candidate region z, is significantly higher (lower) than the mean outside z, z will be considered a spatial cluster. Assume D(z) =μ(z)−μ(z c ) is the difference of means inside and outside z. Under the null hypothesis E(D(z)) = 0 and V ar(D(z)) = σ 2 ( 1 n(z) + 1 n(z c ) ), where n(z) is the total number of cells in sub-region z. Moreover, Cucala (2014) introduced an index of concentration defined by Since E(I(z)) = 0, V ar(I(z)) = σ 2 and these values do not depend on n(z), I(z) can be used to find potential clusters having different population sizes. The next step is to determine the MLC, i.e., a connected sub-region which maximizes (minimizes) the index of concentration to detect spatial clusters of hot spots (cold spots). Cucala (2014) introduced three scan statistics: 1) the positive scan statistic, λ P = max z∈Z I(z) which finds hot spots; 2) the negative scan statistic, λ N = min z∈Z I(z), which finds cool spots; and 3) the global scan statistic, λ = max z∈Z |I(z)|. After determining the MLC, because the distribution of the scan statistic is unknown, one needs to use Monte Carlo to determine the significance level of the spatial cluster. It is impossible to generate datasets under H 0 , because it is assumed that this method is distribution-free. Hence, the author used random labeling to evaluate significance. To construct the distribution-free heteroscedastic version of the scan statistic, it is supposed that the variances of measures are not equal and they depend on the weight related to cell i, i.e., V ar(X i ) = σ 2 δi for all i. As before, σ 2 is unknown. Suppose that x i is the mean of measures in cell i and this mean is the mean of δ i cases. Under these assumptions the population in zone z is n(Z) = m i=1 δ i I(s i ∈ z) and the mean of measures in z isμ(z) = m i=1 δiXiI (si∈z) n (z) . Note that I(z), λ p , λ z , λ are the same as the homoscedastic versions. Again, using random labeling one can ascertain the significance of the MLC. Jung and Cho (2015) presented a nonparametric method to find a spatial cluster for continuous datasets using the Wilcoxon rank-sum test. This method not only is an alternative to the normal scan statistic (Kulldorff, Huang and Konty, 2009), but also can be applied to heavy-tailed and skewed distributions.

Distribution-free Scan Statistics Based on the Wilcoxon Test
Suppose F in and F out are CDFs of measures inside and outside a sub-region z ∈ Z and N is the population size of the study region G. To find a spatial cluster consider the following hypothesis test: is the location shift of the CDF of the outside relative to the inside candidate. If > 0, ( < 0), the measurements tend to be higher (lower) inside candidate z compared to outside it. Since the distribution of measurements is unknown, it is impossible to define a scan statistic, so the authors suggested using the Wilcoxon rank-sum test. To apply this test, suppose that the rank of the measurement i, i.e., the rank of the x i , is o i . The Wilcoxon rank-test for candidate z is W z = i∈z o i . Using the normal approximation for W z , i.e., is approximately normal(0, 1), a p-value can be calculated. A subregion z which has minimum p-value is considered the MLC. The main benefit of this method is its flexibility in the application for heavy-tailed and skewed distributions. Another clear advantage is that this method does not require Monte Carlo simulation and computing the p-value is simple.

Spatio-temporal Clusters
Most of the proposed cluster detection methods are based only on the spatial aspect, without the time dimension. However, the time dimension can be an important factor in cluster detection problems. Knox and Bartlett (1964) pioneered the study of space-time clustering by proposing a test. Their method is based on counting the number of pairs of events which are close in space and time simultaneously. A large number of these events indicates that events form spatio-temporal clusters. The Knox and Bartlett (1964) test was generalized by Mantel (1967). These tests are appropriate when the interest is to know the existence of a space-time cluster and sound an alarm without identifying its location and duration. In other words, these tests are suitable to declare that, for example, there is a disease outbreak.
To find the location and time period of an occurring cluster, the "space-time scan statistic" is a common extension of the purely spatial scan. It is a powerful statistical framework for the analysis of point processes. In this type of scan, the goal is to detect regions of space-time where the counts are significantly higher than expected. Kulldorff et al. (1998) were the first to discuss the time dimension in cluster detection using scan statistics. In space-time scan statistics, instead of a circular window, a cylindrical window scans the map such that its base is for scanning geographical area and its height corresponds to the time dimension. This window moves in space and time to find the MLC. Afterward, the significance of the MLC is determined by Monte Carlo simulation. Kulldorff et al. (1998) tried to find spatio-temporal clusters of brain cancer in New Mexico.
Later, Kulldorff (2001) applied the space-time scan statistic to prospective disease surveillance. The method was illustrated for thyroid cancer among men in New Mexico in 1973-1992. Also, Elias et al. (2006 tried to find spatiotemporal clusters of Meningococcal disease in Germany in a study in which specimens were obtained during 42 months. The space-time scan statistic was also applied by Tonini, Tuia and Ratle (2009) to detect spatio-temporal clusters in fire sequences to find active fires in the state of Florida (US) during 2003-2006. According to their work, statistically significant clusters were detected in time and space. Clusters of forest fires are more frequent in hot seasons (spring and summer). This information helps authorities to take preventive measures at the correct time and space. Another application of space-time scan statistics was presented by Carneiro et al. (2007) involving American visceral leishmaniasis in the state of Bahia, Brazil, covering the 11-year period from 1994 to 2004.
Although there are many applications of the "space-time scan statistic", Tango, Takahashi and Kohriyama (2011);Correa, Assunção and Costa (2015); Gangnon (2010b); Tango (2016) criticized the use of the prospective space-time scan statistic. To improve some of its problems, Assunção et al. (2007) proposed a score-based space-time scan statistic which is discussed in the next subsection. Also, Prates, Kulldorff and Assuncao (2014) presented a simulation study showing that the relative risk estimates for the space-time scan statistic must be defined with care and presented bias in its estimation, while the relative risk estimator is well defined for the purely spatial scan situations being not biased as the true relative risk of the cluster increases.

A Score-based Space-time Scan Statistic
To find spatio-temporal clusters, Assunção et al. (2007) proposed a new scan statistic that detects clusters in time and space in a point process by scanning three dimensions (two dimensions for space and one dimension for time). As before, a hypothesis testing method is applied such that the null hypothesis is that the underlying point process is a homogeneous Poisson point process with separable space-time intensity versus the alternative hypothesis of the existence of at least one space-time cluster.
To create the space-time scan statistic, Assunção et al. (2007) assumed a Poisson point process in a space-time region A = A × [0, T ], such that A is a two dimensional area and T stands for the study time. They denote the space-time intensity by λ(x, y, t). Under the null hypothesis, the intensity is separable, i.e., λ(x, y, t) = λ S (x, y)λ T (t). Hence, under this hypothesis, the likelihood is a separable function of time and space, which are (functional) nuisance parameters. By introducing an alternative to the null model, they obtain a closed form score test statistic. They choose C = C s × C T as a fixed and arbitrary cylinder, such that C S denotes space and C T is the time period. The alternative hypothesis is: where > 0 and I is an indicator function. This hypothesis expresses the deviation of the point process from the separability hypothesis. Considering this alternative hypothesis, the likelihood depends on , λ S (x, y) and λ T (t) . But λ S (x, y) and λ T (t) are unknown, so applying the LRT method of Kulldorff (1997) is impossible. Therefore, Assunção et al. (2007) suggested a locally most powerful test based on the score statistic: The most likely spatio-temporal cluster is a cylinder C which maximizes U C i.e., Since the sampling distribution of (9.1) is unknown, Monte Carlo hypothesis testing is suggested (Dwass, 1957). For this test, the spatial locations of the events are fixed and by permutation of time t i , i = 1, 2, . . . , n datasets are generated based on the null hypothesis (n is the total number of events). If the rank of U obtained by a real dataset is in the k-th largest in comparison to the values of U in the m − 1 generated datasets under null hypothesis, then p = k/m is the one-sided significance level. This Monte Carlo hypothesis test to detect spatio-temporal clusters is naive and time-consuming. Hence, Assunção et al. (2007) created a more workable test process. Suppose that cylinder C 1 is contained in cylinder C 2 i.e., Therefore, they showed that it is enough to scan all distinct subsets of events and their associated enveloping cylinders. Based on these facts, an improved version of the naive Monte Carlo test is presented. Assunção and Correa (2009) proposed a method to detect clusters in space-time based on the Shiryaev-Roberts statistic and its martingale property (Kenett and Pollak, 1996). Suppose that λ(x, y, t) is the unknown rate of a Poisson point process in R 3 such that this process is observed in A × (0, T ], where A represents the space component and (0, T ] the time component. Assume that the event (x i , y i , t i ) is observed at time t i . Hence, at time t n , the number of observed events is exactly n. Imagine a cylinder C k,n with circular base B(k, ρ) and the length of the interval (t k , t n ] as its height, i.e., C k,n = B(k, ρ) × (t k , t n ], such that ρ is the radius of its base and t n > t k . Considering N (C k,n ) to be the number of cases inside the cylinder following a Poisson(μ(C k,n )), where μ(C k,n ) = C k,n λ(x, y, t)dxdydt, and assuming λ A (x, y) and λ T (t) are the marginal spatial and temporal densities, Assunção and Correa (2009) discussed the surveillance method in space-time. Their assumption is that the intensity function is separable, i.e., the intensity is proportional to the product of the time and space intensity components. They defined a coefficient of proportionality

A Surveillance Method
where μ is the expected number of events in the study space. Therefore, the intensity in the presence of a cylindrical cluster is as follows: such that I is an indicator function which shows whether or not an event belongs to the cylinder, and the constant > 0 is the relative change in event's intensity within the cylinder. Assuming there is no emerging cluster, the likelihood of the space-time Poisson process for n observed events is (Streit, 2010): λ(x, y, t)dxdydt .
( 9.3) The emerging cluster at time t k < t n is calculated by (9.3) using the intensities in (9.2). Let L k be the likelihood of the space-time Poisson processes when n events have been observed. The test statistic is given by R STCD n = n k=1 L k L ∞ = n k=1 Λ k,n , where Λ k,n = (1 + ) N (C n,k ) exp (− μ(C n,k )). To perform the hypothesis test, a value "A" must be defined as threshold. The null hypothesis (there is no spatio-temporal cluster) is rejected if the test statistic exceeds "A". The determination of "A" and are further discussed by Veloso et al. (2017).
Since μ(C k,n ) is unknown it is necessary to estimate it. This estimate is given by: (events in a cylinder with height t n ) × ( events in time interval(t k , t n ] in A) total events .
(9.4) This estimation function was proposed by Assunção and Correa (2009). However, Veloso et al. (2017) identified an error in this estimator and proposed a modified version. In the Assunção and Correa (2009) version, it was assumed that if the actual time is t n , then the most recent event is included in the total number of events in the time interval (t k , t n ] and it may be included in the number of events in the disk B(k, ρ). In the Veloso et al. (2017) version, to preserve the martingale property, the parameter estimation at time t n should not depend on the observations at the current time t n . Therefore, Veloso et al. (2017) modified the estimation of μ(C k,n ) by excluding the actual time t n . Their modification changes the denominator of (9.4) to the number of total events minus 1 and the numerator by excluding t n from the interval. Besides this modification, Veloso et al. (2017) proposed an algorithm for automatic detection of multiple emerging space-time clusters. They also proposed how to automatically estimate the variable instead of fixing it.

Transformation Method to Detect Spatial Clusters in Case-control Data
Aggregated data are used for spatial clusters on maps, where detailed information is lost in comparison with case-control data. In case-control data, the coordinate of each case is known on the map, while for aggregated data researchers only know the total number of cases and population in each cell. Therefore, when case-control data are available, it is recommended to use all the information available instead of aggregating it. Dematteı, Molinari and Daurès (2007) presented a method based on regression models and data transformation to detect multiple irregularly shaped spatial clusters. This method is an extension of the method of Molinari, Bonaldi and Daurés (2001). Their method selects the best model based on a double maximum test of H 0 : uniform distribution of cases vs. H 1 : there is at least one spatial cluster. Suppose there are n cases in study region G with total population N and X 1 , . . . , X n are i.i.d. random variables with density h(x) which denote the place of cases in G. Using the coordinates of the cases, the authors introduced two variables, "distance" and "order". The distance variable is the distance between a point and its nearest neighbor, and the order variable is the order of selection of the cases. The order variable is denoted by t. If there is a cluster in G, cases in this cluster will have consecutive selection order and the interior distances will be less than the distances outside the cluster.
To find spatial clusters using this method, suppose that x k , k = 1, . . . , n is an observation of X k and x (k) is the k-th selection when x 1 is chosen arbitrarily from all observed values of cases and given {x (1) , . . . , x (k) }, x (k+1) is the nearest case from x (k) among the n − k remaining cases which are not selected yet. Suppose that D k is the distance between X (k) and X (k+1) and d k is its observed value. The cumulative distribution and density of D k are defined as G k and g k respectively. Let d w k be the ratio between d k and the expected value of D k given that the values of x 1 , . . . , x k are known. Under the null hypothesis, if d w k > 1 then the observed distance is greater than its expected value, while for d w k < 1 the observed distance is less than its expected distance. Therefore, the null hypothesis will not be rejected if d w k is statistically close to one. The way to calculate E H0 (D k |X 1 = x 1 , . . . , X k = x k ) is given in Dematteı, Molinari and Daurès (2007). Using the ordered weighted distances, it is possible to find the location of the spatial clusters.
Consider ordered pairs (k, d w k ) with k = 1, 2, . . . , n − 1. To detect the spatial cluster bounds, Dematteı, Molinari and Daurès (2007) applied a regression of weighted distance on the order of selection. Under the no-cluster hypothesis, the appropriate model will be constant i.e., thus, free of t. If there is one cluster in the dataset, there will be two breaks (jump and fall) in the model, at points T 1 and T 2 . In other words, from point 1 to point T 1 the value of the model isd [1:T1] , from point T 1 + 1 to T 2 , the value of the model isd [T1+1:T2] and from point T 2 + 1 to point T , the value of the model isd [T2+1:T ] . T 1 and T 2 are called breaks (cluster bounds). At these points, i.e., T 1 and T 2 , the value of the model rises or falls.d is the mean weighted distances in the corresponding interval i.e.,d [a:b] which is the mean distance based on d w a , d w a+1 , . . . , d w b . The spatial cluster areas are the points for which the mean distance is low. To determine breaks, i.e., T i 's, the following strategy is used: consider as the minimum ratio of points which are between two breaks (minimum size of potential cluster), for example, 0.1, and The breaks are the points that minimize the squared error between d w t and f (t). These points can be found by using the computer program developed by Bai and Perron (2003). After finding the breaks, to visualize the cluster area, Dematteı, Molinari and Daurès (2007) suggested a disc-based method and a Voronoi tessellation method (Allard and Fraley, 1997), which are complementary to each other. A researcher may find many models with different breaks, so to select the best model it is suggested to perform hypothesis testing of no breaks vs. k breaks using the statistic proposed by Bai and Perron (1998) and to determine the significant models. To select the best model among the significant models, they suggested using the double maximum test (Bai and Perron, 1998). After selecting the best model, it is necessary to determine the p-value associated with each cluster by Monte Carlo simulation.
Indeed, in this method, when using the ordering of cases the spatial data are transformed into a one-dimensional point process. This method has at least two advantages: 1) it can detect irregularly shaped clusters; and 2) it has low computational demands. However, there are some drawbacks: 1) it is necessary to determine ; and 2) it is necessary to select the upper bound for the number of breaks. It is important to mention that even under the null hypothesis, the distances may not be distributed identically when their dependency structure is complex. Hence, Cucala (2009) proposed a method similar to the method of Dematteı, Molinari and Daurès (2007), based on a specific distribution property and introduced a flexible spatial scan test for case-control data. The aim of this method is to detect a sub-region in which the number of cases is abnormally high.
Consider an ordered sample with size n from uniform (0, 1). Suppose {S 1 . . . , S n+1 } are the length of the intervals constructed from this ordered sample. Let A 1 be a sub-region of study region G such that the distance of points inside this sub-region and the border of G is less than D 1 (distance between the first selected point and the border of G). Cucala (2009) assumed that A 1 is proportional to the population in sub-region A 1 , and proved that under H 0 , the distribution of A 1 is the same as the distribution of S 1 . Similarly considering A 2 as a sub-region external to A 1 , where the distance between points inside it and the first chosen point (i.e., X 1 ) is less than D 2 , the distribution of A 2 , is the same as the distribution of S 2 under the null hypothesis. Likewise, A 2 is proportional to the population in A 2 . Under H 0 , the area spacing {A 1 , . . . , A n+1 } has the same distribution as the uniform spacing i.e., {S 1 , . . . , S n+1 }. Considering . . , T n } are distributed as n ordered statistics from a uniform (0, 1) (because of the distributional property), so the cluster detection in {T 1 , . . . , T n } corresponds to cluster detection in the study region. Therefore, Cucala (2009) transformed the spatial cluster detection problem into a one-dimensional cluster detection problem. This kind of cluster detection can be done by applying the concentration index method of Cucala (2008). Zhang and Lin (2009) presented a model-based approach that is equivalent to the spatial scan statistics method. Their method can be applied to overdispersed data. Furthermore, it is interesting for practitioners to identify clusters of spatial units with distinct patterns in a regression coefficient.  proposed a formal statistical methodology by focusing on spatially varying coefficient regression methods such as geographically weighted regression models. They developed this new method for spatial cluster detection with a covariate. Detection of a single circular cluster and multiple clusters are possible with this new method. A limitation is that it allows for only one covariate. Relaxation of this restriction is desirable in order to explore the study region and allow for the detection of irregularly shaped clusters. Demattei and Cucala (2010) extended the transformation method and spacing method mentioned above to find spatio-temporal clusters. They introduced a spatio-temporal distance and based on the ordering technique, tried to find spatio-temporal clusters. Not only did this add time to the spatial cluster detection method but also their method can be applied to find multiple clusters in case-control data. Multiple cluster detection is discussed in Section 11. In essence, Demattei and Cucala (2010) presented a method to find multiple clusters in time and space.

Spatio-temporal Cluster Detection
It should be mentioned that Kulldorff et al. (1998) pioneered the discussion of the detection of clusters in time and space. Section 9 presents many alternatives for spatio-temporal cluster detection.

Multiple Spatial Cluster Detection in Study Regions
As discussed before, to find a spatial cluster in the study region, researchers perform hypothesis testing -H 0 : there is no spatial cluster vs. H 1 : there is at least one sub-region z classified as a spatial cluster in the study region. In all of the above-mentioned studies, after finding the MLC, authors discuss the statistical significance and explore weaker clusters i.e., the second most likely cluster (SLC) in the study region (which does not overlap with the MLC). To find the significance of the SLC, it is necessary to apply a sequential method where cases and populations inside the MLC are removed from the dataset, and then, the standard spatial scan statistic is applied.
To find multiple spatial clusters, Li et al. (2011) proposed an alternative method. Under two conditions, the combination of sub-region z i and z j , i.e., (z i , z j ), is considered as a two-cluster candidate in the case of multiple clusters: 1. z i and z j do not overlap, i.e., in centroids, cases, and controls. 2. The population of z i and z j is less than, for example, 50 percent of the total population.
In this respect, the study region is divided into three disjoint areas, z i , z j , and G ∩ (z j , z i ) c . Suppose that the elements of an ordered triple (x 1 , m 1 , p 1 ) are respectively the number of cases, the population size, and the incidence probability in z i . The triples (x 2 , m 2 , p 2 ) and (x 3 , m 3 , p 3 ) are corresponding triples for sub-region z j and G ∩ (z j , z i ) c . The goal is to perform the following hypothesis test H 0 : p 1 = p 2 = p 3 vs. H 1 : p 1 > p 3 , p 2 > p 3 in at least for one sub-region z For any two-cluster candidates, x 1 , x 2 and x 3 , are observations of an independent Poisson distribution. Li et al. (2011) found the likelihood ratio to construct the scan statistic. Any two-cluster sub-regions which maximize the likelihood ration are called the Most Likely Two-Cluster (MLTC). After finding the MLTC, it is necessary to check whether or not the two-cluster area is significant as a spatial cluster. If it is significant, one should investigate which sub-region is the first suspected cluster and which one is the second suspected one. After detecting the second suspected sub-region, using a sequential method, its significance should be tested. To find the second MLC, it is not necessary to search the study region to detect a sub-region which has the second greatest likelihood ratio given no overlap with the MLC because the second most likely cluster is exactly the second suspected sub-region in the two-cluster method.
Others have also investigated this topic, e.g., Zhang, Assunção and Kulldorff (2010), who also discussed how to determine multiple spatial clusters. Jung, Kulldorff and Richard (2010) constructed a spatial scan statistic for multinomial data. Wu and Glaz (2015) suggested a new adaptive procedure for a multiple window scan statistic. Wan et al. (2012) used an ant colony optimization to detect multiple spatial clusters. This method was proposed only for regional spatial count data. The ant colony optimization multiple cluster detection method was compared with three other methods: genetic algorithm-based spatial cluster detection (Duczmal et al., 2007), circular scan (Kulldorff, 1997), and flexible shape spatial scan statistic (Tango and Takahashi, 2005). The ant colony method outperformed the competitors in simulations.
The above-mentioned methods, i.e., Zhang, Assunção and Kulldorff (2010) and Li et al. (2011), do not allow overlapping clusters. The forward stepwise and forward stagewise methods proposed by Xu and Gangnon (2016) are effective in detection of multiple overlapping spatial clusters.

Recent Developments of Scan Statistics
Recently, Li et al. (2019) considered two types of scales: the aggregation level of the input data and the population threshold used in the cluster detection. They stated that these scales can affect the results of cluster detection using scan statistics. This effect is called detection inconsistency. To show this, Li et al. (2019) considered different scale settings and used two measures (i.e., the distance between cluster centers and the Jaccard index (Jaccard, 1901)). They measured the constancy of the detected clusters and studied three levels of aggregation (county, town, and a 900m grid) and three population thresholds (10%, 25%, and 50%). Four main results were obtained: 1. For a strong cluster and in a place with the high population density, the method is not highly sensitive to the data aggregation level. For weak true clusters and/or for less populous areas, the detection results from the different scale settings can be inconsistent. 2. The method's sensitivity to the population threshold is determined by the actual size of the true cluster. 3. The results show the superiority of a regular grid with fine resolution over the subjectively defined areal units. 4. When the population threshold is not smaller than 50%, a county-level analysis may have good quality when the disease has a strong clustering pattern in a place with high population density. However, county-level data should not be used to detect small clusters and with small population thresholds.
Jung (2019) extended scan statistics to matched case-control data. Since the Bernoulli-based scan statistic (2.1) is for independent observations, and since the case and control measures within a matched pair are not independent, the Bernoulli-based scan statistic is not suitable for matched case-control data. Hence, Jung (2019) designed hypothesis tests based on the odds ratio and used McNemar's test statistic and the Wald-type test statistic to detect spatial clusters. Their simulation study showed that the proposed methods had higher power and higher accuracy to detect spatial clusters for matched case-control data than the Bernoulli-based spatial scan statistic. Ishioka et al. (2019) used the zero-suppressed binary decision diagram (Minato, 1993) to handle the large cardinality of the candidate class. This method can be compared with the method of AMST (Zhou, Shu and Su, 2015) in detecting irregularly shaped spatial clusters. Also, Desjardins, Hohl and Delmelle (2020) used a prospective space-time scan statistic to detect clusters of Covid-19 in the United States.
Since the Poisson-based spatial scan statistic detects larger clusters by absorbing insignificant neighbors with non-elevated risks, Lee and Jung (2019) suspected that the spatial scan statistic for ordinal data may also have similar undesirable outcomes. Hence, they applied a restricted likelihood ratio to the spatial scan statistic for ordinal outcome data to circumvent this problem. Through a simulation study, they demonstrated not only that original spatial scan statistic suffer from overdetection but also that their proposed methods have reasonable or better performance compared with the original methods.
For spatio-temporal count data with an excess of zeros, Allévius and Höhle (2019) proposed an unconditional space-time scan statistic. Moreover, a functional-model-adjusted spatial scan statistic was presented by Ahmed and Genin (2020). This new spatial scan statistic is designed to adjust cluster detection for longitudinal confounding factors indexed in space. This scan statistic was developed using generalized functional linear models in which the longitudinal confounding factors were considered to be functional covariates.
In many environmental applications, the response variables are spatially correlated. As an extension of the method proposed by , Lee, Sun and Chang (2020) proposed a mixed effect model for spatial cluster detection to take the spatial correlation into account. The method developed can identify multiple potentially overlapping clusters. Recently, Lee et al. (2021) introduced a varying coefficient regression method to detect spatiotemporal clusters. They extended the spatial-only varying coefficient regression model to the spatio-temporal setting including flexible temporal patterns. The method relies on the detection of a potential cylindrical cluster of the regression coefficients, and is based on testing whether the regression coefficient is the same or not over the entire spatial domain for each time point. Additionally, it can detect multiple clusters.

Software and Packages
Up to this point in the paper we have provided a broad overview of a variety of scan statistics. We believe it is also important to give a guide to users about the available implementation tools. Therefore, a short review of software and R packages that work with scan statistics is now presented.
A diversity of R packages also provide independent implementation of scan methods or introduce new scan statistics as alternatives. DCluster (Gómez-Rubio, Ferrándiz-Ferragud and Lopez-Quílez, 2005) implements the traditional scan (Kulldorff, 1997) with bootstrap and Gumbel alternatives to calculate the cluster significance. The AMOEBA package (Valles, 2014) includes a function to detect spatial clusters based on the Getis-Ord statistic. A Bayesian Dirichlet process for spatial clusters is available at PReMiuM (Liverani et al., 2015). A version of the scan statistic to detect clusters in social networks is available at SNscan (Wang, Hsu and Phoa, 2016). The package graphscan (Loche et al., 2016) implements the distribution free methods of Cucala (2008Cucala ( , 2009). The surveillance package (Meyer et al., 2017) implements many spacetime scan methods. ClustGeo (Chavent et al., 2017) implements hierarchical clustering with spatial constraints to create spatial partitions of a map. The scanstatistics package (Allévius, 2018) (https://github.com/BenjaK/ scanstatistics) implements a number of spatial (Poisson, negative binomial, zero-inflated Poisson), space-time and the negative binomial Bayesian scan statistics. SpatialEpi (Kim and Wakefield, 2018) has an implementation of the Bayesian cluster method from Wakefield and Kim (2013) and other traditional scan statistics. SpatialEpiApp (Moraga, 2017) is a package which provides a Shiny application for spatial and space-time scan statistics. Recently, the DClusterm package (Gómez-Rubio et al., 2019) implements a model-based approach using dummy variables in GLMs. A large variety of independent implementations of scan methods (e.g., Turnbull et al., 1989;Besag and Newell, 1991;Tango and Takahashi, 2005;Assunção et al., 2006;Kulldorff et al., 2006;Costa, Assunção and Kulldorff, 2012;Neill, 2012) are available in the smerc package (French, 2020a). Finally, for case-control data, the package smacpod (French, 2020b) has some spatial cluster methods.

Conclusion
In this paper, a detailed review of the development of scan statistics over the past three decades is presented. Scan statistics were initially proposed in the sixties and gained greater attention in the nineties with the advance of computational power. Our main goal here is to provide an up-to-date overview of scan statistics methods, their diversity of applications and some available software for practitioners.
The scan statistics method is based on the likelihood ratio test. The work of Kulldorff and Nagarwalla (1995) is the starting point for using scan statistics to find circular clusters. Subsequently, detecting non-circular clusters gained importance. The circular scan method was extended to detect non-circular clusters using MST (Assunção et al., 2006). This extension not only helps researchers in the detection of non-circular clusters but also increases the speed of cluster detection. AMST (Zhou, Shu and Su, 2015) is an alternative to the MST method of Assunção et al. (2006). This alternative made it possible to detect multiple non-circular clusters.
Monte Carlo hypothesis testing methods play a crucial role in scan statistics, but are time-consuming. With this in mind, Turnbull et al. (1989); Besag and Newell (1991); Soltani and Aboukhamseen (2015); Aboukhamseen, Soltani and Najafi (2016) presented some solutions to eliminate the Monte Carlo procedure in cluster detection.
The Poisson and binomial models are the most traditional models in the context of scan statistics. But these models are not appropriate for continuous data. Thus, the normal, multivariate normal and weighted normal scan statistics were introduced by researchers. Besides these models, the exponential scan statistic (Huang, Kulldorff and Gregorio, 2007) is constructed to deal with censored and uncensored survival data.
The inflation of zeros in a given dataset is another challenge in cluster detection. The need for zero-inflated models led to the construction of zero-inflated Poisson, zero-inflated binomial, and Bayesian beta-binomial scan statistics. The variety of these models tempted researchers to introduce nonparametric methods to detect spatial clusters. Cucala (2014) showed that the distribution-free scan statistic is equivalent to the Gaussian-based scan method presented by Kulldorff, Huang and Konty (2009). Another nonparametric method, based on the Wilcoxon test, was presented by Jung and Cho (2015).
Naturally, spatial scan statistics have evolved to handle spatio-temporal clusters by including the time component in the analysis. Pioneers in this topic were Knox and Bartlett (1964); their method evolved in many directions, as described by Mantel (1967); Kulldorff et al. (1998); Kulldorff (2001); Assunção et al. (2007); Assunção and Correa (2009);Veloso et al. (2017), among others. Applications of these methods are still relevant, e.g., to the detection of Covid-19 clusters in the United States (Desjardins, Hohl and Delmelle, 2020).
Available software and R packages that implement different types of scan statistics are presented (Section 13). This section provides a shortcut for practitioners who want to apply the methods discussed as well as researchers who want to compare new proposals with existing ones.
The supply of networks, graphs, and maps grows daily. Therefore, beyond the epidemiological applications, the topic of scan statistics is still very active in networks, trajectories and text streams. Recently, the issue of inference for networks, trajectories and text streams has gained attention. For example, detecting the busiest nodes in a network and using mobility information to detect sources of diseases or extract core knowledge from texts are tasks that can use scan statistics to provide adequate answers (Assunção, Souza and Prates, 2020).
With the increasing collection of data, researchers need to scan ever larger maps to detect spatial clusters. Although some incipient methods for scanning large maps are starting to appear (Assunção, Souza and Prates, 2020), we still see the need to extend scan methods to efficiently find spatial clusters in big maps. It seems that the use of parallel computation, Bayesian methods, and graph theory can help researchers to tackle this challenge. On the other hand, to the best of our knowledge, there is little work on detecting spatial clusters for proportional data (de Lima et al., 2016) and no parametric scan method to detect spatial clusters in the presence of heavy tailed and/or asymmetric spatial data. We believe that working with stable spatial models can be a way to improve scan statistics methods.