Asymptotic independence of correlation coefficients with application to testing hypothesis of independence

Abstract: This paper first proves that the sample based Pearson's product-moment correlation coefficient and the quotient correlation coefficient are asymptotically independent, which is a very important property as it shows that these two correlation coefficients measure completely different dependencies between two random variables, and they can be very useful if they are simultaneously applied in data analysis. Motivated by this fact, the paper introduces a new way of combining these two sample based correlation coefficients into maximal strength measures of variable association. Second, the paper introduces a new marginal distribution transformation method which is based on a rank-preserving scale regeneration procedure and is distribution free. In testing the hypothesis of independence between two continuous random variables, the limiting distributions of the combined measures are shown to follow a max-linear function of two independent χ² random variables. The new measures, used as test statistics, are compared with several existing tests. Theoretical results and simulation examples show that the new tests are clearly superior. In real data analysis, the paper proposes to incorporate nonlinear data transformation into the rank-preserving scale regeneration procedure, and a conditional expectation test procedure whose test statistic is shown to have a non-standard limit distribution. Data analysis results suggest that this new testing procedure can detect inherent dependencies in the data and could lead to more meaningful decision making.


Introduction
The determination of independence between variables enables us to measure variable associations and draw accurate conclusions in a quantitative way. Testing independence is a durable statistical research topic, and its applications are very broad. The seminal nonparametric test of independence by [16, 3] has been widely used in many applications. The test (we shall call it the HBKR test throughout the paper) is based on Hoeffding's D-statistic, which only uses the ranks of the observed values. In the literature, [13] introduced the extremely useful Z-transformation function and test statistics, and [18] stated that the best present-day usage in dealing with correlation coefficients is based on [12, 13]. Recently, [14] showed that after a Box-Cox transformation to approximately normal scales, the correlation based test is more efficient when testing independence between two positive random variables. Besides Fisher's Z-transformation test, many other testing procedures have been developed, for instance, testing independence in a two-way table ([9]), testing independence between two covariance stationary time series ([17]), testing the independence assumption in linear models ([5]), and others such as [1, 10, 29], as well as excellent reference books by [6, 7, 21] and [27], among others.
When the alternatives are either linearly or nonlinearly dependent, although much work has been done, there is no consensus on which method is best. Empirical evidence has shown that the gamma test ([30]) performs much better than Fisher's Z-transformation test when testing independence against nonlinear dependence alternatives. There are certainly other test statistics which can also achieve high detecting powers. In problems of testing independence, it is not hard to find examples such that one test performs better than another, and vice versa. For example, Pearson's correlation coefficient based test statistics have the best performance when the alternative hypothesis is that two random variables are linearly dependent, especially when they are bivariate normal random variables.
Correlation is an extremely important and widely used analytical tool in statistical data analysis. The classical correlation coefficient is Pearson's product-moment coefficient, which indicates the strength and direction of a linear relationship between two random variables. Its history can be traced back to the 19th century, when Sir Francis Galton introduced correlation and regression, while Karl Pearson provided the mathematical framework we are familiar with today. Since then, various (linear or nonlinear) correlation measures (coefficients) have been introduced in statistical inference and applications. Examples include Spearman's rank correlation coefficient ([24]), Kendall's tau rank correlation coefficient ([22]), the distance correlation coefficient ([26]), and the quotient correlation coefficient ([30]), among many others. In the literature, attention has also been paid to other dependence measures such as positive dependence, co-monotone dependence (or co-movement measures), negative dependence and setwise dependence. [11, 20] are excellent reference books related to correlations and multivariate dependencies like the aforementioned ones. These dependence measures will not be considered in this paper.
An effective combination of correlation measures may be more informative and useful in measuring the association (strength) of a relationship, or in testing independence, between two random variables. In some applications, one correlation measure may lead to a better interpretation of data than another, and vice versa. As a result, it is natural to consider a combination of as many measures as possible to get a maximal strength measure of the relationship between two variables. This strategy may be too general to apply, since a limiting distribution of the combined measure may be difficult to derive, and hence it may not be applicable in testing the hypothesis of independence. One may argue that a randomized permutation test can overcome this drawback. However, a randomized permutation test does not guarantee a better performance. Our simulation examples of combining Pearson's correlation coefficient, Spearman's rank correlation coefficient, and Kendall's tau rank correlation coefficient demonstrate that combining more than two measures into a maximal strength measure does not result in a better performance. The performance really depends on which measures are included in the combination. Ideally, one would want a combined measure which uses as few measures as possible (say two) and achieves relatively better performance when compared with commonly used measures.
Considering that Pearson's correlation coefficient and the quotient correlation coefficient are asymptotically independent, i.e., they measure completely different dependencies between two random variables, we propose to first combine these two sample based correlation coefficients into a new association measure of the relationship between two continuous random variables. On the other hand, Pearson's correlation coefficient may be the best linear dependence measure to model dependence in the central region of the data, while the quotient correlation is a nonlinear dependence measure describing dependence in extreme values. These facts indicate that a combination of these two coefficients will reveal more of the inherent dependence between random variables. Due to the fact that the sample based Pearson's correlation coefficient is √n-convergent, while the quotient correlation coefficient is n-convergent under the null hypothesis of independence, the new association measure is defined as the maximum of the square root of the quotient correlation coefficient and the absolute value of Pearson's correlation coefficient.
A combination of Pearson's correlation coefficient and the quotient correlation coefficient introduces a new dependence measure which leads to a new test statistic for testing independence. We note that it is possible to find some unusual examples for which both calculated coefficients are zero. In order to overcome this drawback and to make the test statistic capable of testing every alternative hypothesis, at least as broadly as the HBKR test, we include Hoeffding's D measure in our combination. We shall see that the magnitude of the D statistic is almost negligible in many examples. The asymptotic distribution of the new test statistic follows a max-linear function of two χ² random variables with 1 and 4 degrees of freedom, respectively. This new test statistic performs better than existing tests. For example, it performs as well as (or better than) existing tests when the alternatives are linear dependence and guarantees the same detecting powers as the gamma test when the alternatives are nonlinear dependence. We shall see that the newly proposed test statistic is constructed to sufficiently utilize and maximize the detecting power of each component coefficient, each of which may be used as a test statistic alone, rather than simply to achieve a new test statistic which outperforms a linear correlation based test statistic.
The rest of the paper is organized as follows. Main theoretical results dealing with asymptotic independence are presented in Section 2. Section 3 introduces the new combined maximal strength measure. The limiting distribution of the new combined correlation coefficient under the null hypothesis of two random variables being independent is derived, and then a new max-linear χ² test statistic is introduced. In Section 4, we first derive a new rank-preserving scale regeneration marginal transformation procedure using simulation, which is distribution free. The transformation procedure gives advantages in applying extreme value distributions. The limiting distribution of the test statistic defined in Section 3 is again shown to be max-linear χ² distributed when the test statistic is based on the transformed data. We further propose a new testing procedure, i.e. a rank-preserving scale regeneration and conditional expectation (R.A.C.E) test, from a practical perspective. The asymptotic distribution of the R.A.C.E test statistic is derived. Type I errors of the new test are illustrated. In Section 5, power comparisons in simulation examples are presented for bivariate random variables. In Section 6, we first extend our combined dependence measure to a more general form and illustrate methods of data processing and variable scale transformations. We apply the newly introduced max-linear χ² test to testing independence among a group of random variables which are main index variables related to cigarette taxes, cigarette sales, revenues, smoking rates, etc. Section 7 offers some remarks regarding the application of our new results to statistical data analysis. Finally, some technical proofs are summarized in Section 8.

The quotient correlation defined by two extreme order statistics
Suppose X and Y are identically distributed positive random variables satisfying P(X ≥ Y) > 0 and P(X ≤ Y) > 0. Then the quotients between X and Y are Y/X and X/Y. Like the difference X − Y, the quotients can be used to measure the relative positions of two variables.

When X and Y are independent, we have max_{i≤n}{X_i/Y_i} −→ ∞ and max_{i≤n}{Y_i/X_i} −→ ∞ as n → ∞. When X and Y are neither perfectly dependent nor independent, it is easy to see that both max_{i≤n}{X_i/Y_i} and max_{i≤n}{Y_i/X_i} are asymptotically greater than 1 as n → ∞, or we say that max_{i≤n}{X_i/Y_i} and max_{i≤n}{Y_i/X_i} are very likely to fall in (1, ∞) for sufficiently large n.
These relations indicate that max_{i≤n}{Y_i/X_i} and max_{i≤n}{X_i/Y_i} can be used in defining a dependence measure between two random variables. When max_{i≤n}{Y_i/X_i} and max_{i≤n}{X_i/Y_i} are close to 1 for any given large sample size, we can say that there are co-movements between {X_i} and {Y_i}, i.e. there is a strong correlation between X and Y. When max_{i≤n}{Y_i/X_i} or max_{i≤n}{X_i/Y_i} is close to infinity, or relatively very large, we can say that X and Y are nearly independent.
Following the classical way of defining a correlation coefficient with a range of (0, 1), [30] introduced the sample based quotient correlation coefficient

q_n = { max_{i≤n}{Y_i/X_i} + max_{i≤n}{X_i/Y_i} − 2 } / { max_{i≤n}{Y_i/X_i} × max_{i≤n}{X_i/Y_i} − 1 }.    (2.1)

Properties and a geometric interpretation of (2.1) are illustrated in [30]. [30] also showed that nq_n −→L ζ (a gamma random variable), assuming that X and Y are independent unit Fréchet random variables, i.e. F_X(x) = F_Y(x) = e^{−1/x}, x > 0. For practical convenience, we adopt the convention that the marginal distributions of X and Y in (2.1) are transformed to be unit Fréchet distributed. Under this convention, the quotient correlation coefficient can be applied to continuous random variables, but not to discrete random variables.
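As a concrete illustration, the following sketch computes q_n numerically, assuming the ratio-maxima form of [30]'s definition, q_n = {max_i(Y_i/X_i) + max_i(X_i/Y_i) − 2}/{max_i(Y_i/X_i) · max_i(X_i/Y_i) − 1}; the function name and sample sizes are our own choices:

```python
import numpy as np

def quotient_correlation(x, y):
    """Sample quotient correlation q_n as in (2.1); x, y must be positive
    samples on (or transformed to) the unit Frechet scale."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m1 = np.max(y / x)                       # max_{i<=n} Y_i / X_i
    m2 = np.max(x / y)                       # max_{i<=n} X_i / Y_i
    return (m1 + m2 - 2.0) / (m1 * m2 - 1.0)

# unit Frechet variates via the inverse c.d.f. F^{-1}(u) = -1/log(u)
rng = np.random.default_rng(0)
x = -1.0 / np.log(rng.uniform(size=2000))
y = -1.0 / np.log(rng.uniform(size=2000))
q_indep = quotient_correlation(x, y)                        # small: n*q_n is O_p(1)
q_dep = quotient_correlation(x, x * rng.uniform(0.9, 1.1, size=2000))
```

For independent unit Fréchet samples both maxima grow like n, so q_n shrinks at rate 1/n, consistent with nq_n having a gamma limit; under strong dependence both maxima stay near 1 and q_n stays close to 1.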
One of the main uses of the quotient correlation coefficient is to test the hypothesis of independence between two random variables X and Y using the gamma test statistic nq_n. The testing hypotheses are H_0: X and Y are independent versus H_1: X and Y are dependent. (2.2) We shall see, for example, in a later section that the gamma test statistic nq_n does not achieve the best detecting power when the alternative hypothesis is bivariate normal. The reason is that q_n measures the dependence in the tail part (extreme values) of the data, so it may lose detecting power if the dependence mainly comes from the center part of the data. We note that a more efficient test statistic related to q_n for testing tail independence is the tail quotient correlation coefficient. Our focus in this paper is testing independence, not only testing tail independence.
As discussed in the introduction section, our goal is to find a test statistic which works for every alternative hypothesis of dependence and leads to a better performance than the existing test statistics. In the following subsections, we study the asymptotic independence of order statistics and the asymptotic independence of sample correlation coefficients, which are very important in constructing a new maximal strength dependence measure and a powerful test statistic.

Asymptotic independence of order statistics and extended definition of quotient correlation coefficients
Now assume that {X_i, Y_i}, i ≥ 1, are unit Fréchet distributed random variables, and consider the order statistics of the ratios {Y_k/X_k, k ≤ n} and {X_k/Y_k, k ≤ n}. We extend (2.1) to a general class of quotient correlation coefficients q_n(i, j) based on the (n − i + 1)th largest ratio and the jth smallest ratio. We note that the asymptotic independence of order statistics was first established in [19]. Their results are very general. In Section 8, we restate the asymptotic independence of order statistics in two lemmas (Lemmas 8.1 and 8.2), in which certain specific marginal distributions are assumed and the limit distributions of the order statistics are characterized. Then, for any fixed (i, j), the limiting distribution of nq_n(i, j) can be obtained from Lemma 8.2. We state the results in the following theorem, whose proof is straightforward using Lemmas 8.1 and 8.2.
Theorem 2.1. Suppose {X_i, Y_i}, i ≥ 1, are iid unit Fréchet distributed random variables. Then nq_n(i, j) −→L ζ, where ζ is a random variable with the gamma(i + j, 1) density x^{i+j−1}e^{−x}/((i + j − 1)!), x > 0.

This theorem is very useful in constructing a robust statistical measure based on observed values. For example, we can also consider the limiting distribution of any weighted sum of q_n(i, j) for 1 ≤ i, j ≤ k, where k ≥ 1 is a fixed integer. However, this is beyond the scope of the current paper, and we shall consider the choice of i and j and of the weights as a future research project. Throughout the paper, we will deal with the case of k = 1, and the results can be generalized to cases with an arbitrary k.

Asymptotic independence of Pearson's correlation coefficient and the quotient correlation coefficient
We first prove that, under some mild conditions, the sample mean vector of a random sample of a random vector ξ is asymptotically independent of the sample vector of componentwise maxima of a random vector η which is dependent on ξ.
as n → ∞, where Φ is the standard multivariate normal distribution function, and p and q are fixed integers.
Theorem 2.3. With the established notation, suppose ξ and η satisfy the conditions in Lemma 2.2. Then (2.6) holds as n → ∞, where r is the Pearson correlation coefficient of ξ_1 and ξ_2, and Φ is the standard normal distribution function. Furthermore, when ξ_1 and ξ_2 are independent, we have the corresponding limit in which F_g(x) is a gamma(2, 1) distribution function.

Integrating two correlation coefficients
The asymptotic independence of r_n (the same as r_n^{(ξ1,ξ2)} throughout the paper) and q_n reveals that these two sample based correlation coefficients measure completely different dependencies between two random variables. We can see that the most popular sample based Pearson's correlation coefficient, defined by (2.5), is mainly determined by the center part of the data. It is an antithetical and complementary measure to the quotient correlation, which is a very important and desirable property when one combines measures together. A natural choice of the combination is simply taking the maximum of the two correlation coefficients. We note that Pearson's correlation coefficient requires each marginal distribution to have a finite second moment. Hence we may need to perform scale transformations in calculating r_n. In the rest of the paper, we will first perform marginal transformations of {(X_i, Y_i), i = 1, . . ., n} to (i) the unit Fréchet scale when q_n is implemented; (ii) a pre-specified marginal with finite second moment, for example a standard normal distribution, when r_n is implemented. Such transformations are straightforward if the marginal distributions of X and Y are known. In Section 4, a distribution free scale transformation procedure is proposed to deal with the case when the marginal distributions are unknown.
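When the margins are known, the two transformations in (i) and (ii) are plain probability integral transforms. A minimal sketch (illustrated with an assumed Exp(1) margin; any known continuous margin works the same way):

```python
import numpy as np
from scipy.stats import norm, expon

def to_unit_frechet(x, cdf):
    """Transform to the unit Frechet scale, F(z) = exp(-1/z), z > 0,
    via the probability integral transform (used when computing q_n)."""
    return -1.0 / np.log(cdf(x))

def to_std_normal(x, cdf):
    """Transform to the N(0,1) scale, a margin with finite second moment
    (used when computing r_n)."""
    return norm.ppf(cdf(x))

rng = np.random.default_rng(1)
x = rng.exponential(size=5000)          # a known Exp(1) margin
xf = to_unit_frechet(x, expon.cdf)
xn = to_std_normal(x, expon.cdf)
```

The unit Fréchet inverse c.d.f. is F^{-1}(u) = −1/log u, so P(X* ≤ 1) = e^{−1} gives a quick sanity check on the transformed sample.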

How to combine two correlation coefficients: A view from testing hypotheses
Suppose two unit Fréchet random variables X and Y satisfy the properties in (3.1). Then q_n can be used to test the hypotheses in (3.3), and r_n can be used to test the hypotheses in (3.4). Notice that H_0 of (2.2) implies both H_0^λ and H_0^ρ, but not conversely. There is no direct connection between H_0^λ and H_0^ρ. It is not yet known whether H_0^λ can imply H_0. Nevertheless, if we simply use (3.3) or (3.4) to make inferences about (2.2), the conclusion may not be optimal. A better inference problem is to test the combined hypotheses H_0^c against H_1^c. This motivates us to consider a maximal variable association measure obtained by simply taking the maximum of two measures. We note that this idea is not new in the literature. One may argue that if we can take the maximum of two measures, why not take the maximum of as many measures as possible. Theoretically, when we deal with sample based measures, taking the maximum of many measures is not a good strategy, since the limiting distribution under H_0^c cannot be characterized even for the maximum of two coefficients if they are not appropriately chosen, not to mention the derivations under the alternatives. We aim to develop a combination under which a limit distribution is characterized under H_0, and a limit distribution and asymptotic powers are also derived under local alternatives.
We also note that (3.3) is only one application of q_n, and one should not conclude that q_n is derived only for a family of distributions satisfying (3.1).

The maximum of two transformed and asymptotically independent correlation coefficients
Under H_0 of (2.2), the quotient correlation coefficient q_n is shown to be n-convergent ([30]), while it is well known that r_n is √n-convergent. Therefore, q_n and |r_n| are not on the same scale, and q_n is likely to be smaller than |r_n| for a finite sample size. If we simply used the maximum of these two coefficients as a new test statistic, q_n might not be useful at all. To overcome this impediment, we define a new combined variable association measure

C_n = max{ q_n^{1/2}, |r_n| }.

When two random variables follow a bivariate normal distribution with correlation coefficient |ρ| < 1, C_n tends to |ρ| almost surely. In many cases of two random variables being linearly dependent, C_n tends to the linear correlation coefficient |ρ| almost surely, i.e. for sufficiently large n, C_n equals |r_n|. In the case of ρ = 0, the limit of C_n is the same as the limit of q_n^{1/2}. In the case of co-monotone or positive dependence, the limit of C_n can come either from q_n^{1/2} or from |r_n|, depending on which one dominates. Therefore, for a finite sample size n, C_n can be regarded as a nonlinear dependence measure. Here we regard 'linear' as a special case of 'nonlinear'. An extended and practically useful nonlinear dependence measure is defined by (6.1) in Section 6.1 and is used in our real data analysis.
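A sketch of computing C_n, assuming the ratio-maxima form of q_n from [30], and using any finite-second-moment scale of the same data for r_n (the paper uses a normal scale; here normal scores of uniforms serve that role):

```python
import numpy as np
from scipy.stats import norm

def combined_measure(x_frechet, y_frechet, x_finite, y_finite):
    """C_n = max{ q_n^{1/2}, |r_n| }: q_n from the unit Frechet scale,
    r_n (Pearson) from a finite-second-moment scale of the same data."""
    m1 = np.max(y_frechet / x_frechet)
    m2 = np.max(x_frechet / y_frechet)
    qn = (m1 + m2 - 2.0) / (m1 * m2 - 1.0)   # quotient correlation
    rn = np.corrcoef(x_finite, y_finite)[0, 1]
    return max(np.sqrt(qn), abs(rn))

rng = np.random.default_rng(3)
u, v, w = rng.uniform(size=(3, 1000))
fr = lambda t: -1.0 / np.log(t)              # uniform -> unit Frechet
c_indep = combined_measure(fr(u), fr(v), norm.ppf(u), norm.ppf(v))
t = (u + w) / 2.0                            # linearly dependent with u
c_dep = combined_measure(fr(u), fr(t), norm.ppf(u), norm.ppf(t))
```

For independent samples both components shrink toward 0, so C_n is small; for the linearly dependent pair the |r_n| component dominates and C_n stays away from 0.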
Notice that in the definition of the new association measure, we have chosen not to perform a power transformation of Pearson's correlation coefficient r_n, i.e. not to use |r_n|². The reason for this construction is twofold. First, a linear correlation is easy to implement and interpret, and it serves as the first association measure to compute in practice. It is a benchmark in checking variable associations. Second, there does not exist a commonly used nonlinear dependence measure which is as popular as Pearson's correlation coefficient.

The limiting distributions of C_n under H_0
Under H_0 defined in (2.2), we have the following proposition.

Proposition 3.1. Under H_0, the limiting distribution of C_n has the following form:

nC_n² −→L max(χ²_1, (1/2)χ²_4),

where χ²_1 and χ²_4 are two independent χ² random variables with 1 and 4 degrees of freedom, respectively. We denote this limit random variable as χ²_max.

In Section 4 and the Appendix section, we will show that the limiting distribution of nq_n is the same as that of a χ²_4 when q_n is derived based on nonparametric marginal transformations. This is an important property of the quotient correlation coefficient. In this paper, we propose a test statistic mainly based on Proposition 3.1, which controls the empirical Type I error probabilities below their pre-specified nominal levels and still gives high detecting powers in testing independence between two dependent random variables.
The limiting distribution of nC_n² leads to the following testing procedure (a new max-linear χ² test): if nC_n² > χ²_{max,α}, then H_0 of (2.2) is rejected; otherwise it is retained. Here χ²_{max,α} is the upper α percentile of the random variable max(χ²_1, (1/2)χ²_4). We note that the new max-linear χ² test achieves asymptotic power 100% when the two random variables under the alternative hypothesis are dependent and either r_n or q_n has a nonzero limit. When both r_n² and q_n tend to zero at a rate slower than 1/n, the new max-linear χ² test still achieves asymptotic power 100%.
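The critical value χ²_{max,α} can be computed by inverting the distribution function of max(χ²_1, (1/2)χ²_4); since the two components are independent, the c.d.f. is the product of a χ²_1 c.d.f. and a gamma(2,1) c.d.f. A sketch:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def F_max(x):
    """c.d.f. of chi2_max = max(chi^2_1, chi^2_4 / 2): the product of the
    chi^2_1 c.d.f. and the gamma(2,1) c.d.f. (= chi^2_4/2), independent."""
    return (2.0 * norm.cdf(np.sqrt(x)) - 1.0) * \
           (1.0 - np.exp(-x) - x * np.exp(-x))

def chi2_max_quantile(alpha):
    """Critical value chi^2_{max,alpha}: reject H_0 when n*C_n^2 exceeds it."""
    return brentq(lambda x: F_max(x) - (1.0 - alpha), 1e-9, 100.0)

u10 = chi2_max_quantile(0.10)
u05 = chi2_max_quantile(0.05)
```

At α = 0.05 the critical value is roughly 5.4, noticeably larger than the χ²_1 cutoff 3.84; this is the price paid for combining the two component statistics.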
It is worth noting that, for example, q_n is √n-convergent when X_i and Y_i follow a bivariate Gumbel copula distribution (see [28] for the derivation), and hence the max-linear χ² test will have asymptotic power one under the alternative hypothesis of Gumbel copula dependence.

Asymptotic powers under local alternatives
It can easily be shown that under any fixed alternative hypothesis H_1^λ with λ = λ_1 > 0, nq_n gives an asymptotic power one test, and hence nC_n² also gives an asymptotic power one test under any fixed alternative hypothesis H_1^c. We now study the asymptotic powers under a class of local alternatives H_1^{cL_n}. Note that the distribution function of the random variable max(χ²_1, (1/2)χ²_4) is given by

F_max(x) = (2Φ(√x) − 1)(1 − e^{−x} − xe^{−x}), x > 0,

where Φ is the standard normal distribution function. Denote σ in Theorem 2.3 under H_0 and H_1^{cL_n} by σ_0 and σ_n respectively, and let u_α = χ²_{max,α}, the α level critical value of F_max(x). Assuming nq_n under the alternative has the same limit as it has under H_0, the asymptotic power under H_1^{cL_n} can then be derived. It is easy to see that if we fix the cutoff value at u_α, then the asymptotic power of using the test statistic nr_n² alone is 1, while the asymptotic power of using the test statistic nq_n alone is 1 − (1 − e^{−u_α} − u_α e^{−u_α}). As a result, the combination of the two test statistics increases the asymptotic detecting power.

Incorporating Hoeffding's D statistic
There are some unusual situations in which both r_n² and q_n tend to zero at the rate 1/n, and hence nC_n² as a test statistic has no detecting power under such unusual alternative hypotheses of dependence. We propose to include Hoeffding's D statistic in our maximal dependence measure, yielding an extended measure C_n^δ, where δ is a small positive number, say δ = 10^{−16}, and D is defined as follows ([16]). Let (X_1, Y_1), . . ., (X_n, Y_n) be a random sample from a population with the d.f. F(x, y), n ≥ 5, and define D through the rank sums of [16], where Σ″ denotes summation over all α such that the corresponding rank inequalities hold, and I_(x≥y) is an indicator function. It is shown in [16] that −1/60 ≤ D_n ≤ 1/30. We have P(χ²_max < 1/30) = 7.8716 × 10^{−5}, which tells us that under H_0 the value of D is almost negligible. Under H_0 and with δ > 0, n|D_n|^{1+δ} converges to 0 in probability, and hence C_n^δ has the same limit distribution as C_n. We still call nC_n^δ the new max-linear χ² test statistic. Under the alternative hypothesis, D has a positive limit, and hence nC_n^δ gives an asymptotic power one test. The use of D as a nonparametric test (the HBKR test) of independence has been well known since [16, 3]. The HBKR test has been implemented in many statistical software packages such as SAS and R. We note that the D statistic uses only the rank information of the data, and the magnitudes of the data are ignored. As a result, the D statistic may not be as powerful as a statistic which uses both ranks and magnitudes. We aim to develop such a test statistic, and our simulation results show that nC_n^δ has a better performance; moreover, the computation of nC_n^δ is very simple and fast.
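For completeness, a sketch of the rank form of Hoeffding's D_n, assuming the standard formula with univariate ranks R_i, S_i and bivariate ranks Q_i and no ties (the displayed definition from [16] did not survive extraction above, so this is the textbook form rather than a verbatim copy):

```python
import numpy as np
from scipy.stats import rankdata

def hoeffding_d(x, y):
    """Hoeffding's D_n on the scale -1/60 <= D_n <= 1/30, computed from
    ranks R_i, S_i of x_i, y_i and bivariate ranks
    Q_i = 1 + #{j : x_j < x_i and y_j < y_i} (no ties assumed)."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    R, S = rankdata(x), rankdata(y)
    Q = np.array([1 + np.sum((x < x[i]) & (y < y[i])) for i in range(n)])
    d1 = np.sum((Q - 1.0) * (Q - 2.0))
    d2 = np.sum((R - 1.0) * (R - 2.0) * (S - 1.0) * (S - 2.0))
    d3 = np.sum((R - 2.0) * (S - 2.0) * (Q - 1.0))
    return ((n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3) / \
           (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))

x = np.arange(1.0, 51.0)
d_monotone = hoeffding_d(x, x)           # perfect dependence: 1/30
rng = np.random.default_rng(4)
d_indep = hoeffding_d(rng.normal(size=300), rng.normal(size=300))
```

A perfectly monotone sample attains the upper bound 1/30, while an independent sample gives a value near 0, matching the bounds quoted from [16].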

Marginal transformation via rank-preserving scale regeneration
In the previous discussions, we assumed that {(X_i, Y_i), i = 1, . . ., n} can be transformed to the unit Fréchet scale and to other margins with finite second moments. Such a transformation is straightforward if the marginal distributions of X and Y are known. In practical applications, however, we often face the case that the marginal distributions are unknown. In this section, we propose a rank-preserving scale regeneration marginal transformation procedure to overcome this impediment. We first generate two unit Fréchet samples of size n, order each of them, and then form the transformed data (4.1) by assigning to (X_i, Y_i) the simulated order statistics whose ranks match the ranks of X_i and Y_i in the original samples. Note that with this procedure, the ranks of the original data are preserved, and the transformed scales have the desired distribution.
The rank-preserving scale regeneration quotient correlation q_n^R is defined by computing (2.1) from the transformed data, as in (4.2). One immediate advantage of this definition is that the transformation does not depend on any correction term (for example, the 1/n correction in an empirical distribution function based transformation). The transformation procedure gives advantages in applying extreme value theory; the following Lemma 4.1 and Theorem 4.2 are evidence. We note that the simulation procedure here is similar to the procedure in [30], which uses one simulated sample for both populations X and Y; here we simulate one sample for each population. We have the following important lemma, whose proof is given in Section 8.
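A sketch of the regeneration step (4.1): simulate one unit Fréchet sample per population, sort it, and place the sorted values back according to the ranks of the original data (function names are ours):

```python
import numpy as np
from scipy.stats import rankdata

def rank_preserving_regeneration(x, y, rng):
    """Procedure (4.1): simulate one fresh unit Frechet sample per
    population, sort it, and assign the i-th smallest simulated value
    to the data point with rank i, so ranks are preserved exactly."""
    n = len(x)
    zx = np.sort(-1.0 / np.log(rng.uniform(size=n)))
    zy = np.sort(-1.0 / np.log(rng.uniform(size=n)))
    rx = rankdata(x).astype(int) - 1          # 0-based ranks of the data
    ry = rankdata(y).astype(int) - 1
    return zx[rx], zy[ry]                     # (X_i^*, Y_i^*)

rng = np.random.default_rng(5)
x, y = rng.normal(size=100), rng.normal(size=100)
xs, ys = rank_preserving_regeneration(x, y, rng)
```

The output has exactly the rank structure of the input but unit Fréchet margins, which is the property Lemma 4.1 and Theorem 4.2 exploit.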
Lemma 4.1. Suppose X and Y are independent absolutely continuous random variables (not necessarily unit Fréchet), {(X_i, Y_i), i = 1, . . ., n} is a random sample from (X, Y), and {(X_i^*, Y_i^*), i = 1, . . ., n} is the rank-preserving scale regenerated sample constructed from the ordered simulated samples. Then {(X_i^*, Y_i^*), i = 1, . . ., n} is a random sample from (X, Y).

Theorem 4.2. Suppose X and Y are independent continuous random variables (not necessarily unit Fréchet). Then a_n q_n^R −→L χ²_4 as a_n/n → 1, n → ∞, where χ²_4 is a χ² random variable with 4 degrees of freedom.
A rigorous proof of Theorem 4.2 is given in Section 8. We note that Theorem 3.2 in [30] was proved under a stronger condition than necessary; the proof of the above theorem does not need that condition.
We find that the form (4.2) is easy to implement, and the simulation results are very close to those of q_n based on a parametric transformation. Theorem 4.2 delivers the following testing procedure: if the test statistic exceeds its upper α critical value, H_0 of (2.2) is rejected; otherwise it is retained. Here r_n^R is computed using (2.5) based on the transformed data (X_i^*, Y_i^*), i = 1, 2, . . ., n, in (4.1) and their corresponding marginally transformed normal values.
One can immediately see that the Type I error probabilities of the above defined test are the same as those of the testing procedure introduced in Section 3.3. One can also see that the asymptotic powers of the above defined test under the alternative hypothesis are equal to those of the testing procedure introduced in Section 3.3.

The rank-preserving scale regeneration and conditional expectation test
When the procedure (4.1) is used, one may argue that a different variable transformation may result in a different conclusion, since the test is based on simulated unit Fréchet random variates. This issue can be resolved by performing a large number of max-linear χ² tests and making a conclusion based on the rates of rejecting the null hypothesis. Our testing procedure is:

• For a given random sample {(X_i, Y_i), i = 1, . . ., n} of (X, Y), we repeat (4.1) N times, i.e. for each j ∈ {1, . . ., N}, we generate two unit Fréchet samples of size n, order them, and apply the rank-preserving scale regeneration marginal transformation to obtain the transformed data (X_{i,j}, Y_{i,j}), i = 1, . . ., n, j = 1, . . ., N. (4.3) For each j, the correlation coefficient r_{nj}^R and the simulation based quotient q_{nj}^R are computed using (2.7) and (4.2) based on the transformed data (X_{i,j}, Y_{i,j}), i = 1, 2, . . ., n, and their corresponding marginally transformed normal values, and the p-value P_{nj}^R is defined accordingly.

• The decision rule is: if the average p-value T_N = N^{−1} Σ_{j=1}^N P_{nj}^R < p̃_α, we reject the null hypothesis of independence. The cut-off value p̃_α is derived from the approximation theorem presented below.

Theorem 4.3. With the established notation, suppose {(X_i, Y_i), i = 1, . . ., n} is a random sample of (X, Y) with marginals being unit Fréchet, and q_n and r_n are defined in (2.1) and (2.5) respectively. Then T_N −→ E(P_{n1}^R | G) almost surely as N → ∞, where G is the σ-algebra generated by the random vectors (X_j, Y_j), j ≥ 1.
We want P(T_N < p̃_α) ≤ α, or approximately P{E(P_{n1}^R | G) < p̃_α} ≤ α. The distribution of E(P_{n1}^R | G) is generally unknown but is distribution free under the null hypothesis. We obtain the cut-off value p̃_α using a Monte Carlo method in which we obtain 50,000 p-values. The following table presents cut-off values for a set of α values which should satisfy the needs of most applications.

[Table: Monte Carlo cut-off values p̃_α for selected levels α.]

In practice, we compare the average p-value T_N with the cut-off value p̃_α and then make inferences. Equivalently, we can compare p̃ = T_N/(p̃_α/α) with α directly. In this case, we call p̃ a rank-preserving scale regeneration and conditional expectation (R.A.C.E) test p-value. We argue that this method provides more information in testing the hypothesis of independence and leads to a better and more accurate conclusion.
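A rough end-to-end sketch of the R.A.C.E idea, with two caveats: the p-values below use the parametric-scale null law max(χ²_1, (1/2)χ²_4) as an approximation (by Theorem 4.2 the regenerated quotient part may call for a rescaling), and the paper's Monte Carlo cut-off table for p̃_α is not reproduced, so only the averaged p-value T_N is computed:

```python
import numpy as np
from scipy.stats import rankdata, norm

def race_average_pvalue(x, y, N, rng):
    """Average, over N rank-preserving regenerations as in (4.3), the
    asymptotic p-values of the max-linear chi^2 statistic
    n * max(q_n^R, (r_n^R)^2). A sketch of the R.A.C.E procedure."""
    n = len(x)
    rx = rankdata(x).astype(int) - 1
    ry = rankdata(y).astype(int) - 1
    pvals = np.empty(N)
    for j in range(N):
        zx = np.sort(-1.0 / np.log(rng.uniform(size=n)))
        zy = np.sort(-1.0 / np.log(rng.uniform(size=n)))
        xf, yf = zx[rx], zy[ry]                    # Frechet margins
        xn = norm.ppf(np.exp(-1.0 / xf))           # matching normal margins
        yn = norm.ppf(np.exp(-1.0 / yf))
        m1, m2 = np.max(yf / xf), np.max(xf / yf)
        qn = (m1 + m2 - 2.0) / (m1 * m2 - 1.0)
        rn = np.corrcoef(xn, yn)[0, 1]
        s = n * max(qn, rn * rn)
        F = (2.0 * norm.cdf(np.sqrt(s)) - 1.0) * \
            (1.0 - np.exp(-s) - s * np.exp(-s))    # null c.d.f. F_max
        pvals[j] = 1.0 - F
    return pvals.mean()

rng = np.random.default_rng(6)
x, y = rng.normal(size=200), rng.normal(size=200)
t_indep = race_average_pvalue(x, y, 20, rng)
t_dep = race_average_pvalue(x, x, 20, rng)
```

Averaging over regenerations removes the dependence of the decision on any single simulated transformation; for strongly dependent data T_N collapses toward 0 while for independent data it stays moderate.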

An integrated testing procedure
We now summarize our proposed test procedures into an integrated one:

• Specify the significance level α, and set δ = 10^{−16}.

• When the marginal distributions are known,
- Transform the data to the unit Fréchet scale and the normal scale respectively, and compute q_n using (2.1) and r_n using (2.5);
- If the test statistic exceeds χ²_{max,α}, H_0 of (2.2) is rejected; otherwise it is retained.

• When the marginal distributions are unknown,
- Generate two unit Fréchet samples of size n and order them;
- Compute q_n^R using (4.2) and r_n^R based on the marginally transformed normal data;
- Repeat the above two steps N times, denote the resulting q_n^R and r_n^R by q_{nj}^R and r_{nj}^R, j = 1, . . ., N, and compute the p-values P_{nj}^R; if the average p-value T_N < p̃_α, reject the null hypothesis of independence. The cut-off value p̃_α is derived from Theorem 4.3.
Empirical Type I error probabilities based on the above procedure and finite sample sizes are illustrated in the next section. Empirical detecting powers in simulation examples will be given in Section 5.

Empirical Type I error probabilities
The empirical Type I errors at levels .1 and .05 are demonstrated in Figure 1. Simulation sample sizes are n = 25, 26, . . ., 49, 50, 55, 60, . . ., 95, 100, 110, 120, . . ., 230, 240. For each fixed sample size, we run the new χ² test 1000 times and then compute the empirical Type I error proportions. Simulations are implemented in Matlab 7.5 installed on Red Hat Enterprise Linux 5. One can see that the proposed χ² test controls the Type I error probabilities within their nominal levels. The overall average empirical test level on the left panel is .0854, and the overall average empirical test level on the right panel is .0395. The Type I errors are overall slightly conservative.

Simulation examples
In this section, we use the following simulation procedure: (1) Simulate a bivariate random sample from a pre-specified joint distribution using Matlab 7.5 installed in Red Hat Enterprise Linux 5. (2) For each simulated univariate sequence, use (4.1) to transform the simulated data into unit Fréchet scales. (3) Use distribution functions to convert the transformed unit Fréchet values into normal scales. (4) Use these transformed values to perform the new integrated tests from Section 4.3. For comparison purposes, we also conduct Fisher's Z-transformation test and the gamma test introduced in [30].
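The rank-preserving regeneration of step (2) can be sketched as follows, under our reading of Section 4: generate a unit Fréchet sample, order it, and assign the ordered values according to the ranks of the data (function names are ours):

```python
import math, random

def rank_preserving_frechet(x, rng=random):
    """Replace the observations by an ordered unit Fréchet sample, assigned so
    that the regenerated values carry exactly the same ranks as the data (a
    sketch of the rank-preserving scale regeneration of Section 4)."""
    n = len(x)
    # simulate a unit Fréchet sample: if U ~ Uniform(0,1), -1/log(U) has cdf exp(-1/z)
    z = sorted(-1.0 / math.log(rng.random()) for _ in range(n))
    order = sorted(range(n), key=lambda i: x[i])   # index of smallest, next, ...
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = z[rank]                           # smallest x gets smallest z, etc.
    return out

random.seed(7)
x = [3.1, -0.4, 9.9, 0.2]
xr = rank_preserving_frechet(x)
# ranks are preserved: the ordering of xr matches the ordering of x
assert sorted(range(4), key=lambda i: x[i]) == sorted(range(4), key=lambda i: xr[i])
```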
In Figures 2 and 3, it is clear that when data are drawn from a bivariate normal distribution, Fisher's Z-transformation test should arguably perform better than any other test not constructed from correlation coefficients. We see that the new max-linear χ² test in this example performs as well as Fisher's Z-transformation test, and the R.A.C.E. test and the HBKR test also perform as well as Fisher's Z-transformation test; their power curves are interlaced with each other. In this example, the gamma test is not as powerful as the other four tests. In other words, under the alternative hypothesis of dependence the combined test statistic (the new max-linear χ² test statistic) boosts the power of the gamma test to as high as that of the most powerful test, Fisher's Z-transformation test. From Figure 3, we see that the most powerful test in that example is the gamma test, and the performance of the new max-linear χ² test is as good as that of the gamma test. Notice that the dependence between X and Y in that example lies in extreme values, so the gamma test should have the best performance; see [30]. In other words, under the alternative hypothesis of dependence the combined test statistic boosts the power of Fisher's Z-transformation test statistic to as high as that of the most powerful test, the gamma test. Also notice that the R.A.C.E. test was not as powerful as the gamma test and the new max-linear χ² test in that example. This phenomenon suggests that when we transform data using (4.1), the original ranks of the data, and how the data depend on each other, affect the detecting powers. Nevertheless, it is still an acceptable approach in practice. We also make an important note here. One may be concerned that rank transformations kill extreme values and dependencies in data. This example shows that our proposed rank-preserving simulation transformation retains dependencies in data, which is very important in practice.
In the next example, we demonstrate that the new max-linear χ² test and the R.A.C.E. test achieve higher detecting power than the gamma test does. Because the moments do not exist in this example, many ordinary statistics may not be applicable without certain variable transformations, such as a Box-Cox transformation or our newly proposed rank-preserving scale regeneration transformation. From Figure 4, under the alternative hypothesis of dependence the combined test statistic (the max-linear χ² test statistic) boosts the power of the gamma test to higher levels. Note that Fisher's Z-transformation test is not illustrated here since it is not directly applicable to the original data.
These three examples clearly suggest that the new max-linear χ² test is superior. We have applied the new max-linear χ² test extensively to many other simulation examples and observed that it outperforms existing tests. Based on these results, it may be safe to say that the new max-linear χ² test performs better in testing independence between two random variables. Our newly proposed R.A.C.E. test procedure also achieves high detecting powers in our simulated examples. In practice, the R.A.C.E. test may be a more attractive approach, as it also indicates how the null hypothesis of independence is rejected or retained.
In the next section, we analyze real data from tobacco markets.

Real data analysis
We first review some basic data processing and variable transformation procedures, which are applied in our second new test procedure.

Measuring the maximum association: A practical perspective
In real data analysis, data are often transformed before computing various statistical measures. For example, in science and engineering, a log-log graph or a log-log correlation is often used. Many transformation procedures exist in the literature, serving different purposes. Popular transformations include the Box-Cox transformation, the logarithm transformation, the absolute value transformation, the sign transformation, etc. It is common to consider more than one transformation in a particular study. For this purpose, let F be a collection of candidate transformations; we extend (3.6) to (6.1), where q_n(g_x, g_y) is computed from rank[g_x(X_i)], and rank[g_y(Y_i)] is defined similarly. |r_n(g_x, g_y)| is calculated using (2.5) after transforming (X*_i, Y*_i) into normal scales by distribution transformation, and so is D(g_x, g_y).
We regard (6.1) as the F maximum association measure between two random variables. Of course, the choice of F is a practical issue, and one can choose transformation functions for any special purpose. In this paper, we consider the following family of six transformations, which performs very well in most of our real data analyses: (i) X … where a_k is the kth percentile of |X|.
Remark 1. When X is a positive random variable, (iii) is the same as (i), and (iv) is the same as (ii). The purpose of (ii) is to study negative dependence between two random variables X and Y through (X(i), Y(ii)) or (X(ii), Y(i)). The purpose of (iii) is to study the dependence between values near zero in X (respectively Y) and values in any part of Y (respectively X). The purpose of (v) is to study the dependence between values near the pth percentile of |X| (respectively of |Y|) and values in any other part of Y (respectively X). The rest of the transformations can be interpreted accordingly. We have 36 combinations of transformations in total, i.e. (X(s), Y(t)), s, t = i, ii, iii, iv, v, vi. One can find an optimal choice of k. In our real data analysis, we will choose k from two sets of numbers, {.05, .20, .50, .80, .95} and {.10, .25, .30, .45, .60, .75}.
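A sketch of the enumeration of the 36 combinations, together with one plausible percentile-based transform inferred from the description alongside Table 2 ("divided by its kth sample percentile, then logarithm, then absolute value"). The exact forms of (i)-(vi) are not reproduced in this chunk, so treat the transform below as illustrative only:

```python
import math
from itertools import product

# The six transformation labels of Remark 1; (X(s), Y(t)) over s, t gives 6 x 6 = 36 pairs.
labels = ["i", "ii", "iii", "iv", "v", "vi"]
pairs = list(product(labels, labels))
assert len(pairs) == 36

def percentile(values, k):
    """kth sample percentile (a simple order-statistic convention, our choice)."""
    v = sorted(values)
    idx = max(0, min(len(v) - 1, math.ceil(k * len(v)) - 1))
    return v[idx]

def log_percentile_transform(x, k):
    """Divide by the kth percentile of |X|, take logs, then absolute values --
    our reading of the transformation described with Table 2."""
    a_k = percentile([abs(v) for v in x], k)
    return [abs(math.log(abs(v) / a_k)) for v in x]
```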
In the next subsection, we use the variable transformations (i), (ii), (v), and (vi) with k chosen from each of the above two sets of numbers. The number of combinations forming new bivariate transformed samples in {(X(s), Y(t)), s, t = i, ii, v, vi} for each pair is 324.
Under the above variable transformations (or other F), our refined R.A.C.E. testing procedure is as follows. For each combination (X(s), Y(t)), where s, t ∈ {i, ii, iii, iv, v, vi}, we repeat (4.1) N times and hence obtain N p-values. After taking the average T_N of the N p-values, we obtain the rank-preserving scale regeneration and conditional expectation (R.A.C.E.) p-value p̃ = T_N/(p̃_α/α), where the cut-off value p̃_α is given in Section 4.2. We then keep the combination with the smallest p̃-value. If the smallest p̃-value is less than α, we reject the null hypothesis of independence.
From Figure 5, we can see that only cigarette tax per pack and retail price per pack with all taxes show a clear strong linear correlation (also positive dependence, and co-monotone as well). All other subplots suggest either a nonlinear correlation or no correlation. Calculated kurtosis and skewness for each variable show that no variable is normally distributed; variables Sales, Revenue, and Costs show fat-tail behaviors, probably variable Adult too; variable Youth is skewed to the left, and all other variables are skewed to the right. Based on these observations, it may not be appropriate to calculate correlation coefficients or to perform Fisher's Z-transformation test, since they are valid under the assumption of either a finite first moment or a finite second moment for each random variable; such calculated quantities should be used with caution. In this analysis, one of our primary goals is to demonstrate how to use the newly combined measure and the new χ² test. We still calculate correlation coefficients of the variables and perform Fisher's Z-transformation test on the original data for comparison purposes.
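The kurtosis and skewness checks above can be computed as follows (population-moment versions; the paper does not state which estimator was used, so this is one common convention):

```python
import math

def skewness(x):
    """Moment-based sample skewness m3 / m2^(3/2); zero for symmetric data."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def kurtosis(x):
    """Moment-based sample kurtosis m4 / m2^2 (equals 3 for a normal population)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2

sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
assert abs(skewness(sym)) < 1e-12   # symmetric sample: zero skewness
```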
In Table 1 (upper triangle part), we apply the Fisher Z-transformation test to all pairs. Two pairs may suggest independence or no correlation. This conclusion may be misleading. For example, it may not be appropriate to suggest that cigarette pack sales and the youth smoking rate are uncorrelated; one would argue that the higher the youth smoking rate, the more sales. This is a clear indication that we need a more powerful test of independence between two random variables.

Testing hypothesis of independence via R.A.C.E test
For each combination of (X(s), Y(t)), we repeat (4.1) 100 times, and hence we get 100 p-values. For each pair of random variables, we keep the combination with the smallest p-value, i.e., the rank-preserving scale regeneration and conditional expectation p-value. Test results are reported in Tables 2 and 3.
In Table 2, [(s), (t)] stands for the kept combination, and the two numbers in the bracket below the kept combination stand for the kth percentiles used in the transformations. For example, in the cell of Tax and Sales, observations in Tax are divided by its 75th sample percentile, then log-transformed, and finally the absolute value is taken, while observations in Sales are divided by its 75th sample percentile, then log-transformed, the absolute value is taken, and finally a negative sign transformation is applied. In the cell of Price and Costs, observations in Price are kept at their original values. In the cell of Revenue and Youth, a negative sign transformation is applied to observations in Revenue. The numbers in the remaining cells can be interpreted similarly. We note that Table 2 also tells the directions of the dependencies. For example, Tax and Sales are negatively correlated: when Tax is at the 75th (sample) percentile or higher, Sales is more likely close to the 25th (sample) percentile. We report T_N-values in Table 3 (upper triangle part) and their converted pseudo p-values (p̃ = T_N/3.3260) (lower triangle part); see the table in Section 4.2 regarding the factor 3.3260. From the table, and taking into account all possible transformations, the T_N-value between Sales and Youth is 0.0510, while the T_N-value between Revenue and Adult is 0.1139. These are the two largest T_N-values in the analysis, and they are less than 0.1663.
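The conversion between T_N-values and pseudo p-values is a fixed rescaling; a quick arithmetic check of the quoted factor and of the two largest T_N-values:

```python
alpha, p_alpha = 0.05, 0.1663
factor = p_alpha / alpha
assert abs(factor - 3.3260) < 1e-9     # the factor quoted in the text
for t_n in (0.0510, 0.1139):           # the two largest reported T_N-values
    p_pseudo = t_n / factor
    # comparing T_N with p_alpha and comparing the pseudo p-value with alpha
    # give the same decision: both pairs reject independence
    assert t_n < p_alpha and p_pseudo < alpha
```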
Table 3 clearly gives more information on how a null hypothesis is rejected. From this table, it may be safe to conclude that all paired variables are dependent, using the cut-off p̃_α = 0.1663 when α = .05. Conclusions based on Table 1 may be problematic since some variables have heavy tails. In particular, using the new tests, we found that Sales and Costs are dependent, and Revenue and Adult are also dependent.
Our analysis results can certainly be useful in guiding socio-economic researchers in further studying correlations among those variables and in influencing their decision making. We argue that this kind of data analysis is very important in socio-economic studies. With powerful statistical tools, analysis results can inform a sequence of political decisions, and economic decisions as well. However, this task is beyond the scope of the current paper.

Conclusions
In general, it is not easy to derive the limiting distribution of the maximum of several measures. Combining measures with the same or similar functional properties is also not desirable, as they do not yield much gain in measuring variable association strengths. In our combined measures, the two candidate measures behave completely differently. In the study of the combined measures, we are able not only to derive the limiting distributions, but also to provide normalizing constants, which are especially useful for finite sample sizes. One can see from our simulation examples that the new max-linear χ² test at least achieves local optimality in testing the hypothesis of independence. The newly combined dependence measures (6.1) can certainly be applied in many statistical applications: clustering analysis, classification, and causality studies, just to name a few. It may be safe to say that the newly combined coefficients can be used in any study that calculates correlation coefficients. In cases where a random variable does not have a finite second moment and a correlation coefficient is not meaningful, the new measure is still meaningful. The new χ² test can easily be implemented in any statistical software package. We note that the new χ² test performs much better than some existing tests when sample sizes are small, which may be very useful in sample size calculations.
The methodology introduced in Section 4 can be used in many applications. Lemma 4.1 and Theorem 4.3 deliver very general results which may lead to new developments in statistical research. As illustrated in our simulation examples and real data analysis, our refined testing procedure in Subsection 6.1 may give readers a better understanding and a clearer picture of how the variables depend on each other and how the null hypothesis of independence is rejected.

Appendix
The limit results of the following two lemmas, regarding the first k smallest order statistics and the first k largest order statistics, will be used in the proof of Theorem 2.1. From (8.3), to show the asymptotic independence of W_n and W'_n, it suffices to prove the asymptotic independence of V_n and V'_n. To this end, set V''_n as in (8.4). It follows from (8.4) that V_n and V''_n are independent, and therefore the asymptotic independence of V_n and V'_n follows. This completes the proof of the lemma.

Fig 1 .
Fig 1. Empirical Type I error probabilities. The left panel is for a test with significance level α = .1, and the right panel is for a test with significance level α = .05. Simulations are implemented in Matlab 7.5 installed in Red Hat Enterprise Linux 5.

Example 5.1.
Fig 2. Comparisons among the gamma test, Fisher's Z-transformation test (see [13, 30] for the definition of the test), the HBKR test, the new max-linear χ² test, and the new R.A.C.E. test. The left panel is for tests with sample size 25. The right panel is for tests with sample size 50.

Example 5.3.
In this example, we simulate bivariate random samples from (|X|, |Y|), where Y = X·E + W, X is a standard Cauchy random variable, and E and W are independent standard normal random variables. Sample sizes are 25, 26, . . ., 240. Empirical powers for all tests are plotted in Figure 4.
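A minimal sketch of this simulation design, sampling the standard Cauchy by inverting its distribution function (function names are ours):

```python
import math, random

random.seed(42)

def sample_example(n, rng=random):
    """Simulate (|X|, |Y|) with Y = X*E + W, X standard Cauchy,
    E and W independent standard normals, as in this example."""
    xs, ys = [], []
    for _ in range(n):
        x = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy via inverse cdf
        y = x * rng.gauss(0, 1) + rng.gauss(0, 1)
        xs.append(abs(x))
        ys.append(abs(y))
    return xs, ys

xs, ys = sample_example(25)
assert len(xs) == 25 and all(v >= 0 for v in xs + ys)
```

Note that X has no finite moments, which is why moment-based statistics are not directly applicable here without a transformation.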

Fig 4 .
Fig 4. Empirical powers of three tests. The significance levels in the left panel and in the right panel are α = .1 and α = .05, respectively.

Fig 5 .
Fig 5. Scatter plots for seven tobacco market variables.
"Cigarettes kill about 4.8 million people every year around the world. This is expected to rise to 10 million per year by 2020." [25]. Such a situation makes tobacco use a significant public health problem, and a social economic problem as well. Here, we are interested in exploring correlation measures among seven main tobacco consumption related variables: cigarette tax per pack in US dollars, FY 2007 cigarette pack sales in 100 millions, FY 2007 cigarette tax revenue in 100 million US dollars, retail price per pack with all taxes in US dollars, CDC state smoking costs per pack sold in US dollars, youth smoking rate, and adult smoking rate. These variables are denoted as Tax, Sales, Revenue, Price, Costs, Youth, and Adult respectively in figures and tables. Our data sources are [4], the Behavior Risk Factor Surveillance Survey (BRFSS), and State Cigarette Taxes: http://www.tobaccofreekids.org/research/factsheets/pdf/0099.pdf. The data sample size is 51 (50 US states plus Washington D.C.), i.e. each state has one record in Year 2007. The paired scatter plots are demonstrated in Figure 5.

Table 1
This table reports p-values obtained when Fisher Z-transformation tests (upper triangle part) were applied to the data shown in Figure 5.

Table 2
This table reports the percentiles and combinations used when the new χ² tests were applied to the data shown in Figure 5.

Table 3
This table reports T_N-values (upper triangle part) and their converted pseudo p-values (p̃ = T_N/3.3260) (lower triangle part) when 100 max-linear χ² tests were applied to each paired dataset.