Identifiability of the proportion of null hypotheses in skew-mixture models for the p-value distribution

In many multiple testing procedures, accurate modeling of the p-value distribution is a key issue. Mixture distributions have been shown to provide adequate models for p-value densities under the null and the alternative hypotheses. An important parameter of the mixture model that needs to be estimated is the proportion of true null hypotheses, which under the mixture formulation becomes the probability mass attached to the parameter value associated with the null hypothesis. It is well known that in a general mixture model, especially when a scale parameter is present, the mixing distribution need not be identifiable. Nevertheless, in our setting of mixture models for p-values, we show that the weight attached to the null hypothesis is identifiable under two very different types of conditions. We consider several examples, including univariate and multivariate mixture models for transformed p-values. Finally, we formulate an abstract theorem for general mixtures and present further examples.

AMS 2000 subject classifications: Primary 62E10; secondary 62G99.


Introduction
Many multiple testing procedures depend critically on the distribution of the p-values associated with the multiple hypotheses. Following Storey (2002), the p-value density can be represented as a mixture of a null component and an alternative component. For the probit-transformed p-values, we model these components by the skew-normal family: the skew-normal density with location µ, scale ω and shape parameter λ is defined as

q(x; µ, ω, λ) = (2/ω) φ((x − µ)/ω) Φ(−λ(x − µ)/ω),   (1.1)

where φ denotes the standard normal density and Φ denotes the corresponding cumulative distribution function (c.d.f.). In (1.1), we have used a slight reparameterization by switching the skewness parameter λ to −λ. The skew-normal family of distributions, and other related skewed distributions such as skew-elliptical and skew-symmetric distributions, have recently become popular tools for modeling and have found a wide variety of applications; see Genton (2004). The skew-normal distribution was introduced by Azzalini (1985) and generalized to the multivariate situation by Azzalini and Dalla Valle (1996) and others. The potential of the skew-normal distribution as a flexible modeling tool increases many-fold when one considers mixtures of such distributions. Ghosal and Roy (2011) considered mixtures of skew-normal distributions to model the probit-transformed p-value distribution arising in general multiple testing problems. Clearly, under the probit transformation, the null p-value density of the standard uniform transforms into the standard normal density, which corresponds to the parameter value (0, 1, 0) in the skew-normal family.
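As a quick numerical sanity check on the skew-normal family just introduced, the following sketch (in Python; the −λ sign convention follows the reparameterization described for (1.1), and all parameter values are illustrative) verifies that the density integrates to one and that the null value (0, 1, 0) recovers φ:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def skew_normal(x, mu, omega, lam):
    """Skew-normal density q(x; mu, omega, lam); the skewness parameter
    enters as -lam, per the reparameterization described for (1.1)."""
    z = (x - mu) / omega
    return (2.0 / omega) * phi(z) * Phi(-lam * z)

def integrate(f, a, b, n=20000):
    """Simple trapezoidal rule, adequate for this sanity check."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

# q integrates to one, and (mu, omega, lam) = (0, 1, 0) recovers phi
total = integrate(lambda x: skew_normal(x, -0.5, 1.2, 2.0), -15.0, 15.0)
null_value = skew_normal(0.3, 0.0, 1.0, 0.0)
```

At (0, 1, 0) the factor Φ(0) = 1/2 cancels the leading 2, so the skew-normal density collapses to the standard normal density, matching the role of (0, 1, 0) as the null value.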
In general, mixtures may not be identifiable. For instance, the location-scale mixture of normal densities, one of the most commonly used mixtures, is not identifiable; see Lindsay (1995), page 54. This result renders the family of unrestricted skew-normal mixtures non-identifiable as well. Even though the entire mixing distribution may not be identifiable, some key features of it may still be identifiable. In multiple testing problems, a key estimand is the proportion of true null hypotheses among all hypotheses being tested. In the context of skew-normal mixture models for probit-transformed p-values, the true null proportion corresponds to the weight given to the point (µ, ω, λ) = (0, 1, 0). We present conditions on the mixing distribution for univariate skew-normal mixtures under which the point mass at (0, 1, 0) can be identified. Further, we extend the result to multivariate skew-normal mixtures, which are appropriate tools for dependent p-values. Abstract generalizations with further examples are also presented.
Our results on identifiability of the proportion of true null hypotheses are given under two very different scenarios for the range of parameters under the alternative hypotheses. For the first type, the densities under the alternative have tails thicker than that of the null density φ(x), and the null value (0, 1, 0) is a boundary point for the possible parameter values under the alternative. In this situation, we use a technique based on the characteristic function (c.f.) to identify the proportion of true null hypotheses. For the second type of mixtures, the null value (0, 1, 0) is again a boundary point for the possible parameter values under the alternative, but the densities under the alternative have tails thinner than that of the null density φ(x). In this case, the ratios of the densities under the alternative and the null are studied to obtain the corresponding identifiability results.

Univariate skew-normal mixtures
We begin with an identifiability result for univariate skew-normal mixtures. Consider a skew-normal mixture model with weight π0 attached to the distinguished value (0, 1, 0) corresponding to the standard normal distribution. In a multiple testing problem, the mixture density may represent the overall density of probit-transformed p-values, where null hypotheses hold true randomly with probability π0. The following describes a setting where the true null proportion π0 is uniquely identified from such a mixture.
Theorem 2.1. Consider a skew-normal mixture of the type

f(x) = π0 φ(x) + (1 − π0) ∫ q(x; µ, ω, λ) dG(µ, ω, λ),   (2.1)

where G is concentrated on the region

Θ1 = {(µ, ω, λ): ω² ≥ 1 + λ², (ω, λ) ≠ (1, 0)}.

Then f uniquely determines π0.
Proof. Let f̂(t) = ∫ e^{itx} f(x) dx denote the c.f. of f. If X has density q(x; 0, 1, λ), then X can be represented as

X = δ|Y0| + √(1 − δ²) Y1,  δ = −λ/√(1 + λ²),   (2.2)

where Y0, Y1 are independent standard normal variables; see Dalla Valle (2004), Proposition 1.2.3.
Note that E(e^{is|Y0|}) = 2e^{−s²/2}Φ(is) for all s ∈ R, where Φ is the unique entire function which agrees with the standard normal c.d.f. on R. This may be shown by evaluating 2∫₀^∞ e^{isy} φ(y) dy by direct contour integration. Alternatively, by observing that the half-normal distribution has a finite moment generating function (m.g.f.) everywhere, E(e^{z|Y0|}) is an entire function which agrees with the function 2∫₀^∞ e^{zy} φ(y) dy = 2e^{z²/2}Φ(z) for z ∈ R, and hence the two must agree everywhere on z ∈ C.
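For real arguments, the m.g.f. identity E(e^{z|Y0|}) = 2e^{z²/2}Φ(z) underlying the analytic continuation can be checked numerically; the following is an illustrative sketch, not part of the proof:

```python
import math

def phi(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def half_normal_mgf(s, upper=20.0, n=200000):
    """E exp(s|Y0|) for standard normal Y0, i.e. 2 * integral_0^inf e^{sy} phi(y) dy,
    computed by the trapezoidal rule on a truncated range."""
    h = upper / n
    f = lambda y: 2.0 * math.exp(s * y) * phi(y)
    return h * (0.5 * (f(0.0) + f(upper)) + sum(f(i * h) for i in range(1, n)))

# Compare with the closed form 2 * exp(s^2/2) * Phi(s) at a few real arguments
errors = [abs(half_normal_mgf(s) - 2.0 * math.exp(0.5 * s * s) * Phi(s))
          for s in (0.0, 0.7, 1.5)]
```

Since both sides are entire in z and agree on the real line, they agree on all of C, which is what the proof uses at z = is.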
Therefore, we obtain

f̂(t) = π0 e^{−t²/2} + (1 − π0) ∫ q̂(t; µ, ω, λ) dG(µ, ω, λ),   (2.3)

where q̂(·; µ, ω, λ) denotes the c.f. of q(·; µ, ω, λ). We shall show that, after multiplication by e^{t²/2}, the second term in (2.3) goes to zero as |t| → ∞, identifying π0 uniquely from f. It suffices to show that for every (µ, ω, λ) with ω² ≥ 1 + λ² and (ω, λ) ≠ (1, 0), we have

e^{t²/2} q̂(t; µ, ω, λ) → 0 as |t| → ∞,   (2.4)

and that the expression in (2.4) is uniformly bounded by a constant. If it can be shown that e^{t²/2} q̂(t; µ, ω, λ) is the c.f. of a continuous random variable, then (2.4) holds by the Riemann–Lebesgue lemma, while the second assertion holds because a c.f. is bounded in absolute value by 1.
To complete the proof, we use the representation (2.2) for general µ and ω: if X has density q(x; µ, ω, λ), then X = µ + ωδ|Y0| + ω√(1 − δ²) Y1. On Θ1 we have ω²(1 − δ²) = ω²/(1 + λ²) ≥ 1, so X is the convolution of a standard normal variable with µ + ωδ|Y0| + (ω²(1 − δ²) − 1)^{1/2} Y2, where Y2 is a standard normal variable independent of Y0. The latter variable is continuous: either ω² > 1 + λ², in which case the extra normal part is nondegenerate, or ω² = 1 + λ², in which case λ ≠ 0 on Θ1, so δ ≠ 0 and ωδ|Y0| is continuous. Thus e^{t²/2} q̂(t; µ, ω, λ) is the c.f. of a continuous random variable, as required.

Theorem 2.1 implies the following result for normal mixtures.
Corollary 2.2. A normal mixture of the type

f(x) = π0 φ(x) + (1 − π0) ∫ ω^{−1} φ((x − µ)/ω) dG(µ, ω),

where G is concentrated on the region R × (1, ∞), uniquely determines π0.
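The identification mechanism behind Corollary 2.2 is transparent in closed form when G is discrete: each alternative component with scale ω > 1 contributes a factor e^{−(ω²−1)t²/2} to e^{t²/2}f̂(t), so the product tends to π0. A small numerical illustration (the atoms of G below are arbitrary choices, not from the text):

```python
import cmath
import math

pi0 = 0.3
# Toy discrete mixing distribution G on (mu, omega) with omega > 1 (our choice)
atoms = [((-1.0, 1.5), 0.6), ((2.0, 2.0), 0.4)]

def cf(t):
    """C.f. of the mixture pi0*N(0,1) + (1 - pi0) * sum_k w_k * N(mu_k, omega_k^2)."""
    val = pi0 * cmath.exp(-0.5 * t * t)
    for (mu, omega), w in atoms:
        val += (1.0 - pi0) * w * cmath.exp(1j * mu * t - 0.5 * omega * omega * t * t)
    return val

# exp(t^2/2) * |cf(t)| approaches pi0 as |t| grows, because each alternative
# component contributes a vanishing factor exp(-(omega^2 - 1) t^2 / 2)
recovered = [abs(math.exp(0.5 * t * t) * cf(t)) for t in (1.0, 3.0, 6.0)]
```

Already at t = 6 the alternative contributions are numerically negligible and the limit π0 is recovered to many digits.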
The conclusion is not unexpected, since N(0, 1) cannot be written as a mixture of N(0, ω²) with ω > 1. For the skew-normal family, the corresponding natural lower bound for ω seems to be √(1 + λ²), since this guarantees that the variance of the distribution is greater than 1.
For modeling probit-transformed p-values, a more useful region for the mixing parameter in the multiple testing context is the complementary region ω < √(1 + λ²). This is due to the fact that the conditions µ ≤ 0, 1 < ω < √(1 + λ²), λ > 0 ensure that the density of the original p-values is decreasing [cf. Ghosal and Roy (2011)], which is a natural shape restriction in the testing context. Indeed, the required condition rules out the normal case λ = 0. For a precise characterization of a decreasing p-value density, see Theorem 2 of Ghosal and Roy (2011). The normal mixture model may, however, be useful when test statistics are modeled directly. A referee pointed out that p-values for two-sided tests can sometimes lose valuable information. If there is imbalance in the distribution of the direction of the alternative in a two-sided t-test, then that information can be retained by preserving the sign of the t-statistic while considering the probit transform of the p-values. Since the standard normal distribution is invariant under sign change, a mixture model as in Corollary 2.2 may be appropriate for such signed transformed p-values.
For identifiability purposes, we can work with a larger set of parameters than those ensuring a decreasing p-value density under the alternative. Identifiability of π0 is guaranteed if the p-value density under the alternative attains the minimum value 0 as the p-values approach 1, since then the weight π0 attached to the uniform can be easily identified from the height of the mixture density for the original p-values at one; see Ghosal et al. (2008). For testing against a one-sided alternative hypothesis in a monotone likelihood ratio family, the p-value density at 1 is usually zero. One-sided alternatives are relevant whenever the direction of activity is known beforehand. For a two-sided alternative, the p-value density under the alternative is generally not zero at 1, but is usually a small number η. In the latter situation, identifiability can hold only approximately, in the sense that the value of π0 can be asserted only within a range of values of span η. In terms of the probit-transformed p-value x, the condition that the p-value density at 1 is zero is equivalent to ∫ q(x; µ, ω, λ) dG(µ, ω, λ)/φ(x) → 0 as x → ∞. This motivates the following result.
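Before turning to that result, the device of reading off π0 from the height of the p-value density at one can be illustrated numerically with a standard one-sided normal example (the choice of alternative here is ours, not from the text):

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p):
    """Inverse normal c.d.f. by bisection; crude but sufficient here."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alt_pvalue_density(p, theta):
    """Density of the p-value p = 1 - Phi(X) when X ~ N(theta, 1), testing
    theta = 0 vs theta > 0: equals exp(theta*x - theta^2/2) with x = Phi_inv(1 - p).
    For theta > 0 this vanishes as p -> 1 (x -> -infinity)."""
    x = Phi_inv(1.0 - p)
    return math.exp(theta * x - 0.5 * theta * theta)

pi0, theta = 0.8, 2.0
mixture = lambda p: pi0 * 1.0 + (1.0 - pi0) * alt_pvalue_density(p, theta)
# Near p = 1 the alternative contribution is negligible, so the height is about pi0
value_near_one = mixture(0.999)
```

Evaluating the mixture density near one gives roughly 0.8, the chosen π0, illustrating why a vanishing alternative density at 1 yields identifiability.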
If λ = 0, the skewness factor is redundant, and the density ratio is bounded by a quantity that is G-integrable by assumption. If λ = 0, necessarily ω² < 1 or µ < 0; in either case the density ratio tends to zero. For λ > 0, we use the estimate (2.5) to reach the same conclusion. Therefore the density ratio goes to 0 as x → ∞ for any fixed (µ, ω, λ) ∈ Θ2.
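The conclusion, that the density ratio q(x; µ, ω, λ)/φ(x) vanishes as x → ∞ when λ > 0 and ω² < 1 + λ², can be checked numerically; the parameter values below are illustrative choices from the shape-restricted region µ ≤ 0, 1 < ω < √(1 + λ²), λ > 0 discussed above:

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def skew_normal(x, mu, omega, lam):
    # Skewness enters as -lam, per the reparameterization described for (1.1)
    z = (x - mu) / omega
    return (2.0 / omega) * phi(z) * Phi(-lam * z)

# Illustrative parameters with mu <= 0, 1 < omega < sqrt(1 + lam^2), lam > 0
mu, omega, lam = -0.5, 1.5, 2.0
ratios = [skew_normal(x, mu, omega, lam) / phi(x) for x in (2.0, 5.0, 10.0)]
# The ratio decays to zero even though omega > 1, because the skewness factor
# Phi(-lam * z) contributes additional Gaussian-type decay on the right tail
```

The interplay visible here is exactly the one in the proof: the scale ω > 1 alone would make the ratio grow, but the Mills-ratio decay of Φ(−λz) dominates whenever ω² < 1 + λ².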
Remark 2.4. In the Bayesian context, a popular method of inducing a prior distribution on densities is to use a mixture model and assign a prior distribution to the mixing distribution G. Ferguson (1983) and Lo (1984) pioneered this idea for Bayesian density estimation. When G is given a Dirichlet process prior [Ferguson (1973)] with E(G) = G0, the condition on G in Theorem 2.3 can be met by requiring that G0(Θ2) = 1 and ∫_{λ>0} λ^{−1} dG0(µ, ω, λ) < ∞, together with an analogous integrability condition on G0.

It is interesting to observe that the situations in Theorems 2.1 and 2.3 are diametrically opposite: in the former case, the c.f. under the alternative has a thinner tail than the c.f. under the null, while in the latter case, the density under the alternative has a thinner tail than that under the null. According to a well-known "uncertainty principle" [Hardy (1933)], a function and its Fourier transform cannot both have thinner tails than the standard normal. If the mixing distribution gives weight to both Θ1 and Θ2, then neither the technique of controlling the ratios of the c.f.'s nor that of controlling the ratios of the densities used in the proofs will work. This is the primary reason why we need G to give weight only to one type of alternative. The following remark clarifies the comment further.
Remark 2.5. Consider the equal-weight mixture f(x) = (1/2)·2φ(2x) + (1/2)·(1/2)φ(x/2) of two densities from the normal scale family, where one of the densities in the mixture, 2φ(2x), has a thinner tail than the null density φ(x) and the other density, (1/2)φ(x/2), has a thicker tail than φ(x). Then f can be written as a mixture of N(0, 1) and a symmetric unimodal density, and hence the null proportion π0 may not be identified from f. More precisely, we may write

f(x) = (1/4)φ(x) + (3/4)h(x)   (2.7)

for some symmetric unimodal density h. Observe that g(x) = 2φ(2x) + (1/2)φ(x/2) − (1/2)φ(x) ≥ 0 for all x, and g′(x) > 0 for x < 0 and g′(x) < 0 for x > 0. Also note that ∫ g(x) dx = 3/2. Therefore, with the symmetric unimodal probability density h(x) = (2/3)g(x), the representation (2.7) holds.
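The claims in Remark 2.5 are easy to verify numerically; the sketch below checks the nonnegativity of g, the total mass 3/2, and the agreement of the two representations of f (the equal mixture weights are the ones forced by ∫g = 3/2 and h = (2/3)g):

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Equal-weight mixture of N(0, 1/4) and N(0, 4): one thin-tailed and one
# thick-tailed component relative to phi
f = lambda x: 0.5 * (2.0 * phi(2.0 * x)) + 0.5 * (0.5 * phi(x / 2.0))

# g has total mass 3/2; h = (2/3) g is a symmetric unimodal density
g = lambda x: 2.0 * phi(2.0 * x) + 0.5 * phi(x / 2.0) - 0.5 * phi(x)
h = lambda x: (2.0 / 3.0) * g(x)

# Alternative representation (2.7): f = (1/4) phi + (3/4) h
alt = lambda x: 0.25 * phi(x) + 0.75 * h(x)

xs = [i / 10.0 for i in range(-80, 81)]
g_min = min(g(x) for x in xs)                  # nonnegativity of g on a grid
max_gap = max(abs(f(x) - alt(x)) for x in xs)  # the two forms of f agree
g_mass = 0.1 * sum(g(x) for x in xs)           # Riemann sum, roughly 3/2
```

Matching coefficients of φ(2x), φ(x/2) and φ(x) in f = π0 φ + (1 − π0)(2/3)g confirms that the alternative representation carries null weight 1/4, not the nominal 0 of the direct representation, which is exactly the non-identifiability being demonstrated.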

Multivariate skew-normal mixtures
In this section, we consider analogs of the results of the last section for multivariate skew-normal mixtures.
In the multiple hypothesis testing situation considered by Ghosal and Roy (2011), null and alternative hypotheses hold true independently of each other with probabilities π0 and 1 − π0, respectively. Let H1, . . . , Hm stand for hypothesis indicators, where 0 stands for a true null and 1 for a false null. For any H = (H1, . . . , Hm), let µ_H (respectively, λ_H) be the vector obtained from µ (respectively, λ) by replacing the jth component by 0 whenever Hj = 0. Similarly, let D_H (respectively, ∆_H, Γ_H) be the diagonal matrix obtained from D (respectively, ∆, Γ) by replacing the jth diagonal entry by 1 whenever Hj = 0. Given all hypothesis indicators H1, . . . , Hm, the joint density of the probit p-values (X1, . . . , Xm)′ is assumed to be q(x; µ_H, D_H, λ_H, R), where the (µj, ωj, λj) are i.i.d. following a joint distribution G. The correlation matrix R is kept fixed in the mixing. Thus the multivariate skew-normal mixture density f can be written as

f(x) = Σ_H π0^{m−n_H} (1 − π0)^{n_H} ∫ q(x; µ_H, D_H, λ_H, R) ∏_{j: Hj=1} dG(µj, ωj, λj),   (3.3)

where the sum extends over H ∈ {0, 1}^m and n_H stands for the number of false null hypotheses. Below, we use the following orderings: for vectors x, y, let x < y or x ≤ y stand for componentwise ordering, and for matrices, let A ≥ B mean that A − B is nonnegative definite, while A > B stands for A ≥ B and A ≠ B.
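The combinatorial structure of the mixture (3.3) can be sketched as follows; under the stated independence of the hypothesis indicators, each configuration H carries weight π0^{m−n_H}(1−π0)^{n_H}, and the enumeration below simply verifies that these 2^m weights form a probability distribution:

```python
from itertools import product

def hypothesis_weights(pi0, m):
    """Weight pi0^(m - n_H) * (1 - pi0)^(n_H) attached to each configuration
    H in {0,1}^m in the mixture (3.3); n_H = number of false nulls in H."""
    return {H: pi0 ** (m - sum(H)) * (1.0 - pi0) ** sum(H)
            for H in product((0, 1), repeat=m)}

w = hypothesis_weights(0.7, 3)
total = sum(w.values())        # the 2^m weights sum to one
null_weight = w[(0, 0, 0)]     # the all-null configuration carries weight pi0^m
```

In particular, the all-null term H = 0, whose weight π0^m determines π0, is the one singled out by the identifiability results below.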
Theorem 3.1. Consider a skew-normal mixture of the type (3.3), where G is concentrated on a region Θ1 such that Γ_H R Γ_H > R for all H ≠ 0 and (µ, ω, λ) ∈ Θ1^m. Then f uniquely determines π0.

Proof. We use the c.f.-based argument as in the case of univariate mixtures, applied to every term in the sum (3.3), by showing that the ratio of c.f.'s e^{‖t‖²/2} q̂(t; µ_H, D_H, λ_H, R) → 0 as t → ∞ along a line, where q̂ denotes the c.f. of the corresponding density. For a given sequence H of hypothesis indicators, a random variable X having density q(x; µ_H, D_H, λ_H, R) can be represented using (3.2). Clearly, X is the convolution of N_m(0, R) with another variable, since the variance-covariance matrix of Γ_H Y is Γ_H R Γ_H ≥ R. It remains to show that the other variable in the convolution is continuous. Then the Riemann–Lebesgue lemma will apply as t approaches infinity along at least one line.
If λi ≠ 0 for at least one i with Hi = 1, then the other variable in the convolution has a continuous component, completing the argument. These two conditions are, of course, equivalent in the independent case.
As in the univariate case, identifiability can be established under a diametrically opposite condition on parameters using density considerations.
Proof. We shall show that for every H ≠ 0, the ratio

∫ q(x; µ_H, D_H, λ_H, R) ∏_{j: Hj=1} dG(µj, ωj, λj) / φ_m(x; 0, R)

converges to 0 as x tends to infinity along some line. We establish this by showing that (i) q(x; µ_H, D_H, λ_H, R)/φ_m(x; 0, R) is uniformly bounded by a G-integrable function; and (ii) for every fixed (µ_H, ω_H, λ_H), q(x; µ_H, D_H, λ_H, R)/φ_m(x; 0, R) → 0 as x tends to infinity along some line.
If λ_H = 0, we can ignore the skewness factor in the expression for the skew-normal density, i.e., the density becomes a normal density. Further, ∆_H = I, i.e., Γ_H = D_H. As Γ_H R Γ_H < R, we have (Γ_H R Γ_H)^{−1} ≥ R^{−1} with the two inverses unequal, so there exists ξ such that ξ′((Γ_H R Γ_H)^{−1} − R^{−1})ξ > 0. Choosing x = aξ and letting a → ∞, we obtain (ii).
Remark 3.4. The ideal opposite of the condition in Theorem 3.1 is given by Γ_H R Γ_H < R for all H ≠ 0, which is weaker than the condition assumed in Theorem 3.3. Using (3.5), it is possible to prove assertion (ii) in the proof under Γ_H R Γ_H < R alone. However, ensuring positivity of λ′_H R^{−1} Γ_H^{−1}(x − µ_H) for some x not depending on the latent parameter (µ, ω, λ) appears to be a challenge. The positivity is essential in applying the Mills ratio estimate Φ(−t) ≤ t^{−1}φ(t). A successful substitution of Φ(−t) will allow the resulting exponential factor to combine with the factor already present, thus leading to the ordering condition Γ_H R Γ_H < R in view of (3.5). In the special case when R^{−1} is a positive operator, i.e., R^{−1}x ≥ 0 for all x ≥ 0, the condition Γ_H R Γ_H < R suffices in (3.4). Further, the condition on µ can then be simplified to µ ≤ 0.
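The Mills ratio bound Φ(−t) ≤ t^{−1}φ(t) invoked above is elementary, and a quick numerical check makes the rapid tightening of the bound for large t visible:

```python
import math

def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# The Mills-ratio estimate Phi(-t) <= phi(t)/t for t > 0, checked at a few points
checks = [(t, Phi(-t), phi(t) / t) for t in (0.5, 1.0, 2.0, 5.0)]
```

The bound is what converts the Φ(−·) factor of a skew-normal density into an exponential factor, which is exactly the substitution discussed in the remark.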

Abstraction and further examples
The basic idea behind the two types of identifiability theorems can be put in an abstract form for an arbitrary parametric family indexed by a parameter in R^m, forming a univariate mixture model as in Section 2. Abstraction of the multivariate mixture model of Section 3 would be more challenging and would possibly depend on the availability of a decomposition like (3.2). For simplicity and transparency of the conditions imposed, we restrict attention below to the univariate situation.
The proof of the theorem follows from the arguments used in the proofs of Theorems 2.1 and 2.3.