The Annals of Statistics

Estimation and confidence sets for sparse normal mixtures

T. Tony Cai, Jiashun Jin, and Mark G. Low
Source: Ann. Statist. Volume 35, Number 6 (2007), 2421-2449.

Abstract

For high-dimensional statistical models, researchers have begun to focus on situations with relatively few moderately large coefficients. Such situations lead to some very subtle statistical problems. In particular, Ingster (1999) and Donoho and Jin (2004) have considered a sparse normal means testing problem and described its precise demarcation, or detection boundary. Meinshausen and Rice (2006) have shown that it is even possible to estimate the fraction of nonzero coordinates consistently on a subset of the detectable region. However, they left open exactly which parts of the detectable region admit consistent estimation.
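For readers who want the boundary in concrete form: in the calibration used by Ingster and by Donoho and Jin, the data are X_i ~ (1 - ε_n) N(0, 1) + ε_n N(μ_n, 1), i = 1, ..., n, with ε_n = n^(-β) for 1/2 < β < 1 and μ_n = sqrt(2 r log n). The detection boundary is r = ρ*(β), where ρ*(β) = β - 1/2 for 1/2 < β ≤ 3/4 and ρ*(β) = (1 - sqrt(1 - β))^2 for 3/4 < β < 1. The abstract states that the region r > ρ*(β) is also where the expected fraction of nonzero coordinates can be estimated consistently. The minimal Python sketch below only evaluates this boundary. The function names are hypothetical, and `calibrate` is an illustrative helper (an assumption, not from the paper) that maps a finite-n pair (ε, μ) back to the (β, r) scale.

```python
import math


def detection_boundary(beta: float) -> float:
    """Detection boundary rho*(beta) of Ingster / Donoho-Jin, for 1/2 < beta < 1."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in the open interval (1/2, 1)")
    if beta <= 0.75:
        return beta - 0.5                      # moderately sparse branch
    return (1.0 - math.sqrt(1.0 - beta)) ** 2  # very sparse branch


def is_detectable(beta: float, r: float) -> bool:
    """True when (beta, r) lies strictly above the detection boundary."""
    return r > detection_boundary(beta)


def calibrate(n: int, epsilon: float, mu: float) -> tuple[float, float]:
    """Hypothetical helper: invert epsilon = n**-beta and mu = sqrt(2 r log n)."""
    log_n = math.log(n)
    beta = -math.log(epsilon) / log_n
    r = mu ** 2 / (2.0 * log_n)
    return beta, r
```

The two branches agree at β = 3/4, where both give 1/4, so the boundary is continuous. For example, `is_detectable(0.6, 0.2)` is true because ρ*(0.6) = 0.1.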

In the present paper we develop a new approach for estimating the fraction of nonzero means for problems where the nonzero means are moderately large. We show that the detection region described by Ingster and by Donoho and Jin turns out to be the region where it is possible to consistently estimate the expected fraction of nonzero coordinates. This theory is developed further, and minimax rates of convergence are derived. A procedure is constructed that attains the optimal rate of convergence in this setting. Furthermore, the procedure also provides an honest lower bound for confidence intervals while minimizing the expected length of such an interval. Simulations are used to compare the procedure with that of Meinshausen and Rice, who give a procedure but do not discuss its rates of convergence. Extensions to more general Gaussian mixture models are also given.
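The abstract mentions an honest lower bound and a simulation comparison with Meinshausen and Rice. The sketch below is not the authors' procedure. It is a simplified lower bound in the spirit of Meinshausen and Rice, built on one inequality. If F is the CDF of the one-sided p-values under the sparse mixture, then F(t) = (1 - ε) t + ε G(t) ≤ (1 - ε) t + ε, so ε ≥ (F(t) - t) / (1 - t) for every t in (0, 1). Replacing F by the empirical CDF F_n and subtracting a margin c * sqrt(t (1 - t) / n) gives a conservative estimate. All function names, the margin's form, and the constant c are illustrative assumptions. c is not calibrated here, so the output should not be read as an honest bound at any stated confidence level.

```python
import math

import numpy as np


def simulate_sparse_mixture(n, beta, r, rng):
    """Draw X_i ~ (1 - eps) N(0,1) + eps N(mu,1), eps = n**-beta, mu = sqrt(2 r log n)."""
    eps = n ** (-beta)
    mu = math.sqrt(2.0 * r * math.log(n))
    is_signal = rng.random(n) < eps            # Bernoulli(eps) labels
    x = rng.standard_normal(n) + mu * is_signal
    return x, eps                               # eps is the *expected* fraction


def one_sided_pvalues(x):
    """p_i = P(N(0,1) > x_i) = erfc(x_i / sqrt(2)) / 2 (signals have mu > 0)."""
    erfc = np.vectorize(math.erfc, otypes=[float])
    return 0.5 * erfc(np.asarray(x, dtype=float) / math.sqrt(2.0))


def lower_bound_fraction(p, c):
    """Conservative lower bound on eps from F(t) <= (1 - eps) t + eps.

    Takes the max over order statistics t = p_(i) in (0, 1) of
        (i/n - t - c * sqrt(t (1 - t) / n)) / (1 - t),
    clipped to [0, 1]. On each gap between order statistics this
    expression decreases in t, so the left endpoints suffice.
    c is a hypothetical, uncalibrated tuning constant.
    """
    p = np.sort(np.asarray(p, dtype=float))
    n = p.size
    f_n = np.arange(1, n + 1) / n              # empirical CDF at p_(i)
    inside = (p > 0.0) & (p < 1.0)             # avoid t = 0 and division by 1 - t = 0
    t, f_n = p[inside], f_n[inside]
    if t.size == 0:
        return 0.0
    bound = (f_n - t - c * np.sqrt(t * (1.0 - t) / n)) / (1.0 - t)
    return float(np.clip(bound.max(), 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000
    x, eps = simulate_sparse_mixture(n, beta=0.6, r=0.5, rng=rng)
    c = 2.0 * math.sqrt(math.log(math.log(n)))  # illustrative choice only
    print("expected fraction:", eps)
    print("lower bound:      ", lower_bound_fraction(one_sided_pvalues(x), c))
```

The margin scales like the standard deviation of F_n(t), which is why the division by 1 - t does not blow up near t = 1. Any claim of honesty would require choosing c from the distribution of the normalized uniform empirical process, which this sketch does not attempt.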

Primary Subjects: 62G05
Secondary Subjects: 62G20, 62G32
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1201012967
Digital Object Identifier: doi:10.1214/009053607000000334
Mathematical Reviews number (MathSciNet): MR2382653
Zentralblatt MATH identifier: 05241110

References

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289--300.
Mathematical Reviews (MathSciNet): MR1325392
Cai, T., Jin, J. and Low, M. G. (2006). Estimation and confidence sets for sparse normal mixtures. Technical report, Dept. Statistics, The Wharton School, Univ. Pennsylvania. Available at www.arxiv.org/abs/math/0612623.
Mathematical Reviews (MathSciNet): MR2382653
Digital Object Identifier: doi:10.1214/009053607000000334
Project Euclid: euclid.aos/1201012967
Zentralblatt MATH: 05241110
Cai, T. and Low, M. G. (2004). An adaptation theory for nonparametric confidence intervals. Ann. Statist. 32 1805--1840.
Mathematical Reviews (MathSciNet): MR2102494
Digital Object Identifier: doi:10.1214/009053604000000049
Project Euclid: euclid.aos/1098883773
Zentralblatt MATH: 1056.62060
Donoho, D. (1988). One-sided inference about functionals of a density. Ann. Statist. 16 1390--1420.
Mathematical Reviews (MathSciNet): MR0964930
Digital Object Identifier: doi:10.1214/aos/1176351045
Project Euclid: euclid.aos/1176351045
Zentralblatt MATH: 0665.62040
Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962--994.
Mathematical Reviews (MathSciNet): MR2065195
Digital Object Identifier: doi:10.1214/009053604000000265
Project Euclid: euclid.aos/1085408492
Zentralblatt MATH: 1092.62051
Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96--104.
Mathematical Reviews (MathSciNet): MR2054289
Digital Object Identifier: doi:10.1198/016214504000000089
Zentralblatt MATH: 1089.62502
Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035--1061.
Mathematical Reviews (MathSciNet): MR2065197
Digital Object Identifier: doi:10.1214/009053604000000283
Project Euclid: euclid.aos/1085408494
Zentralblatt MATH: 1092.62065
Ingster, Y. I. (1999). Minimax detection of a signal for $l^p_n$-balls. Math. Methods Statist. 7 401--428.
Mathematical Reviews (MathSciNet): MR1680087
Jin, J. (2004). Detecting a target in very noisy data from multiple looks. In A Festschrift for Herman Rubin (A. DasGupta, ed.) 255--286. IMS, Beachwood, OH.
Mathematical Reviews (MathSciNet): MR2126903
Digital Object Identifier: doi:10.1214/lnms/1196285396
Jin, J. (2006). Proportion of nonzero normal means: Universal oracle equivalences and uniformly consistent estimations. Technical report, Dept. Statistics, Purdue Univ.
Jin, J., Peng, J. and Wang, P. (2007). Estimating the proportion of non-null effects, with applications to CGH lung cancer data. Working manuscript.
Le Cam, L. and Yang, G. L. (1990). Asymptotics in Statistics: Some Basic Concepts. Springer, New York.
Mathematical Reviews (MathSciNet): MR1066869
Zentralblatt MATH: 0719.62003
Maraganore, D. M., de Andrade, M. et al. (2005). High-resolution whole-genome association study of Parkinson disease. Amer. J. Human Genetics 77 685--693.
Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann. Statist. 34 373--393.
Mathematical Reviews (MathSciNet): MR2275246
Digital Object Identifier: doi:10.1214/009053605000000741
Project Euclid: euclid.aos/1146576267
Zentralblatt MATH: 1091.62059
Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.
Mathematical Reviews (MathSciNet): MR0838963

2012 © Institute of Mathematical Statistics
