## The Annals of Statistics

### Nonparametric estimation of component distributions in a multivariate mixture

#### Abstract

Suppose $k$-variate data are drawn from a mixture of two distributions, each having independent components. It is desired to estimate the univariate marginal distributions in each of the products, as well as the mixing proportion. This is the setting of two-class, fully parametrized latent models that has been proposed for estimating the distributions of medical test results when disease status is unavailable. The problem is one of inference in a mixture of distributions without training data, and until now it has been tackled only in a fully parametric setting. We investigate the possibility of using nonparametric methods. Of course, when $k=1$ the problem is not identifiable from a nonparametric viewpoint. We show that the problem is "almost" identifiable when $k=2$; there, the set of all possible representations can be expressed, in terms of any one of those representations, as a two-parameter family. Furthermore, it is proved that when $k\geq3$ the problem is nonparametrically identifiable under particularly mild regularity conditions. In this case we introduce root-$n$ consistent nonparametric estimators of the $2k$ univariate marginal distributions and the mixing proportion. Finite-sample and asymptotic properties of the estimators are described.
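
For orientation, the model described in the abstract can be written out in symbols. The notation here ($\pi$ for the mixing proportion, $F_{ij}$ for the $j$th marginal of component $i$) is assumed for illustration and may differ from the notation used in the paper:

$$
F(x_1,\dots,x_k) \;=\; \pi \prod_{j=1}^{k} F_{1j}(x_j) \;+\; (1-\pi) \prod_{j=1}^{k} F_{2j}(x_j), \qquad 0 < \pi < 1 .
$$

The unknowns are $\pi$ and the $2k$ univariate distributions $F_{ij}$. When $k=1$ the display reduces to $F = \pi F_{11} + (1-\pi) F_{21}$, which a given $F$ satisfies in many ways (for instance $F_{11} = F_{21} = F$ with any $\pi$), in line with the non-identifiability noted for that case.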

#### Article information

Source
Ann. Statist., Volume 31, Number 1 (2003), 201-224.

Dates
First available in Project Euclid: 26 February 2003

Permanent link to this document
https://projecteuclid.org/euclid.aos/1046294462

Digital Object Identifier
doi:10.1214/aos/1046294462

Mathematical Reviews number (MathSciNet)
MR1962504

Zentralblatt MATH identifier
1018.62021

Subjects
Primary: 62G05: Estimation
Secondary: 62G70

#### Citation

Hall, Peter; Zhou, Xiao-Hua. Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statist. 31 (2003), no. 1, 201-224. doi:10.1214/aos/1046294462. https://projecteuclid.org/euclid.aos/1046294462
