Bernoulli, Volume 3, Number 3 (1997), 279-299.

The estimation of the order of a mixture model

Didier Dacunha-Castelle and Elisabeth Gassiat


Abstract

We propose a new method to estimate the number of different populations when a large sample of a mixture of these populations is observed. It is possible to define the number of different populations as the number of points in the support of the mixing distribution. For discrete distributions having a finite support, the number of support points can be characterized by Hankel matrices of the first algebraic moments, or Toeplitz matrices of the trigonometric moments. Namely, for one-dimensional distributions, the cardinality of the support may be proved to be the least integer such that the Hankel matrix (or the Toeplitz matrix) degenerates. Our estimator is based on this property. We first prove the convergence of the estimator, and then its exponential convergence under wide assumptions. The number of populations is not a priori bounded. Our method applies to a large number of models such as translation mixtures with known or unknown variance, scale mixtures, exponential families and various multivariate models. The method has an obvious computational advantage since it avoids any computation of estimates of the mixing parameters. Finally we give some numerical examples to illustrate the effectiveness of the method in the most popular cases.
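The Hankel characterization stated in the abstract can be illustrated with a minimal sketch. It assumes the exact algebraic moments of a known discrete distribution are available, whereas the paper's estimator works from sample-based moment estimates, which are not reproduced here. The function names, the tolerance `tol` and the cap `max_order` are hypothetical choices made for this illustration only.

```python
import numpy as np

def hankel_moment_matrix(moments, q):
    """Return the (q+1) x (q+1) Hankel matrix with entries m_{i+j}, 0 <= i, j <= q."""
    return np.array([[moments[i + j] for j in range(q + 1)]
                     for i in range(q + 1)])

def support_size(points, weights, max_order=5, tol=1e-8):
    """Least q >= 1 at which the Hankel moment matrix degenerates.

    Assumes `points` are distinct and `weights` are positive and sum to 1.
    `tol` is an illustrative numerical threshold for "determinant is zero".
    Returns None if no degeneracy is found up to `max_order`.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Algebraic moments m_k = sum_i w_i * x_i^k for k = 0, ..., 2 * max_order.
    moments = [float(np.sum(weights * points**k))
               for k in range(2 * max_order + 1)]
    for q in range(1, max_order + 1):
        det = np.linalg.det(hankel_moment_matrix(moments, q))
        if abs(det) < tol:
            return q
    return None

if __name__ == "__main__":
    # Mixing distribution with three support points.
    print(support_size([-1.0, 0.0, 2.0], [0.2, 0.5, 0.3]))
```

For a distribution with r distinct support points, the determinant is strictly positive for q < r and vanishes at q = r, so the loop stops at the cardinality of the support. With noisy empirical moments the determinant is never exactly zero, so a fixed tolerance like the one above would have to be replaced by a data-driven rule.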

Article information

Source
Bernoulli, Volume 3, Number 3 (1997), 279-299.

Dates
First available in Project Euclid: 23 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.bj/1177334456

Mathematical Reviews number (MathSciNet)
MR1468306

Zentralblatt MATH identifier
0889.62012

Keywords
Hankel matrix; mixture models; order

Citation

Dacunha-Castelle, Didier; Gassiat, Elisabeth. The estimation of the order of a mixture model. Bernoulli 3 (1997), no. 3, 279--299. https://projecteuclid.org/euclid.bj/1177334456


