Abstract
In this paper, we consider the problem of selecting the most appropriate model, among a given collection of mixture models, to describe datasets likely drawn from a mixture of distributions. The proposed method consists of finding the quasi-maximum likelihood estimators (QMLEs) of the competing models, using Expectation-Maximization (EM) type algorithms, and subsequently estimating, for every model, a statistical distance to the true model based on the empirical cumulative distribution function (cdf) of the original dataset and the QMLE-fitted cdf. To evaluate the goodness of fit, a new metric, the Integrated Cumulative Error ($ICE$), is proposed and compared with existing metrics in terms of accuracy in detecting the appropriate model. We state, under mild conditions, that our estimator of the $ICE$ distance converges in probability at the rate $\sqrt{n}$, and that our model selection procedure is consistent (it asymptotically detects the right model). Over a set of benchmark examples, the $ICE$ criterion shows improved numerical performance over existing distance-based criteria in identifying the correct model. The method is applied in a material fatigue life context to model the distribution of indicators of the fatigue crack formation potency, obtained from numerical experiments.
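The two-stage procedure described above (fit each candidate by EM, then score it by a distance between the empirical cdf and the fitted cdf) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the definition of the $ICE$ distance, so the hypothetical `cdf_discrepancy` below uses the sample average of $|F_n(x) - F_{\hat\theta}(x)|$ as a stand-in, the candidates are univariate Gaussian mixtures fitted by plain EM, and the data are simulated. The toy comparison of one against two components is nested, whereas the paper targets non-nested collections.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_gaussian_mixture(data, k, n_iter=100):
    """Plain EM for a k-component univariate Gaussian mixture (illustrative)."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    # Start from equal weights, means at evenly spaced quantiles, common spread.
    weights = [1.0 / k] * k
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    sigmas = [sd] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: weighted updates of weights, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3 * sd)  # guard against collapse
    return weights, mus, sigmas


def cdf_discrepancy(data, params):
    """Assumed stand-in distance: mean |F_n(x_i) - F_fit(x_i)| over the sample."""
    weights, mus, sigmas = params
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = sum(w * normal_cdf(x, m, s)
                     for w, m, s in zip(weights, mus, sigmas))
        total += abs(i / n - fitted)  # i / n is the empirical cdf at x_(i)
    return total / n


# Simulated data from a two-component Gaussian mixture (hypothetical example).
random.seed(0)
data = [random.gauss(0.0, 1.0) if random.random() < 0.5 else random.gauss(5.0, 1.0)
        for _ in range(400)]

# Score each candidate and keep the one closest to the empirical cdf.
scores = {k: cdf_discrepancy(data, fit_gaussian_mixture(data, k)) for k in (1, 2)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Selecting the candidate with the smallest score mirrors the distance-based selection rule described in the abstract; replacing `cdf_discrepancy` with the paper's actual $ICE$ estimator would be required to reproduce its results.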
Citation
P. Vandekerkhove. J. M. Padbidri. D. L. McDowell. "Integrated Cumulative Error (ICE) distance for non-nested mixture model selection: Application to extreme values in metal fatigue problems." Electron. J. Statist. 8 (2) 3141 - 3175, 2014. https://doi.org/10.1214/15-EJS985
Information