We propose a novel estimator for the number of mixture components (denoted by M) in a nonparametric finite mixture model. The setting that we consider is one where the analyst has repeated observations of variables that are conditionally independent given a finitely supported latent variable with M support points. Under a mild assumption on the joint distribution of the observed and latent variables, we show that an integral operator T that is identified from the data has rank equal to M. We use this observation, in conjunction with the fact that singular values of operators are stable under perturbations, to propose an estimator of M, which essentially consists of a thresholding rule that counts the number of singular values of a consistent estimator of T that are greater than a data-driven threshold. We prove that our estimator of M is consistent, and establish nonasymptotic results, which provide finite sample performance guarantees for our estimator. We present a Monte Carlo study, which shows that our estimator performs well for samples of moderate size.
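The thresholding idea can be illustrated in the simplest discrete case, where two measurements X1 and X2 are conditionally independent given a latent label with M values. The joint probability matrix of (X1, X2) then has rank M whenever the mixing weights are positive and the component distributions of each measurement are linearly independent. The sketch below is not the paper's estimator: the paper works with an integral operator T and a data-driven threshold that this abstract does not specify. The function names, the discrete setup, and the threshold n^(-1/4) are all assumptions made for illustration only.

```python
import numpy as np


def joint_matrix(weights, p1, p2):
    """Population joint pmf: P[i, j] = sum_m weights[m] * p1[m, i] * p2[m, j]."""
    return p1.T @ np.diag(weights) @ p2


def simulate(n, weights, p1, p2, rng):
    """Draw n pairs (X1, X2) that are independent given a latent label Z."""
    z = rng.choice(len(weights), size=n, p=weights)
    x1 = np.empty(n, dtype=int)
    x2 = np.empty(n, dtype=int)
    for m in range(len(weights)):
        idx = z == m
        x1[idx] = rng.choice(p1.shape[1], size=idx.sum(), p=p1[m])
        x2[idx] = rng.choice(p2.shape[1], size=idx.sum(), p=p2[m])
    return x1, x2


def empirical_joint(x1, x2, k1, k2):
    """Empirical joint frequencies, a consistent estimate of the joint pmf."""
    counts = np.zeros((k1, k2))
    np.add.at(counts, (x1, x2), 1.0)
    return counts / len(x1)


def estimate_num_components(T_hat, threshold):
    """Count the singular values of T_hat that exceed the threshold."""
    singular_values = np.linalg.svd(T_hat, compute_uv=False)
    return int(np.sum(singular_values > threshold))


# Hypothetical example with M = 2 components and 4 outcomes per measurement.
weights = np.array([0.4, 0.6])
p = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1, 0.7]])

rng = np.random.default_rng(0)
n = 5000
x1, x2 = simulate(n, weights, p, p, rng)
T_hat = empirical_joint(x1, x2, 4, 4)

# ASSUMPTION: n ** -0.25 is an illustrative threshold, not the paper's rule.
M_hat = estimate_num_components(T_hat, threshold=n ** -0.25)
print(M_hat)
```

The threshold has to shrink more slowly than the sampling error in the singular values, so that noise-level singular values fall below it, while still tending to zero, so that the M nonzero population singular values eventually exceed it.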
Ann. Statist. 49(4): 2178-2205 (August 2021). DOI: 10.1214/20-AOS2032