December 2021 Estimating the number of components in finite mixture models via the Group-Sort-Fuse procedure
Tudor Manole, Abbas Khalili
Ann. Statist. 49(6): 3043-3069 (December 2021). DOI: 10.1214/21-AOS2072

Abstract

Estimation of the number of components (or order) of a finite mixture model is a long-standing and challenging problem in statistics. We propose the Group-Sort-Fuse (GSF) procedure, a new penalized likelihood approach for simultaneous estimation of the order and mixing measure in multidimensional finite mixture models. Unlike methods that fit and compare mixtures with varying orders using criteria involving model complexity, our approach directly penalizes a continuous function of the model parameters. More specifically, given a conservative upper bound on the order, the GSF groups and sorts mixture component parameters to fuse those which are redundant. For a wide range of finite mixture models, we show that the GSF is consistent in estimating the true mixture order and achieves the n^{-1/2} convergence rate for parameter estimation up to polylogarithmic factors. The GSF is implemented for several univariate and multivariate mixture models in the R package GroupSortFuse. Its finite sample performance is supported by a thorough simulation study, and its application is illustrated on two real data examples.
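To make the "sort, then fuse" idea concrete, the following is a minimal toy sketch in Python. It is not the GSF procedure and not the API of the GroupSortFuse package: the GSF fuses components through a penalty inside the likelihood, whereas this sketch simply merges already-fitted univariate atoms after the fact. The function name sort_and_fuse, the tolerance tol, and the example atoms and weights are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def sort_and_fuse(
    atoms: Sequence[float], weights: Sequence[float], tol: float
) -> Tuple[List[float], List[float]]:
    """Toy post-hoc merge (an assumption, not the authors' penalized estimator).

    Sorts univariate component locations and fuses consecutive ones whose
    gap is at most `tol`. Weights are assumed to be strictly positive.
    """
    if len(atoms) != len(weights):
        raise ValueError("atoms and weights must have the same length")
    if not atoms:
        return [], []

    # Sort component indices by location so redundant atoms become neighbours.
    order = sorted(range(len(atoms)), key=lambda i: atoms[i])

    fused_atoms: List[float] = []
    fused_weights: List[float] = []

    # Running group: weighted mean location, total weight, last raw atom seen.
    group_atom = atoms[order[0]]
    group_weight = weights[order[0]]
    last = atoms[order[0]]

    for i in order[1:]:
        if atoms[i] - last <= tol:
            # Gap to the previous sorted atom is small: fuse into the group.
            total = group_weight + weights[i]
            group_atom = (group_weight * group_atom + weights[i] * atoms[i]) / total
            group_weight = total
        else:
            # Gap is large: close the current group and start a new one.
            fused_atoms.append(group_atom)
            fused_weights.append(group_weight)
            group_atom = atoms[i]
            group_weight = weights[i]
        last = atoms[i]

    fused_atoms.append(group_atom)
    fused_weights.append(group_weight)
    return fused_atoms, fused_weights


# Hypothetical overfitted mixture: five atoms, upper bound on the order is 5.
example_atoms = [0.0, 2.0, 0.1, 2.05, 5.0]
example_weights = [0.2, 0.2, 0.2, 0.2, 0.2]
fused_a, fused_w = sort_and_fuse(example_atoms, example_weights, tol=0.25)
print(len(fused_a))  # number of components remaining after fusing
```

In this toy input the five atoms collapse to three groups, which stands in for the estimated order. In the actual method the amount of fusion is governed by the penalty and its tuning parameter rather than by a fixed tolerance.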

Funding Statement

Tudor Manole was supported by the Natural Sciences and Engineering Research Council of Canada and also by the Fonds de recherche du Québec–Nature et technologies. Abbas Khalili was supported by the Natural Sciences and Engineering Research Council of Canada through Discovery Grants (NSERC RGPIN-2015-03805 and NSERC RGPIN-2020-05011), and the CRM StatLab.

Acknowledgements

We would like to thank the Editor, an Associate Editor, and two referees for their insightful comments and suggestions which significantly improved the quality of this paper. We thank Jiahua Chen for discussions related to the proof of Proposition 1, Russell Steele for bringing to our attention the multinomial dataset analyzed in Section 5, and Aritra Guha for sharing an implementation of the Merge–Truncate–Merge procedure. We also thank Sivaraman Balakrishnan and Larry Wasserman for useful discussions.

Citation

Tudor Manole. Abbas Khalili. "Estimating the number of components in finite mixture models via the Group-Sort-Fuse procedure." Ann. Statist. 49 (6) 3043 - 3069, December 2021. https://doi.org/10.1214/21-AOS2072

Information

Received: 1 October 2019; Revised: 1 December 2020; Published: December 2021
First available in Project Euclid: 14 December 2021

MathSciNet: MR4352522
zbMATH: 1486.62062
Digital Object Identifier: 10.1214/21-AOS2072

Subjects:
Primary: 62F10, 62F12
Secondary: 62H12

Keywords: finite mixture models, maximum penalized likelihood estimation, strong identifiability, Wasserstein distance

Rights: Copyright © 2021 Institute of Mathematical Statistics

JOURNAL ARTICLE
27 PAGES