The Annals of Statistics

Consistency of a recursive estimate of mixing distributions

Surya T. Tokdar, Ryan Martin, and Jayanta K. Ghosh
Source: Ann. Statist. Volume 37, Number 5A (2009), 2502–2522.

Abstract

Mixture models have received considerable attention recently, and Newton [Sankhyā Ser. A 64 (2002) 306–322] proposed a fast recursive algorithm for estimating a mixing distribution. We prove almost sure consistency of this recursive estimate in the weak topology under mild conditions on the family of densities being mixed. Because the recursive estimate depends on the ordering of the data, we propose a permutation-invariant modification: the average of the original estimate over permutations of the data sequence. A Rao–Blackwell argument is used to prove consistency in probability of this alternative estimate. Several simulations compare the finite-sample performance of the recursive estimate and of a Monte Carlo approximation to the permutation-invariant alternative with that of the nonparametric maximum likelihood estimate and a nonparametric Bayes estimate.
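The abstract does not state the recursion itself. The following is a minimal Python sketch, assuming the update usually attributed to Newton's algorithm [22]: after observing X_i, the current estimate F_{i-1} is replaced by (1 - w_i) F_{i-1} + w_i times the posterior of θ given X_i computed with F_{i-1} as the prior. The Gaussian location kernel p(x | θ), the finite grid for θ, the weights w_i = 1/(i+1), the simulated two-point mixing distribution, and the function names newton_recursive and permutation_average are all illustrative assumptions, not taken from the paper. The second function approximates the permutation-invariant estimate by averaging over a fixed number of random orderings rather than all n! of them.

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Gaussian density; the location kernel p(x | theta) is an assumption."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def newton_recursive(data, grid, prior, weight=lambda i: 1.0 / (i + 1)):
    """Recursive estimate of a mixing distribution stored as masses on `grid`.

    `prior` holds the initial masses (summing to 1). `weight(i)` is w_i for the
    i-th observation, i = 1, 2, ...; the default 1/(i+1) is illustrative.
    """
    f = list(prior)
    for i, x in enumerate(data, start=1):
        # Unnormalized posterior of theta given x under the current estimate f.
        post = [normal_pdf(x, t) * fj for t, fj in zip(grid, f)]
        marginal = sum(post)  # current estimate of the mixture density at x
        w = weight(i)
        # Convex combination of the old estimate and the one-step posterior,
        # so the masses keep summing to 1.
        f = [(1.0 - w) * fj + w * pj / marginal for fj, pj in zip(f, post)]
    return f


def permutation_average(data, grid, prior, n_perm=50, seed=0):
    """Monte Carlo stand-in for averaging the recursive estimate over all
    orderings of the data: average over `n_perm` random permutations."""
    rng = random.Random(seed)
    avg = [0.0] * len(grid)
    for _ in range(n_perm):
        perm = list(data)
        rng.shuffle(perm)
        f = newton_recursive(perm, grid, prior)
        avg = [a + fj / n_perm for a, fj in zip(avg, f)]
    return avg


# Simulated example (assumption): true mixing distribution puts mass 1/2 at -2 and +2.
rng = random.Random(1)
data = [rng.choice([-2.0, 2.0]) + rng.gauss(0.0, 1.0) for _ in range(200)]
grid = [-5.0 + 0.1 * k for k in range(101)]
prior = [1.0 / len(grid)] * len(grid)

f_newton = newton_recursive(data, grid, prior)
f_avg = permutation_average(data, grid, prior, n_perm=20)
```

Running newton_recursive on the same data in reversed order generally gives a different f_newton, which is the order dependence the abstract refers to; increasing n_perm reduces the Monte Carlo error of f_avg at proportional computational cost.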

Primary Subjects: 62G07
Secondary Subjects: 62G05, 62L20
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1247663763
Digital Object Identifier: doi:10.1214/08-AOS639
Zentralblatt MATH identifier: 05596909
Mathematical Reviews number (MathSciNet): MR2543700

References

[1] Allison, D. B., Gadbury, G. L., Heo, M., Fernández, J. R., Lee, C.-K., Prolla, T. A. and Weindruch, R. (2002). A mixture model approach for the analysis of microarray gene expression data. Comput. Statist. Data Anal. 39 1–20.
Mathematical Reviews (MathSciNet): MR1895555
[2] Barron, A., Schervish, M. J. and Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. Ann. Statist. 27 536–561.
Mathematical Reviews (MathSciNet): MR1714718
Zentralblatt MATH: 0980.62039
Digital Object Identifier: doi:10.1214/aos/1017939142
Project Euclid: euclid.aos/1018031206
[3] Bogdan, M., Ghosh, J. K. and Tokdar, S. (2008). A comparison of the Benjamini–Hochberg procedure with some Bayesian rules for multiple testing. In Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen (N. Balakrishnan, E. Peña and M. Silvapulle, eds.) 211–230. IMS, Beachwood, OH.
Mathematical Reviews (MathSciNet): MR2462208
Digital Object Identifier: doi:10.1214/193940307000000158
[4] Clyde, M. A. and George, E. I. (1999). Empirical Bayes estimation in wavelet nonparametric regression. In Bayesian Inference in Wavelet-Based Models. Lecture Notes in Statist. 141 309–322. Springer, New York.
Mathematical Reviews (MathSciNet): MR1699849
Zentralblatt MATH: 0936.62008
Digital Object Identifier: doi:10.1007/978-1-4612-0567-8_19
[5] Cootes, T. and Taylor, C. (1999). A mixture model for representing shape variation. Image Vision Comput. 17 567–573.
[6] Durrett, R. (1996). Probability: Theory and Examples, 2nd ed. Duxbury Press, Belmont, CA.
Mathematical Reviews (MathSciNet): MR1609153
[7] Efron, B. (2004). Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104.
Mathematical Reviews (MathSciNet): MR2054289
Zentralblatt MATH: 1089.62502
Digital Object Identifier: doi:10.1198/016214504000000089
[8] Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577–588.
Mathematical Reviews (MathSciNet): MR1340510
Zentralblatt MATH: 0826.62021
Digital Object Identifier: doi:10.1080/01621459.1995.10476550
[9] Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257–1272.
Mathematical Reviews (MathSciNet): MR1126324
Zentralblatt MATH: 0729.62033
Digital Object Identifier: doi:10.1214/aos/1176348248
Project Euclid: euclid.aos/1176348248
[10] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209–230.
Mathematical Reviews (MathSciNet): MR350949
Zentralblatt MATH: 0255.62037
Digital Object Identifier: doi:10.1214/aos/1176342360
Project Euclid: euclid.aos/1176342360
[11] Ghosal, S., Ghosh, J. K. and Ramamoorthi, R. V. (1999). Posterior consistency of Dirichlet mixtures in density estimation. Ann. Statist. 27 143–158.
Mathematical Reviews (MathSciNet): MR1701105
Zentralblatt MATH: 0932.62043
Digital Object Identifier: doi:10.1214/aos/1018031105
Project Euclid: euclid.aos/1018031105
[12] Ghosh, J. K. and Tokdar, S. T. (2006). Convergence and consistency of Newton’s algorithm for estimating mixing distribution. In Frontiers in Statistics 429–443. Imp. Coll. Press, London.
Mathematical Reviews (MathSciNet): MR2326012
Zentralblatt MATH: 1119.62020
Digital Object Identifier: doi:10.1142/9781860948886_0019
[13] Johnstone, I. M. and Silverman, B. W. (1990). Speed of estimation in positron emission tomography and related inverse problems. Ann. Statist. 18 251–280.
Mathematical Reviews (MathSciNet): MR1041393
Zentralblatt MATH: 0699.62043
Digital Object Identifier: doi:10.1214/aos/1176347500
Project Euclid: euclid.aos/1176347500
[14] Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887–906.
Mathematical Reviews (MathSciNet): MR86464
Zentralblatt MATH: 0073.14701
Digital Object Identifier: doi:10.1214/aoms/1177728066
Project Euclid: euclid.aoms/1177728066
[15] Kushner, H. J. and Yin, G. G. (2003). Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed. Springer, New York.
Mathematical Reviews (MathSciNet): MR1993642
Zentralblatt MATH: 1026.62084
[16] Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixed distribution. J. Amer. Statist. Assoc. 73 805–811.
Mathematical Reviews (MathSciNet): MR521328
Digital Object Identifier: doi:10.1080/01621459.1978.10480103
[17] Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Ann. Statist. 20 1350–1360.
Mathematical Reviews (MathSciNet): MR1186253
Zentralblatt MATH: 0763.62015
Digital Object Identifier: doi:10.1214/aos/1176348772
Project Euclid: euclid.aos/1176348772
[18] Lindsay, B. G. (1983). The geometry of mixture likelihoods: A general theory. Ann. Statist. 11 86–94.
Mathematical Reviews (MathSciNet): MR684866
Zentralblatt MATH: 0512.62005
Digital Object Identifier: doi:10.1214/aos/1176346059
Project Euclid: euclid.aos/1176346059
[19] Liu, J. S. (1996). Nonparametric hierarchical Bayes via sequential imputations. Ann. Statist. 24 911–930.
Mathematical Reviews (MathSciNet): MR1401830
Zentralblatt MATH: 0880.62038
Digital Object Identifier: doi:10.1214/aos/1032526949
Project Euclid: euclid.aos/1032526949
[20] Martin, R. and Ghosh, J. K. (2008). Stochastic approximation and Newton’s estimate of a mixing distribution. Statist. Sci. 23 365–382.
Mathematical Reviews (MathSciNet): MR2483909
Digital Object Identifier: doi:10.1214/08-STS265
Project Euclid: euclid.ss/1233153064
[21] McLachlan, G., Bean, R. and Peel, D. (2002). A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18 413–422.
[22] Newton, M. A. (2002). On a nonparametric recursive estimator of the mixing distribution. Sankhyā Ser. A 64 306–322.
Mathematical Reviews (MathSciNet): MR1981761
[23] Newton, M. A., Quintana, F. A. and Zhang, Y. (1998). Nonparametric Bayes methods using predictive updating. In Practical Nonparametric and Semiparametric Bayesian Statistics. Lecture Notes in Statist. 133 45–61. Springer, New York.
Mathematical Reviews (MathSciNet): MR1630075
Zentralblatt MATH: 0918.62030
Digital Object Identifier: doi:10.1007/978-1-4612-1732-9_3
[24] Newton, M. A. and Zhang, Y. (1999). A recursive algorithm for nonparametric analysis with missing data. Biometrika 86 15–26.
Mathematical Reviews (MathSciNet): MR1688068
Zentralblatt MATH: 0917.62045
Digital Object Identifier: doi:10.1093/biomet/86.1.15
[25] Pan, W., Lin, J. and Le, C. (2003). A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genom. 3 117–124.
[26] Quintana, F. A. and Newton, M. A. (2000). Computational aspects of nonparametric Bayesian analysis with applications to the modeling of multiple binary sequences. J. Comput. Graph. Statist. 9 711–737.
Mathematical Reviews (MathSciNet): MR1821814
[27] Reynolds, D., Quatieri, T. and Dunn, R. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Process. 10 19–41.
[28] Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35 1–20.
Mathematical Reviews (MathSciNet): MR163407
Zentralblatt MATH: 0138.12304
Digital Object Identifier: doi:10.1214/aoms/1177703729
Project Euclid: euclid.aoms/1177703729
[29] Scott, J. G. and Berger, J. O. (2006). An exploration of aspects of Bayesian multiple testing. J. Statist. Plann. Inference 136 2144–2162.
Mathematical Reviews (MathSciNet): MR2235051
Zentralblatt MATH: 1087.62039
Digital Object Identifier: doi:10.1016/j.jspi.2005.08.031
[30] Stefanski, L. and Carroll, R. J. (1990). Deconvoluting kernel density estimators. Statistics 21 169–184.
Mathematical Reviews (MathSciNet): MR1054861
Digital Object Identifier: doi:10.1080/02331889008802238
[31] Tang, Y., Ghosal, S. and Roy, A. (2007). Nonparametric Bayesian estimation of positive false discovery rates. Biometrics 63 1126–1134.
Mathematical Reviews (MathSciNet): MR2414590
Digital Object Identifier: doi:10.1111/j.1541-0420.2007.00819.x
[32] Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82 528–550.
Mathematical Reviews (MathSciNet): MR898357
Zentralblatt MATH: 0619.62029
Digital Object Identifier: doi:10.1080/01621459.1987.10478458
[33] Teicher, H. (1961). Identifiability of mixtures. Ann. Math. Statist. 32 244–248.
Mathematical Reviews (MathSciNet): MR120677
Digital Object Identifier: doi:10.1214/aoms/1177705155
Project Euclid: euclid.aoms/1177705155
[34] Wang, Y. (2007). On fast computation of the non-parametric maximum likelihood estimate of a mixing distribution. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 185–198.
Mathematical Reviews (MathSciNet): MR2325271
Digital Object Identifier: doi:10.1111/j.1467-9868.2007.00583.x
[35] Zhang, C.-H. (1990). Fourier methods for estimating mixing densities and distributions. Ann. Statist. 18 806–831.
Mathematical Reviews (MathSciNet): MR1056338
Zentralblatt MATH: 0778.62037
Digital Object Identifier: doi:10.1214/aos/1176347627
Project Euclid: euclid.aos/1176347627

2013 © Institute of Mathematical Statistics
