The Annals of Statistics

Estimation in Dirichlet random effects models

Minjung Kyung, Jeff Gill, and George Casella

Abstract

We develop a new Gibbs sampler for a linear mixed model with a Dirichlet process random effect term, which is easily extended to a generalized linear mixed model with a probit link function. Our Gibbs sampler exploits the properties of the multinomial and Dirichlet distributions, and is shown to be an improvement, in terms of operator norm and efficiency, over other commonly used MCMC algorithms. We also investigate methods for the estimation of the precision parameter of the Dirichlet process, finding that maximum likelihood may not be desirable, but a posterior mode is a reasonable approach. Examples are given to show how these models perform on real data. Our results complement both the theoretical basis of the Dirichlet process nonparametric prior and the computational work that has been done to date.
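As a rough illustration of the role played by the precision parameter of the Dirichlet process, the sketch below draws a random partition of n subjects from the Pólya urn (Chinese restaurant) representation of the process. It is not the Gibbs sampler developed in the paper. The function name `sample_crp_partition` and its arguments `n`, `precision` and `seed` are assumptions made for this example only.

```python
import random


def sample_crp_partition(n, precision, seed=None):
    """Draw cluster labels for n subjects from a Polya urn scheme.

    Hypothetical helper for illustration: `precision` stands for the
    Dirichlet process precision parameter and must be positive.
    """
    if precision <= 0:
        raise ValueError("precision must be positive")
    rng = random.Random(seed)
    labels = []  # labels[i] is the cluster index assigned to subject i
    sizes = []   # sizes[k] is the current number of subjects in cluster k
    for i in range(n):
        # With i subjects already placed, a new cluster is opened with
        # probability precision / (i + precision); otherwise cluster k is
        # chosen with probability sizes[k] / (i + precision).
        u = rng.random() * (i + precision)
        if u < precision:
            labels.append(len(sizes))
            sizes.append(1)
        else:
            u -= precision
            k = 0
            while k < len(sizes) - 1 and u >= sizes[k]:
                u -= sizes[k]
                k += 1
            labels.append(k)
            sizes[k] += 1
    return labels, sizes


if __name__ == "__main__":
    for m in (0.1, 1.0, 10.0):
        _, sizes = sample_crp_partition(200, m, seed=0)
        print(f"precision={m}: {len(sizes)} clusters")
```

Under this scheme, larger values of the precision parameter tend to produce more, smaller clusters of subjects sharing a random effect, which is one reason the choice of estimator for that parameter matters.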

Article information

Source
Ann. Statist. Volume 38, Number 2 (2010), 979-1009.

Dates
First available: 19 February 2010

Permanent link to this document
http://projecteuclid.org/euclid.aos/1266586620

Digital Object Identifier
doi:10.1214/09-AOS731

Zentralblatt MATH identifier
05686525

Mathematical Reviews number (MathSciNet)
MR2604702

Subjects
Primary: 62F99: None of the above, but in this section
Secondary: 62P25: Applications to social sciences; 62G99: None of the above, but in this section

Keywords
Linear mixed models; generalized linear mixed models; hierarchical models; Gibbs sampling; Bayes estimation

Citation

Kyung, Minjung; Gill, Jeff; Casella, George. Estimation in Dirichlet random effects models. The Annals of Statistics 38 (2010), no. 2, 979-1009. doi:10.1214/09-AOS731. http://projecteuclid.org/euclid.aos/1266586620.

