Institute of Mathematical Statistics Collections

A Bayesian semi-parametric model for small area estimation

Donald Malec and Peter Müller



In public health management there is a need to produce subnational estimates of health outcomes. Often, however, funds are not available to collect samples large enough to produce traditional survey sample estimates for each subnational area. Although parametric hierarchical methods have been successfully used to derive estimates from small samples, there is a concern that these models may oversimplify the geographic diversity of the U.S. population. In this paper, the geographic variability component of the model is specified semi-parametrically. Specifically, we assume Dirichlet process mixtures of normals for county-specific random effects. Results are compared to a parametric model based on the base measure of the Dirichlet process, using binary health outcomes related to mammogram usage.
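As a rough illustration of what a Dirichlet process mixture of normals for county-specific random effects can look like, the sketch below draws effects from a truncated stick-breaking approximation and then simulates binary outcomes. It is not the authors' specification: the logit link, the truncation level, the normal base measure, and all names and parameter values (`alpha`, `base_sd`, `kernel_sd`, `intercept`, and so on) are assumptions made for the example.

```python
import math
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process with total mass alpha."""
    weights = []
    remaining = 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover stick
    return weights


def draw_county_effects(n_counties, alpha, base_mean, base_sd, kernel_sd,
                        n_atoms=50, seed=0):
    """Draw county random effects from a truncated DP mixture of normals.

    Atom locations come from the (assumed) base measure N(base_mean, base_sd^2);
    each county effect is normal around its atom with standard deviation kernel_sd.
    """
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [rng.gauss(base_mean, base_sd) for _ in range(n_atoms)]
    labels = rng.choices(range(n_atoms), weights=weights, k=n_counties)
    effects = [rng.gauss(atoms[k], kernel_sd) for k in labels]
    return effects, labels, weights


def simulate_binary_outcomes(effects, intercept, n_per_county, seed=1):
    """Simulate per-county counts of positive responses under an assumed logit link."""
    rng = random.Random(seed)
    counts = []
    for u in effects:
        p = 1.0 / (1.0 + math.exp(-(intercept + u)))
        counts.append(sum(rng.random() < p for _ in range(n_per_county)))
    return counts


effects, labels, weights = draw_county_effects(
    n_counties=20, alpha=1.0, base_mean=0.0, base_sd=1.0, kernel_sd=0.2)
counts = simulate_binary_outcomes(effects, intercept=0.5, n_per_county=30)
print(len(set(labels)), "distinct clusters among", len(effects), "counties")
```

Counties that share a label share an atom, so their effects cluster together; a fully parametric counterpart in this sketch would instead draw every county effect independently from a single normal distribution.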

Chapter information

Bertrand Clarke and Subhashis Ghosal, eds., Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2008), 223–236

First available in Project Euclid: 28 April 2008


Subjects: Primary: 62G07 (Density estimation); 62-07 (Data analysis)

Keywords: Dirichlet process mixture models; National Health Interview Survey

Copyright © 2008, Institute of Mathematical Statistics


Malec, Donald; Müller, Peter. A Bayesian semi-parametric model for small area estimation. Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, 223–236, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2008. doi:10.1214/074921708000000165.


