Statistical Science

Small Area Shrinkage Estimation

G. Datta and M. Ghosh

Full-text: Open access

Abstract

The need for small area estimates is increasingly felt in both the public and private sectors in order to formulate strategic plans. It is now widely recognized that direct small area survey estimates are highly unreliable owing to large standard errors and coefficients of variation. The reason behind this is that a survey is usually designed to achieve a specified level of accuracy at a higher level of geography than that of small areas. Lack of additional resources makes it almost imperative to use the same data to produce small area estimates. For example, if a survey is designed to estimate per capita income for a state, the same survey data need to be used to produce similar estimates for counties, subcounties and census divisions within that state. Thus, by necessity, small area estimation needs explicit, or at least implicit, use of models to link these areas. Improved small area estimates are obtained by “borrowing strength” from similar neighboring areas.

The key to small area estimation is shrinkage of direct estimates toward regression estimates that additionally draw on administrative records and other available sources of information. These shrinkage estimates can often be motivated from both a Bayesian and a frequentist point of view, and in this particular context it is possible to obtain at least an operational synthesis between the two paradigms. Thus, while small area estimates can be developed using a hierarchical Bayesian or an empirical Bayesian approach, similar estimates can also be obtained from the theory of best linear unbiased prediction (BLUP) or empirical best linear unbiased prediction (EBLUP).
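The shrinkage idea can be sketched in code under one common set of assumptions: the normal area-level (Fay–Herriot) model, in which direct estimates satisfy y_i = θ_i + e_i with known sampling variances D_i and θ_i = x_iᵀβ + v_i with model variance A. This is an illustrative choice, not necessarily the article's exact setup. The function names `prasad_rao_A` and `fay_herriot_eblup` are hypothetical. The Prasad–Rao moment estimator of A is only one option; maximum likelihood or REML estimators could be used instead. Each area's estimate is a weighted average of its direct estimate and its regression ("synthetic") estimate, with weight B_i = D_i/(A + D_i) on the regression part.

```python
import numpy as np

# Illustrative sketch only: assumes the normal area-level (Fay-Herriot) model
#   y_i = theta_i + e_i,  e_i ~ N(0, D_i)  (D_i known)
#   theta_i = x_i^T beta + v_i,  v_i ~ N(0, A)
# Function names are hypothetical, not taken from the article.


def prasad_rao_A(y, X, D):
    """Moment (Prasad-Rao type) estimator of the model variance A, truncated at 0.

    Based on E(sum of squared OLS residuals) = (m - p) A + sum_i D_i (1 - h_ii),
    where h_ii are the leverages of the OLS hat matrix.
    """
    m, p = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages x_i^T (X^T X)^{-1} x_i
    A_hat = (resid @ resid - np.sum(D * (1.0 - h))) / (m - p)
    return max(A_hat, 0.0)


def fay_herriot_eblup(y, X, D):
    """EBLUP (empirical Bayes) estimates of theta_i under the assumed area-level model.

    y: direct estimates (length m); X: m x p covariate matrix;
    D: known sampling variances. Returns (estimates, A_hat, beta_hat).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    A_hat = prasad_rao_A(y, X, D)
    w = 1.0 / (A_hat + D)                       # GLS weights 1 / (A + D_i)
    beta_hat = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    B = D / (A_hat + D)                         # shrinkage factors in [0, 1]
    synthetic = X @ beta_hat                    # regression ("synthetic") estimates
    estimates = (1.0 - B) * y + B * synthetic   # shrink direct toward synthetic
    return estimates, A_hat, beta_hat


if __name__ == "__main__":
    # Simulated illustration only; these are not data from the article.
    rng = np.random.default_rng(0)
    m = 30
    X = np.column_stack([np.ones(m), rng.uniform(0.0, 10.0, m)])
    D = rng.uniform(0.5, 2.0, m)
    theta = X @ np.array([1.0, 0.5]) + rng.normal(0.0, 1.0, m)
    y = theta + rng.normal(0.0, np.sqrt(D))
    est, A_hat, beta_hat = fay_herriot_eblup(y, X, D)
    print("A_hat =", round(A_hat, 3), "beta_hat =", np.round(beta_hat, 3))
    print("direct MSE:", np.mean((y - theta) ** 2))
    print("EBLUP  MSE:", np.mean((est - theta) ** 2))
```

Areas with large sampling variance D_i relative to A are pulled more strongly toward the regression estimate, which is the "borrowing strength" described in the abstract. If the estimate of A is truncated to zero, every area collapses onto its synthetic estimate.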

The present article discusses primarily normal theory-based small area estimation techniques and attempts a synthesis of the Bayesian and frequentist points of view. The results are mostly discussed for random effects models and their hierarchical Bayesian counterparts. A few miscellaneous remarks at the end describe current research on more complex models, including some nonnormal ones. Some pointers for future research are also provided.
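The link between a random effects model and its hierarchical Bayesian counterpart can be made concrete with a standard example. What follows is the normal area-level (Fay–Herriot) model, used here for illustration; the article's own notation and prior choices may differ. Assume that A is known and that β has a flat prior. Then the posterior mean of θ_i equals the BLUP. The hierarchical Bayes estimator averages the BLUP over a posterior density π(A | y) for A, which requires choosing a prior for A (an assumption). The EBLUP, which here coincides with the empirical Bayes estimator, instead plugs in an estimate Â.

```latex
\begin{aligned}
&\text{Sampling model:} && y_i = \theta_i + e_i, \quad e_i \sim N(0, D_i), \quad i = 1, \dots, m,\\
&\text{Linking model:} && \theta_i = x_i^\top \beta + v_i, \quad v_i \sim N(0, A),\\
&\text{BLUP (known } A\text{):} && \tilde\theta_i(A) = (1 - B_i)\, y_i + B_i\, x_i^\top \tilde\beta(A), \quad B_i = \frac{D_i}{A + D_i},\\
& && \tilde\beta(A) = \Bigl(\sum_{j=1}^{m} \frac{x_j x_j^\top}{A + D_j}\Bigr)^{-1} \sum_{j=1}^{m} \frac{x_j\, y_j}{A + D_j},\\
&\text{HB (flat prior on } \beta\text{):} && E(\theta_i \mid y) = E\bigl[\tilde\theta_i(A) \mid y\bigr] = \int_0^\infty \tilde\theta_i(A)\, \pi(A \mid y)\, dA,\\
&\text{EBLUP / EB:} && \hat\theta_i = \tilde\theta_i(\hat A).
\end{aligned}
```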

Article information

Source
Statist. Sci., Volume 27, Number 1 (2012), 95-114.

Dates
First available in Project Euclid: 14 March 2012

Permanent link to this document
https://projecteuclid.org/euclid.ss/1331729985

Digital Object Identifier
doi:10.1214/11-STS374

Mathematical Reviews number (MathSciNet)
MR2953498

Zentralblatt MATH identifier
1330.62286

Keywords
Area-level models, BLUP, confidence intervals, EBLUP, empirical Bayes, hierarchical Bayes, mean squared error, multivariate, second-order unbiased, unit-level models

Citation

Datta, G.; Ghosh, M. Small Area Shrinkage Estimation. Statist. Sci. 27 (2012), no. 1, 95--114. doi:10.1214/11-STS374. https://projecteuclid.org/euclid.ss/1331729985
