The Annals of Statistics

Empirical best prediction under a nested error model with log transformation

Isabel Molina and Nirian Martín


In regression models for economic variables such as income, a log transformation is typically applied to achieve approximate normality and stabilize the variance. However, interest often lies in predicting individual values or means of the variable on the original scale. Under a nested error model for the log transformation of the target variable, we show that the usual approach of back-transforming the predicted values may introduce a substantial bias. We obtain the optimal (or “best”) predictors of individual values of the original variable and of small area means under that model. Empirical best predictors are obtained by replacing the unknown model parameters in the best predictors with estimates. When estimation is desired for subpopulations with small sample sizes (small areas), nested error models are widely used to “borrow strength” from the other areas and obtain estimators with greater efficiency than direct estimators based on the scarce area-specific data. We show that naive predictors of small area means obtained by back-transformation under this model may even underperform direct estimators. Moreover, assessing the uncertainty of the considered predictor is not straightforward. Exact mean squared errors of the best predictors and second-order approximations to the mean squared errors of the empirical best predictors are derived. Estimators of the mean squared errors that are second-order correct are also obtained. Simulation studies and an example with Mexican data on living conditions illustrate the procedures.
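
The back-transformation bias mentioned in the abstract can be illustrated in the simplest possible setting. The sketch below is not the authors' predictor: it assumes a single variable whose logarithm is normal with mean `mu` and standard deviation `sigma`, with no covariates and no area effects. The function name and the parameter values are hypothetical and chosen only for illustration. In that setting the mean on the original scale is exp(mu + sigma^2/2), so the naive value exp(mu), which is the median, underestimates it.

```python
import math
import random


def lognormal_mean_demo(mu=1.0, sigma=0.8, n=200_000, seed=0):
    """Compare naive and bias-corrected back-transformations.

    Assumption (illustrative only): log(y) ~ N(mu, sigma^2), with no
    covariates or area effects.
    """
    rng = random.Random(seed)
    # Simulate y on the original scale by exponentiating normal draws.
    draws = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
    empirical_mean = sum(draws) / n
    # Naive back-transformation: exponentiate the mean of log(y).
    naive = math.exp(mu)
    # Log-normal mean: includes the variance correction sigma^2 / 2.
    corrected = math.exp(mu + sigma ** 2 / 2)
    return empirical_mean, naive, corrected


empirical_mean, naive, corrected = lognormal_mean_demo()
print(f"empirical mean of y : {empirical_mean:.3f}")
print(f"naive exp(mu)       : {naive:.3f}")
print(f"exp(mu + sigma^2/2) : {corrected:.3f}")
```

The gap between the naive and corrected values grows with `sigma`, which is consistent with the abstract's remark that the bias of back-transformed predictions can be substantial.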

Article information

Ann. Statist., Volume 46, Number 5 (2018), 1961–1993.

Received: October 2016
Revised: March 2017
First available in Project Euclid: 17 August 2018

Primary: 62D05: Sampling theory, sample surveys
Secondary: 62G09: Resampling methods

Keywords: Empirical best estimator; mean squared error; parametric bootstrap


Molina, Isabel; Martín, Nirian. Empirical best prediction under a nested error model with log transformation. Ann. Statist. 46 (2018), no. 5, 1961–1993. doi:10.1214/17-AOS1608.




Supplemental materials

  • Supplement to “Empirical best prediction under a nested error model with log transformation”. This document contains results on the bias of the proposed and existing predictors, simulation results for prediction at the individual level, results on the performance of the bootstrap MSE estimator compared with the analytical estimator, and additional results on the application with Mexican data.