Statistical Science
- Statist. Sci.
- Volume 32, Number 2 (2017), 190-205.
Model-Assisted Survey Estimation with Modern Prediction Techniques
F. Jay Breidt and Jean D. Opsomer
Abstract
This paper reviews the design-based, model-assisted approach to using data from a complex survey together with auxiliary information to estimate finite population parameters. A general recipe for deriving model-assisted estimators is presented and design-based asymptotic analysis for such estimators is reviewed. The recipe allows for a very broad class of prediction methods, with examples from the literature including linear models, linear mixed models, nonparametric regression and machine learning techniques.
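The abstract does not spell out the recipe, so the following is only a minimal sketch of the standard difference-type form that model-assisted estimators usually take: sum the fitted predictions over the whole population, then add an inverse-probability-weighted sum of the sample residuals. The function names, the pluggable `fit` argument, and the survey-weighted linear working model are illustrative assumptions, not the authors' code; any other prediction method could be swapped in for `fit_weighted_linear`.

```python
import numpy as np

def fit_weighted_linear(x, y, w):
    """Survey-weighted least squares with an intercept (illustrative working model)."""
    X = np.column_stack([np.ones(len(x)), x])
    WX = X * w[:, None]
    # Solve the weighted normal equations (X' W X) beta = X' W y.
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return lambda x_new: np.column_stack([np.ones(len(x_new)), x_new]) @ beta

def model_assisted_total(x_pop, x_s, y_s, pi_s, fit=fit_weighted_linear):
    """Difference-type model-assisted estimator of a finite population total.

    x_pop : auxiliary variable, known for every population unit
    x_s, y_s : auxiliary and study variables for the sampled units
    pi_s : first-order inclusion probabilities of the sampled units
    fit : hypothetical hook returning a prediction function m(x)
    """
    x_pop = np.asarray(x_pop, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    m = fit(x_s, y_s, 1.0 / pi_s)
    # Population sum of predictions plus design-weighted residual correction.
    return m(x_pop).sum() + np.sum((y_s - m(x_s)) / pi_s)
```

When the working model fits the sample exactly, the residual correction vanishes and the estimate is the population sum of the predictions; when the model predicts poorly, the correction term keeps the estimator anchored to the design weights.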
Article information
Source
Statist. Sci. Volume 32, Number 2 (2017), 190-205.
Dates
First available in Project Euclid: 11 May 2017
Permanent link to this document
http://projecteuclid.org/euclid.ss/1494489811
Digital Object Identifier
doi:10.1214/16-STS589
Keywords
Machine learning; nonparametric regression; nearest neighbors; neural network; regression trees; survey asymptotics
Citation
Breidt, F. Jay; Opsomer, Jean D. Model-Assisted Survey Estimation with Modern Prediction Techniques. Statist. Sci. 32 (2017), no. 2, 190–205. doi:10.1214/16-STS589. http://projecteuclid.org/euclid.ss/1494489811.

