Statistical Science
- Statist. Sci.
- Volume 32, Number 2 (2017), 176-189.
Probability Sampling Designs: Principles for Choice of Design and Balancing
Yves Tillé and Matthieu Wilhelm
Abstract
The aim of this paper is twofold. First, three theoretical principles are formalized: randomization, overrepresentation and restriction. We develop these principles and give a rationale for their use in choosing the sampling design in a systematic way. In the model-assisted framework, knowledge of the population is formalized by modelling the population, and the sampling design is chosen accordingly. We show how the principles of overrepresentation and restriction arise naturally from the modelling of the population. Balanced sampling then appears as a consequence of the modelling. Second, a review of probability balanced sampling is presented through the model-assisted framework. For some basic models, balanced sampling can be shown to be an optimal sampling design. Emphasis is placed on new spatial sampling methods and their related models. An illustrative example shows the advantages of the different methods. Throughout the paper, various examples illustrate how the three principles can be applied in order to improve inference.
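The abstract names the pivotal method among the tools for balanced and spatial sampling. The Python sketch below is a hypothetical illustration, not the authors' implementation; the names `pivotal_step`, `ordered_pivotal`, `local_pivotal`, `horvitz_thompson` and the tolerance `eps` are introduced here. It shows the basic pivotal update between two units, an ordered (sequential) pivotal method that yields a fixed sample size when the inclusion probabilities sum to an integer, and a simple local variant in the spirit of the local pivotal method (the version that pairs a randomly chosen unit with its nearest neighbour, with ties broken by index as an assumption) to spread the sample in space. It also shows the Horvitz–Thompson estimator of a total.

```python
import math
import random


def pivotal_step(a, b, rng):
    """One pivotal update between two fractional inclusion probabilities a, b.

    At least one returned value is 0 or 1, the sum a + b is preserved, and the
    expected value of each returned probability equals its input.
    """
    s = a + b
    if s < 1:
        # One unit absorbs the whole mass s; the other is eliminated.
        if rng.random() < a / s:
            return s, 0.0
        return 0.0, s
    # s >= 1: one unit is selected; the other keeps the remainder s - 1.
    if rng.random() < (1 - b) / (2 - s):
        return 1.0, s - 1
    return s - 1, 1.0


def _is_fractional(p, eps):
    return eps < p < 1 - eps


def ordered_pivotal(pi, rng, eps=1e-9):
    """Ordered pivotal sampling: returns the indices of the selected units."""
    p = list(pi)
    cur = None  # the single unit still carrying a fractional probability
    for k in range(len(p)):
        if not _is_fractional(p[k], eps):
            continue
        if cur is None:
            cur = k
            continue
        p[cur], p[k] = pivotal_step(p[cur], p[k], rng)
        if _is_fractional(p[k], eps):
            cur = k  # unit k now carries the leftover mass
        elif not _is_fractional(p[cur], eps):
            cur = None  # both units are settled
    if cur is not None:
        # Non-integer total: settle the leftover mass with one Bernoulli draw.
        p[cur] = 1.0 if rng.random() < p[cur] else 0.0
    return [k for k, v in enumerate(p) if v > 0.5]


def local_pivotal(pi, coords, rng, eps=1e-9):
    """Spatially spread sampling: each update pairs a random unit with its
    nearest still-fractional neighbour (hypothetical simple variant)."""
    p = list(pi)
    active = [k for k in range(len(p)) if _is_fractional(p[k], eps)]
    while len(active) > 1:
        i = rng.choice(active)
        j = min((k for k in active if k != i),
                key=lambda k: math.dist(coords[i], coords[k]))
        p[i], p[j] = pivotal_step(p[i], p[j], rng)
        # Each update settles at least one unit, so the loop terminates.
        active = [k for k in active if _is_fractional(p[k], eps)]
    for k in active:
        p[k] = 1.0 if rng.random() < p[k] else 0.0
    return [k for k, v in enumerate(p) if v > 0.5]


def horvitz_thompson(y, pi, sample):
    """Horvitz-Thompson estimator of the population total of y."""
    return sum(y[k] / pi[k] for k in sample)


if __name__ == "__main__":
    rng = random.Random(1)
    pi = [0.2, 0.5, 0.3, 0.8, 0.6, 0.4, 0.7, 0.5]  # sums to 4
    coords = [(x, y) for x in range(4) for y in range(2)]
    y = [10.0 * p + 1.0 for p in pi]  # illustrative variable
    s1 = ordered_pivotal(pi, rng)
    s2 = local_pivotal(pi, coords, rng)
    print("ordered sample:", s1, "HT total:", horvitz_thompson(y, pi, s1))
    print("local sample:", s2, "HT total:", horvitz_thompson(y, pi, s2))
    print("true total:", sum(y))
```

Because each pivotal update preserves the sum of the two probabilities, the sample size is fixed whenever the inclusion probabilities sum to an integer, which is the simplest case of balancing (on the inclusion probabilities themselves). The nearest-neighbour search in `local_pivotal` is quadratic per step and is meant only for small examples.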
Article information
Source
Statist. Sci. Volume 32, Number 2 (2017), 176-189.
Dates
First available in Project Euclid: 11 May 2017
Permanent link to this document
http://projecteuclid.org/euclid.ss/1494489810
Digital Object Identifier
doi:10.1214/16-STS606
Keywords
Balanced sampling; design-based; model-based inference; entropy; pivotal method; cube method; spatial sampling
Citation
Tillé, Yves; Wilhelm, Matthieu. Probability Sampling Designs: Principles for Choice of Design and Balancing. Statist. Sci. 32 (2017), no. 2, 176–189. doi:10.1214/16-STS606. http://projecteuclid.org/euclid.ss/1494489810.