Statistical Science

Probability Sampling Designs: Principles for Choice of Design and Balancing

Yves Tillé and Matthieu Wilhelm

Abstract

The aim of this paper is twofold. First, three theoretical principles are formalized: randomization, overrepresentation and restriction. We develop these principles and give a rationale for using them to choose the sampling design in a systematic way. In the model-assisted framework, knowledge of the population is formalized by modelling the population, and the sampling design is chosen accordingly. We show how the principles of overrepresentation and restriction arise naturally from the modelling of the population. Balanced sampling then appears as a consequence of the modelling. Second, a review of probability balanced sampling is presented through the model-assisted framework. For some basic models, balanced sampling can be shown to be an optimal sampling design. Emphasis is placed on new spatial sampling methods and their related models. An illustrative example shows the advantages of the different methods. Throughout the paper, various examples illustrate how the three principles can be applied to improve inference.
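To make the three principles concrete, the following is a minimal sketch and not code from the paper. It assumes the sequential form of the pivotal method (one of the keywords listed below) and uses hypothetical names `pivotal_sample` and `horvitz_thompson` together with made-up inclusion probabilities. Randomization comes from the random duels, overrepresentation from the unequal inclusion probabilities, and restriction from the fixed sample size. A fixed-size design is balanced on the inclusion probabilities themselves, so the Horvitz-Thompson estimator reproduces the total of any variable proportional to them.

```python
import random


def pivotal_sample(pi, rng, eps=1e-9):
    """Sequential pivotal method (sketch): returns 0/1 inclusion indicators.

    pi  : target inclusion probabilities, each in [0, 1]
    rng : a random.Random instance
    """
    p = [float(x) for x in pi]
    # Units whose inclusion is not yet decided (probability strictly in (0, 1)).
    undecided = [k for k, x in enumerate(p) if eps < x < 1 - eps]
    while len(undecided) >= 2:
        i, j = undecided[0], undecided[1]
        s = p[i] + p[j]
        if s < 1:
            # One unit is rejected; the other carries the combined probability.
            if rng.random() < p[j] / s:
                p[i], p[j] = 0.0, s
            else:
                p[i], p[j] = s, 0.0
        else:
            # One unit is selected; the other keeps the remaining probability.
            if rng.random() < (1 - p[j]) / (2 - s):
                p[i], p[j] = 1.0, s - 1
            else:
                p[i], p[j] = s - 1, 1.0
        undecided = [k for k in undecided if eps < p[k] < 1 - eps]
    # If the probabilities do not sum to an integer, one unit may remain.
    for k in undecided:
        p[k] = 1.0 if rng.random() < p[k] else 0.0
    return [int(round(x)) for x in p]


def horvitz_thompson(y, pi, sample):
    """Horvitz-Thompson estimator of the population total of y."""
    return sum(yk / pk for yk, pk, sk in zip(y, pi, sample) if sk == 1)


# Illustrative (assumed) population of four units.
pi = [0.2, 0.4, 0.6, 0.8]        # unequal probabilities summing to n = 2
y = [10 * pk for pk in pi]       # a variable exactly proportional to pi
sample = pivotal_sample(pi, random.Random(1))
estimate = horvitz_thompson(y, pi, sample)
```

Because the sample size is fixed at the sum of the inclusion probabilities, `estimate` matches `sum(y)` up to rounding error for every draw in this toy population. Balancing on further auxiliary variables generally requires a method such as the cube method, which this sketch does not attempt.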

Article information

Source
Statist. Sci., Volume 32, Number 2 (2017), 176-189.

Dates
First available in Project Euclid: 11 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1494489810

Digital Object Identifier
doi:10.1214/16-STS606

Mathematical Reviews number (MathSciNet)
MR3648954

Zentralblatt MATH identifier
1381.62032

Keywords
Balanced sampling; design-based; model-based; inference; entropy; pivotal method; cube method; spatial sampling

Citation

Tillé, Yves; Wilhelm, Matthieu. Probability Sampling Designs: Principles for Choice of Design and Balancing. Statist. Sci. 32 (2017), no. 2, 176–189. doi:10.1214/16-STS606. https://projecteuclid.org/euclid.ss/1494489810


