Bernoulli, Volume 19, Number 5A (2013), 2067-2097.

Confidence bands for Horvitz–Thompson estimators using sampled noisy functional data

Hervé Cardot, David Degras, and Etienne Josserand

Abstract

When collections of functional data are too large to be exhaustively observed, survey sampling techniques provide an effective way to estimate global quantities such as the population mean function. Assuming functional data are collected from a finite population according to a probabilistic sampling scheme, with the measurements being discrete in time and noisy, we propose to first smooth the sampled trajectories with local polynomials and then estimate the mean function with a Horvitz–Thompson estimator. Under mild conditions on the population size, observation times, regularity of the trajectories, sampling scheme, and smoothing bandwidth, we prove a central limit theorem in the space of continuous functions. We also establish the uniform consistency of a covariance function estimator and apply these results to build confidence bands for the mean function. The bands attain nominal coverage and are obtained through Gaussian process simulations conditional on the estimated covariance function. To select the bandwidth, we propose a cross-validation method that accounts for the sampling weights. A simulation study assesses the performance of our approach and highlights the influence of the sampling scheme and bandwidth choice.
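The pipeline described in the abstract (smooth each sampled curve, form the Horvitz–Thompson mean, then calibrate a band by simulating a Gaussian process) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes simple random sampling without replacement (so every inclusion probability equals n/N and the variance estimator takes its textbook form), a local linear fit with an Epanechnikov kernel, and a bandwidth fixed by hand rather than chosen by weighted cross-validation. The function names `local_linear_smooth` and `ht_confidence_band`, and all numerical settings, are hypothetical.

```python
import numpy as np

def local_linear_smooth(t_obs, y, t_grid, h):
    """Local linear fit of one noisy curve on t_grid (Epanechnikov kernel, bandwidth h)."""
    fitted = np.empty(len(t_grid))
    for j, t0 in enumerate(t_grid):
        u = (t_obs - t0) / h
        w = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)   # kernel weights
        X = np.column_stack([np.ones_like(t_obs), t_obs - t0])
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)               # weighted least squares
        fitted[j] = beta[0]                                    # intercept = fit at t0
    return fitted

def ht_confidence_band(smoothed, pi, N, level=0.95, n_sim=5000, seed=0):
    """Horvitz-Thompson mean and simulated band; ASSUMES simple random sampling
    without replacement for the covariance estimator."""
    n = smoothed.shape[0]
    mu_hat = (smoothed / pi[:, None]).sum(axis=0) / N
    # Covariance of the HT mean under SRSWOR: (1 - n/N) / n * sample covariance.
    gamma = (1 - n / N) / n * np.cov(smoothed, rowvar=False)
    sd = np.sqrt(np.diag(gamma))
    corr = gamma / np.outer(sd, sd)
    # Simulate a centered Gaussian vector with correlation matrix `corr`.
    vals, vecs = np.linalg.eigh(corr)
    root = vecs * np.sqrt(np.clip(vals, 0.0, None))
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_sim, len(sd))) @ root.T
    # Critical value: quantile of the supremum of |Z| over the grid.
    c = np.quantile(np.abs(Z).max(axis=1), level)
    return mu_hat, mu_hat - c * sd, mu_hat + c * sd, c

# Toy demo on synthetic data (all settings are illustrative assumptions).
rng = np.random.default_rng(1)
N, n = 500, 50
t_obs = np.linspace(0, 1, 40)    # discrete, noisy observation times
t_grid = np.linspace(0, 1, 25)   # evaluation grid for the band
a, b = rng.normal(size=N), rng.normal(size=N)

def true_curve(k, t):
    return np.sin(2 * np.pi * t) + a[k] * np.cos(2 * np.pi * t) + b[k] * np.sin(2 * np.pi * t)

sample = rng.choice(N, size=n, replace=False)
pi = np.full(n, n / N)           # equal inclusion probabilities
smoothed = np.array([
    local_linear_smooth(
        t_obs, true_curve(k, t_obs) + rng.normal(scale=0.3, size=t_obs.size), t_grid, h=0.1
    )
    for k in sample
])
mu_hat, lower, upper, c = ht_confidence_band(smoothed, pi, N)
pop_mean = np.mean([true_curve(k, t_grid) for k in range(N)], axis=0)
print(f"critical value c = {c:.2f}")
print("band covers the population mean on the grid:",
      bool(np.all((lower <= pop_mean) & (pop_mean <= upper))))
```

The critical value `c` exceeds the pointwise normal quantile because it is calibrated on the supremum of the simulated process over the whole grid, which is what makes the band simultaneous rather than pointwise. Under unequal-probability designs the covariance line would need the general Horvitz–Thompson form involving joint inclusion probabilities, which this sketch does not cover.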

Article information

Source
Bernoulli Volume 19, Number 5A (2013), 2067-2097.

Dates
First available in Project Euclid: 5 November 2013

Permanent link to this document
https://projecteuclid.org/euclid.bj/1383661214

Digital Object Identifier
doi:10.3150/12-BEJ443

Mathematical Reviews number (MathSciNet)
MR3129044

Zentralblatt MATH identifier
06254554

Keywords
CLT; functional data; local polynomial smoothing; maximal inequalities; space of continuous functions; suprema of Gaussian processes; survey sampling; weighted cross-validation

Citation

Cardot, Hervé; Degras, David; Josserand, Etienne. Confidence bands for Horvitz–Thompson estimators using sampled noisy functional data. Bernoulli 19 (2013), no. 5A, 2067–2097. doi:10.3150/12-BEJ443. https://projecteuclid.org/euclid.bj/1383661214

