The Annals of Statistics

Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier

Raymond J. Carroll, Aurore Delaigle, and Peter Hall


The data functions that are studied in the course of functional data analysis are assembled from discrete data, and the level of smoothing that is used is generally that which is appropriate for accurate approximation of the conceptually smooth functions that were not actually observed. Existing literature shows that this approach is effective, and even optimal, when using functional data methods for prediction or hypothesis testing. However, in the present paper we show that this approach is not effective in classification problems. There a useful rule of thumb is that undersmoothing is often desirable, but there are several surprising qualifications to that approach. First, the effect of smoothing the training data can be more significant than that of smoothing the new data set to be classified; second, undersmoothing is not always the right approach, and in fact in some cases using a relatively large bandwidth can be more effective; and third, these perverse results are the consequence of very unusual properties of error rates, expressed as functions of smoothing parameters. For example, the orders of magnitude of optimal smoothing parameter choices depend on the signs and sizes of terms in an expansion of error rate, and those signs and sizes can vary dramatically from one setting to another, even for the same classifier.
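
As a rough illustration of the setting described in the abstract, the sketch below smooths noisy, discretely observed curves with a kernel smoother, builds a centroid-type classifier from the smoothed training curves, and records the test error for different bandwidths applied to the training data and to the new data. It is not the authors' code. The Nadaraya-Watson smoother, the simplified distance-to-class-mean rule, the simulated mean functions, the noise level, the sample sizes, and all function names (nw_smooth, centroid_classify, simulate, error_rate) are assumptions made purely for illustration.

```python
import numpy as np

def nw_smooth(Y, t, h):
    """Nadaraya-Watson smooth of each row of Y, observed on grid t,
    using a Gaussian kernel with bandwidth h (illustrative choice)."""
    U = (t[:, None] - t[None, :]) / h
    W = np.exp(-0.5 * U**2)
    W /= W.sum(axis=1, keepdims=True)  # weights for each target point sum to 1
    return Y @ W.T

def centroid_classify(X_new, mean0, mean1):
    """Assign each row of X_new to the class whose mean curve is closer
    in squared Euclidean distance on the grid (simplified centroid rule)."""
    d0 = ((X_new - mean0) ** 2).sum(axis=1)
    d1 = ((X_new - mean1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def simulate(n, t, rng, sigma=1.0):
    """Hypothetical two-class model: class 0 has mean 0, class 1 has mean
    0.5*sin(2*pi*t); observations carry i.i.d. Gaussian noise."""
    labels = rng.integers(0, 2, size=n)
    mu = np.where(labels[:, None] == 1, 0.5 * np.sin(2 * np.pi * t), 0.0)
    return mu + sigma * rng.normal(size=(n, t.size)), labels

def error_rate(h_train, h_new, train, test, t):
    """Test misclassification rate when training curves are smoothed with
    h_train and the curves to be classified are smoothed with h_new."""
    (Y_tr, lab_tr), (Y_te, lab_te) = train, test
    S_tr = nw_smooth(Y_tr, t, h_train)
    S_te = nw_smooth(Y_te, t, h_new)
    mean0 = S_tr[lab_tr == 0].mean(axis=0)
    mean1 = S_tr[lab_tr == 1].mean(axis=0)
    pred = centroid_classify(S_te, mean0, mean1)
    return float(np.mean(pred != lab_te))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
train = simulate(40, t, rng)
test = simulate(2000, t, rng)
bandwidths = [0.01, 0.03, 0.1, 0.3]

# Error rate for every pair (bandwidth for training data, bandwidth for new data).
errors = {(ht, hn): error_rate(ht, hn, train, test, t)
          for ht in bandwidths for hn in bandwidths}
for (ht, hn), e in sorted(errors.items()):
    print(f"h_train={ht:<5} h_new={hn:<5} error={e:.3f}")
```

Keeping the two bandwidths separate makes it possible to compare the effect of smoothing the training data with that of smoothing the new data, which is the comparison the abstract highlights. The printed numbers depend entirely on the assumed simulation settings and should not be read as reproducing any result from the paper.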

Article information

Ann. Statist., Volume 41, Number 6 (2013), 2739-2767.

First available in Project Euclid: 17 December 2013

Primary: 62G08: Nonparametric regression

Keywords: Centroid method; discrimination; kernel smoothing; quadratic discrimination; smoothing parameter choice; training data


Carroll, Raymond J.; Delaigle, Aurore; Hall, Peter. Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier. Ann. Statist. 41 (2013), no. 6, 2739–2767. doi:10.1214/13-AOS1158.

Supplemental materials

  • Supplementary material: Supplement to “Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier”. The supplementary file contains the proofs of Theorems 2 and 3, as well as additional simulation results.