The Annals of Statistics

Backfitting in smoothing spline ANOVA

Zhen Luo

A computational scheme for fitting smoothing spline ANOVA models to large data sets with a (near) tensor product design is proposed. Such data sets are common in spatial-temporal analyses. The proposed scheme uses the backfitting algorithm to take advantage of the tensor product design to save both computational memory and time. Several ways to further speed up the backfitting algorithm, such as collapsing component functions and successive over-relaxation, are discussed. An iterative imputation procedure is used to handle the cases of near tensor product designs. An application to a global historical surface air temperature data set, which motivated this work, is used to illustrate the scheme proposed.
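The backfitting idea and its successive over-relaxation (SOR) variant can be sketched on a toy problem. The block below is only an illustration under stated assumptions, not the scheme of the paper: it swaps the smoothing spline ANOVA components for two ridge-penalized main effects (a row effect a[i] and a column effect b[j]) on a complete two-way grid, so that each backfitting update touches one margin of the grid only. The function name backfit_two_way, the penalties lam_a and lam_b, and the relaxation value omega = 1.2 are hypothetical choices made for this example.

```python
def backfit_two_way(y, lam_a=1.0, lam_b=1.0, omega=1.0, tol=1e-10, max_iter=10_000):
    """Fit y[i][j] ~ a[i] + b[j] with ridge penalties by backfitting.

    Minimizes sum_ij (y_ij - a_i - b_j)^2 + lam_a * sum_i a_i^2 + lam_b * sum_j b_j^2
    by Gauss-Seidel sweeps over the two components.
    omega = 1 gives plain backfitting; 1 < omega < 2 gives successive over-relaxation.
    Returns (a, b, number_of_sweeps).
    """
    n_rows, n_cols = len(y), len(y[0])
    a = [0.0] * n_rows
    b = [0.0] * n_cols
    for sweep in range(1, max_iter + 1):
        change = 0.0
        # Row component: shrink the row means of the partial residuals y - b.
        for i in range(n_rows):
            target = sum(y[i][j] - b[j] for j in range(n_cols)) / (n_cols + lam_a)
            new = (1.0 - omega) * a[i] + omega * target
            change = max(change, abs(new - a[i]))
            a[i] = new
        # Column component: uses the row effects updated just above.
        for j in range(n_cols):
            target = sum(y[i][j] - a[i] for i in range(n_rows)) / (n_rows + lam_b)
            new = (1.0 - omega) * b[j] + omega * target
            change = max(change, abs(new - b[j]))
            b[j] = new
        if change < tol:
            return a, b, sweep
    return a, b, max_iter


if __name__ == "__main__":
    # Made-up 4 x 3 grid of observations, for illustration only.
    y = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0], [0.0, 1.0, 1.0], [4.0, 4.0, 6.0]]
    for omega in (1.0, 1.2):
        a, b, sweeps = backfit_two_way(y, omega=omega)
        print("omega =", omega, "sweeps =", sweeps)
```

Because the grid is complete, neither update ever forms a matrix over all n_rows * n_cols observations; this is the kind of saving a tensor product design makes possible. Both values of omega converge to the same fit, since the underlying linear system is symmetric positive definite for positive penalties.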

Article information

Ann. Statist., Volume 26, Number 5 (1998), 1733-1759.

First available in Project Euclid: 21 June 2002

Primary: 62G07 (Density estimation), 65D10 (Smoothing, curve fitting), 65F10 (Iterative methods for linear systems [See also 65N22])
Secondary: 62H11 (Directional data; spatial statistics), 65U05, 86A32 (Geostatistics)

Keywords: Gauss–Seidel algorithm; tensor product design; spatial-temporal analysis; additive model; collapsing; grouping; SOR; global historical temperature data


Luo, Zhen. Backfitting in smoothing spline ANOVA. Ann. Statist. 26 (1998), no. 5, 1733-1759. doi:10.1214/aos/1024691355.
