Statistical Science

Flexible smoothing with B-splines and penalties

Paul H. C. Eilers and Brian D. Marx

Full-text: Open access

Abstract

B-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent B-splines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of B-splines, of their construction and of penalized likelihood is presented. We discuss properties of penalized B-splines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented.

Article information

Source
Statist. Sci. Volume 11, Number 2 (1996), 89-121.

Dates
First available in Project Euclid: 27 November 2002

Permanent link to this document
http://projecteuclid.org/euclid.ss/1038425655

Digital Object Identifier
doi:10.1214/ss/1038425655

Mathematical Reviews number (MathSciNet)
MR1435485

Zentralblatt MATH identifier
0955.62562

Keywords
Generalized linear models smoothing nonparametric models splines density estimation

Citation

Eilers, Paul H. C.; Marx, Brian D. Flexible smoothing with B -splines and penalties. Statist. Sci. 11 (1996), no. 2, 89--121. doi:10.1214/ss/1038425655. http://projecteuclid.org/euclid.ss/1038425655.


Export citation

References

  • Ashford, R. and Walker, P. J. (1972). Quantal response analysis for a mixture of populations. Biometrics 28 981-988.
  • Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analy sis: Theory and Practice. MIT Press.
  • Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatter plots. J. Amer. Statist. Assoc. 74 829-836.
  • Cox, M. G. (1981). Practical spline approximation. In Topics in Numerical Analy sis (P. R. Turner, ed.). Springer, Berlin.
  • de Boor, C. (1977). Package for calculating with B-splines. SIAM J. Numer. Anal. 14 441-472.
  • de Boor, C. (1978). A Practical Guide to Splines. Springer, Berlin.
  • Dierckx, P. (1993). Curve and Surface Fitting with Splines. Clarendon, Oxford.
  • Diggle P. and Marron J. S. (1988). Equivalence of smoothing parameter selectors in density and intensity estimation. J. Amer. Statist. Assoc. 83 793-800.
  • Eilers, P. H. C. (1990). Smoothing and interpolation with generalized linear models. Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali 12 21-32. Eilers, P. H. C. (1991a). Penalized regression in action: estimating pollution roses from daily averages. Environmetrics 2 25-48. Eilers, P. H. C. (1991b). Nonparametric density estimation with grouped observations. Statist. Neerlandica 45 255-270.
  • Eilers, P. H. C. (1995). Indirect observations, composite link models and penalized likelihood. In Statistical Modelling (G. U. H. Seeber et al., eds.). Springer, New York.
  • Eilers, P. H. C. and Marx, B. D. (1992). Generalized linear models with P-splines. In Advances in GLIM and Statistical Modelling (L. Fahrmeir et al., eds.). Springer, New York.
  • Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Dekker, New York.
  • Friedman, J. and Silverman, B. W. (1989). Flexible parsimonious smoothing and additive modeling (with discussion). Technometrics 31 3-39.
  • Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London.
  • Green, P. J. and Yandell, B. S. (1985). Semi-parametric generalized linear models. In Generalized Linear Models (B. Gilchrist et al., eds.). Springer, New York. Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J. and Os
  • trowski, E. (1994). A Handbook of Small Data Sets. Chapman and Hall, London.
  • H¨ardle, W. (1990). Applied Nonparametric Regression. Cambridge Univ. Press.
  • Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
  • Kooperberg, C. and Stone, C. J. (1991). A study of logspline density estimation. Comput. Statist. Data Anal. 12 327-347.
  • Kooperberg, C. and Stone, C. J. (1992). Logspline density estimation for censored data. J. Comput. Graph. Statist. 1 301- 328.
  • Marron, J. S. and Ruppert, D. (1994). Transformations to reduce boundary bias in kernel density estimation. J. Roy. Statist. Soc. Ser. B 56 653-671.
  • Marx, B. D. and Eilers, P. H. C. (1994). Direct generalized additive modelling with penalized likelihood. Paper presented at the 9th Workshop on Statistical Modelling, Exeter, 1994.
  • Marx, B. D. and Eilers, P. H. C. (1996). Direct generalized additive modelling with penalized likelihood. Unpublished manuscript.
  • McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
  • O'Sullivan, F. (1986). A statistical perspective on ill-posed inverse problems (with discussion). Statist. Sci. 1 505-527.
  • O'Sullivan, F. (1988). Fast computation of fully automated logdensity and log-hazard estimators. SIAM J. Sci. Statist. Comput. 9 363-379.
  • Reinsch, C. (1967). Smoothing by spline functions. Numer. Math. 10 177-183.
  • Sakamoto, Y., Ishiguro, M. and Kitagawa, G. (1986). Akaike Information Criterion Statistics. Reidel, Dordrecht.
  • Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, New York.
  • Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric regression curve fitting (with discussion). J. Roy. Statist. Soc. Ser. B 47 1-52.
  • Silverman, B. W. (1986). Density Estimation for Statistics and Data Analy sis. Chapman and Hall, London.
  • Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.
  • Wand, M. P. and Jones, M. C. (1993). Kernel Smoothing. Chapman and Hall, London.
  • Whittaker, E. T. (1923). On a new method of graduation. Proc. Edinburgh Math. Soc. 41 63-75.
  • ron and Park (1992). In the following, I provide a brief review to explain the defects and some remedy to the classical selectors for kernel regression estimate. Let us assume the simplest model of a circular design with equally spaced design points. yt = µ xt + t, where t are i.i.d. noise. For the kernel estimate µ with a bandwidth, we often use the mean of sum of squared errors
  • totically equivalent in Rice (1984). All of these procedures rely on the residual sum of squares RSS. Mallows (1973) proposed the procedure based on the observation that
  • Chiu, S.-T. (1996). A comparative review of bandwidth selection for kernel density estimation. Statist. Sinica 6 129-145.
  • Hall, P. and Johnstone, I. (1992). Empirical functionals and efficient smoothing parameter selection. J. Roy. Statist. Soc. Ser. B 54 519-521.
  • Hall, P., Marron, J. S. and Park, B. U. (1992). Smoothed crossvalidation. Probab. Theory Related Fields 92 1-20.
  • Mallows, C. (1973). Some comments on Cp. Technometrics 15 661-675.
  • Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12 1215-1230.
  • Scott, D. W. and Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. J. Amer. Statist. Assoc. 82 1131-1146.
  • 2 U, Q2 = BTB U, = diag and U is an orthogonal matrix such that Q-1/2 2 DTDQ-1/2 2 = U UT. The columns of G can be identified with a new set of functions known as the Demmler- Reinsch (DR) basis. Specifically these are piecewise poly nomial functions, so that the elements of G satisfy xi = Gi. Besides having useful orthogonality properties the DR basis can be ordered by frequency and larger values of will exhibit more oscillations (in fact 1 zero crossings). Figure 1(a) plots several of the basis functions for m = 133 equally spaced x's and 20 equally spaced interior knots. Figure 1(b) illustrates the expected poly nomial increase in the size of as a function of. The Demmler-Reinsch basis provides an informative interpretation of the spline estimate. Let f d
  • and Xiang and Wahba (1996). For the density estimation problem in Section 8, I could not find the definition of the H matrix to understand the AIC proposed, but whatever it is, it should be subject to the same scrutiny before being recommended as "optimal." In ordinary Gaussian regression, the optimality of GCV is well established in the literature. For the AIC score presented in (27), however, I would like some empirical evidence to be convinced of its optimality. The skepticism is partly due to some empirical evidence suggesting that the trace of H may not be a consistent characterization of the effective dimension of the model. Such evidence can be found in Gu (1996), available online at http:// www.stat.lsa.umich.edu/~chong/ps/modl.ps.
  • Barry, D. (1993). Testing for additivity of a regression function. Ann. Statist. 21 235-254.
  • Cox, D. D. and Chang, Y.-F. (1990). Iterated state space algorithms and cross validation for generalized smoothing splines. Technical Report 49, Dept. Statistics, Univ. Illinois.
  • Cox, D. D., Koh, E., Wahba, G. and Yandell, B. S. (1988). Testing the (parametric) null model hy pothesis in (semiparametric) partial and generalized spline models. Ann. Statist. 16 113-119.
  • Gu, C. (1992). Cross validating non Gaussian data. Journal of Computational and Graphical Statistics 1 169-179.
  • Gu, C. (1996). Model indexing and smoothing parameter selection in nonparametric function estimation. Technical Report 93-55 (rev.), Dept. Statistics, Purdue Univ.
  • Wahba, G. (1983). Bayesian "confidence intervals" for the crossvalidated smoothing spline. J. Roy. Statist. Soc. Ser. B 45 133-150.
  • Xiang, D. and Wahba, G. (1996). A generalized approximate cross validation for smoothing splines with non-Gaussian date. Statist. Sinica. To appear.
  • Gijbels, 1996). Conservation of moments seems unimportant. In regression, I do not see the desirability. In density estimation, simple corrections of kernel density estimates for variance inflation exist, but make little difference away from the normal density (Jones,
  • 1991). Indeed, getting means and variances right is a normality-based concept, so corrected kernel estimators act in a normal-driven semiparametric manner. Efron and Tibshirani (1996) propose more sophisticated moment conservation, but initial indications are that this is no better nor worse than alternative semiparametric density estimators (Hjort,
  • 1996). "The computations, including those for crossvalidation, are relatively inexpensive and easily incorporated into standard software." Again, proponents of the two competing methods I have mentioned would claim the same for the first half of this and advocates of regression splines would claim the lot. The authors make no particularly novel contribution to automatic bandwidth selection. Crossvalidation and AIC are in a class of methods (e.g., H¨ardle, 1990, pages 166-167) which, while not being downright bad, allow scope for improvement.
  • 1996). Binning is the major computational device of
  • all kernel-ty pe estimators (Fan and Marron, 1994). The local likelihood approach is already deeply understood theoretically. Comparison of P-splines's reasonable boundary performance with local poly nomials's reasonable boundary performance is not yet available through theory or simulations. An interesting point mentioned in the paper is the apparent continuum between few-parameter parametric fits at one end and fully "nonparametric" techniques at the other, with many-parameter par
  • Ansley, C. F., Kohn, R. and Wong, C. M. (1993). Nonparametric spline regression with prior information. Biometrika 80 75- 88.
  • Efron, B. and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. Ann. Statist. 24 000-000.
  • Fan, J. and Gijbels, I. (1996). Local Poly nomial Modelling and Its Applications. Chapman and Hall, London.
  • Fan, J. and Marron, J. S. (1994). Fast implementations of nonparametric curve estimators. J. Comput. Graph. Statist. 3 35-56.
  • Hjort, N. L. (1996). Performance of Efron and Tibshirani's semiparametric denisty estimator. Unpublished manuscript.
  • Hjort, N. L. and Jones, M. C. (1996). Locally parametric nonparametric density estimation. Ann. Statist. 24 1619-1647.
  • Jones, M. C. (1991). On correcting for variance inflation in kernel density estimation. Comput. Statist. Data Anal. 11 3-15.
  • Jones, M. C. (1996). On close relations of local likelihood density estimation. Unpublished manuscript.
  • Loader, C. R. (1996). Local likelihood density estimation. Ann. Statist. 24 1602-1618
  • Marron, J. S. (1996). A personal view of smoothing and statistics (with discussion). Comput. Statist. To appear.
  • Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 90 1257-1270.
  • Simonoff, J. S. (1996). Smoothing Methods in Statistics. Springer, New York.
  • timality and minimax properties (Fan, 1993). For density estimation Engel and Gasser (1995) show a minimax property of the fixed bandwith kernel method within a large class of estimators containing penalized likelihood estimators. The presented paper does not provide any argument, neither theoretical nor by simulations, supporting any superiority of P-splines over their many competitors. In the regression case, the theoretical properties of P-splines might be evaluated by combining arguments of de Boor (1978) on the asy mptotic bias and variance of B-splines in (dependence on m, the spline order k and the smoothness of the underlying function) with the well-known results on smoothing splines. The authors propose to use AIC or crossvalidation to select the smoothing parameter. However, a careful look at their method reveals that there are in fact two free parameters: and the number n of knots. If n m, then we essentially obtain a smoothing spline fit, while results might be very different if n m. Indeed, the estimate might crucially depend on n. Therefore, why not determine and n by cross-validation or a related method? The following theoretical arguments may suggest that such a procedure will work. Note that AIC and cross-validation are very close to unbiased risk estimation which consists of estimating the optimal values of and n by minimizing m
  • et al. (1996). Software for the 1992 version, written in C and interfaced to S-PLUS, is publically available from Statlib. (The 1992 version of LOGSPLINE employ s only knot deletion; here, however, we focus on the 1996 version, which uses both knot addition and knot deletion.) LOGSPLINE can provide estimates on both finite and infinite intervals, and it can handle censored data. The results of LOGSPLINE on the Old Faithful data and the suicide data are very similar to the corresponding results of P-splines [the suicide data is an example in Kooperberg and Stone (1992)]. Here we consider a much more challenging data set. The solid line in Figure 1 shows the logspline density estimate based on a random sample of 7,125 annual net incomes in the United Kingdom [Family Expenditure Survey (1968-1983)]. (The data have been rescaled to have mean 1.) The nine knots that were selected by LOGSPLINE are indicated. Note that four of these knots are extremely close to the peak near 0 24. This peak is due to the UK old age pension, which caused many people to have nearly identical incomes. In Kooperberg and Stone (1992) we concluded that the height and location of this peak are accurately estimated by LOGSPLINE. There are several reasons why this data is more challenging than the Old Faithful and suicide data: the data set is much larger, so that it is more of a challenge to computing resources (the LOGSPLINE estimate took 9 seconds on a Sparc 10 workstation); the width of the peak is about 0.02, compared to the range 11.5 of the data; there is a severe outlier (the largest observation is 11.5, the second largest is 7.8); and the rise of the density to the left of the peak is very steep. To get an impression of what the P-splines procedure would yield for this data, I first removed the largest observation so that there would not be any long gaps in the data, reducing the maximum observation to 7.8. The dashed line in Figure 1 is the LOGSPLINE estimate to the data with fixed knots at i/20 × 7 8, for i = 0 1 20 (using 20 intervals, as in most P-spline examples.) The resulting fit should be similar to a P-spline fit with = 0. In this estimate it appears that the narrow peak is completely missed and that, because of the steep rise of the density to the left of the peak and the lack of sufficiently many knots near the peak, two modes are estimated where only one mode exists.
  • ors as in Wahba (1978). In particular, the usual priors are specified independently of sample size, whereas one would want to use more B-splines with a larger sample. Furthermore, the integral of the second derivative squared is easier to interpret from a non-Bayesian perspective than the sum of squares of second differences of B-spline coefficients. I take issue with the authors's claim that their method does not have boundary problems. P-splines are approximately equivalent to smoothing splines
  • which do have boundary effects (Speckman, 1983). To explain, consider minimizing from equation (5),
  • Sheather (1996). We use AIC and get essentially the same amount of smoothing, "a second generation result at a first generation price." Again, we do not wish to imply that AIC is the final answer, but show that it is more useful than sometimes suggested.
  • LR, local regression; LRB, local regression with binning; SS, smoothing splines; SSB, smoothing splines with band solver; RSF, regression splines with fixed knots; RSA, regression splines with adaptive knots; PS, P-splines. The row "Adaptive flexibility available" means that a software implementation is readily available
  • Cook, R. D. and Weisberg, S. (1994). Regression Graphics. Wiley, New York.
  • Eilers, P. H. C. (1988). Autoregressive models with latent variables. In COMPSTAT 1988 Proceedings (D. Edwards and N. E. Raun, eds.). physica-Verlag.
  • Engel, J. and Gasser, T. (1995). A minimax result for a class of nonparametric density estimators. Nonparametric Statistics 4 327-334.
  • Family Expenditure Survey (1968-1983). Annual base tapes
  • and reports (1968-1983). Dept. Employ ment, Statistics Division, Her Majesty's Stationary Office, London.
  • Fan, J. (1993). Local linear regression smoothers and their minimax efficiency. Ann. Statist. 21 196-216.
  • Fan, J. and Gijbels, I. (1995). Data-driven bandwidth selection in local poly nomial fitting: variable bandwidth and spatial adaptation. J. Roy. Statist. Soc. Ser. B 57 371-394.
  • Fan, J., Hall, P., Martin, M. A. and Patil, P. (1996). On local smoothing of nonparametric curve estimators. J. Amer. Statist. Assoc. 91 258-266. Foley, J. D., van Dam, A., Feiner, S. K. and Hughes, J. F.
  • (1996). Computer Graphics: Principles and Practice. Addison-Wesley, Reading, MA.
  • Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141.
  • Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. J. Amer. Statist. Assoc. To appear.
  • Kneip, A. (1994). Ordered linear smoothers. Ann. Statist. 22 835-866.
  • Kooperberg, C., Bose, S. and Stone, C. J. (1997). Poly chotomous regression. J. Amer. Statist. Assoc. To appear. Kooperberg, C., Stone, C. J. and Truong, Y. K. (1995a). Hazard regression. J. Amer. Statist. Assoc. 90 78-94. Kooperberg, C., Stone, C. J. and Truong, Y. K. (1995b). Logspline estimation of a possibly mixed spectral distribution. J. Time Ser. Anal. 16 359-388.
  • Speckman, P. L. (1983). Spline smoothing and optimal rates of convergence in nonparametric regression models. Ann. Statist. 13 970-983. Stone, C. J., Hansen, M., Kooperberg, C. and Truong, Y. K.
  • (1996). Poly nomial splines and their tensor products in extended linear modeling. Ann. Statist. To appear.
  • Wahba, G. (1978). Improper priors, spline smoothing and the problem of guarding against model errors in regression. J. Roy. Statist. Soc. Ser. B 40 364-372.

See also

  • Includes: S-T. Chiu. Comment.
  • Includes: Douglas Nychka, David Cummins. Comment.
  • Includes: Chong Gu. Comment.
  • Includes: M. C. Jones. Comment.
  • Includes: Joachim Engel, Alois Kneip. Comment.
  • Includes: Charles Kooperberg. Comment.
  • Includes: Dennis D. Cox. Comment.
  • Includes: Stephan R. Sain, David W. Scott. Comment.
  • Includes: Paul H. C. Eilers, Brian D. Marx. Rejoinder.