Statistical Science

Spline Adaptation in Extended Linear Models (with comments and a rejoinder by the authors)

Mark H. Hansen and Charles Kooperberg

Full-text: Open access

Abstract

In many statistical applications, nonparametric modeling can provide insights into the features of a dataset that are not obtainable by other means. One successful approach involves the use of (univariate or multivariate) spline spaces. As a class, these methods have inherited much from classical tools for parametric modeling. For example, stepwise variable selection with spline basis terms is a simple scheme for locating knots (breakpoints) in regions where the data exhibit strong, local features. Similarly, candidate knot configurations (generated by this or some other search technique) are routinely evaluated with traditional selection criteria like AIC or BIC. In short, strategies typically applied in parametric model selection have proved useful in constructing flexible, low-dimensional models for nonparametric problems.

Until recently, greedy, stepwise procedures were most frequently suggested in the literature. Research into Bayesian variable selection, however, has given rise to a number of new spline-based methods that primarily rely on some form of Markov chain Monte Carlo to identify promising knot locations. In this paper, we consider various alternatives to greedy, deterministic schemes, and present a Bayesian framework for studying adaptation in the context of an extended linear model (ELM). Our major test cases are Logspline density estimation and (bivariate) Triogram regression models. We selected these because they illustrate a number of computational and methodological issues concerning model adaptation that arise in ELMs.

Article information

Source
Statist. Sci., Volume 17, Number 1 (2002), 2-51.

Dates
First available in Project Euclid: 11 June 2002

Permanent link to this document
https://projecteuclid.org/euclid.ss/1023798997

Digital Object Identifier
doi:10.1214/ss/1023798997

Mathematical Reviews number (MathSciNet)
MR1910073

Zentralblatt MATH identifier
1013.62044

Keywords
Adaptive triangulations; AIC; BIC; density estimation; extended linear models; finite elements; free knot splines; GCV; linear splines; multivariate splines; regression

Citation

Hansen, Mark H.; Kooperberg, Charles. Spline Adaptation in Extended Linear Models (with comments and a rejoinder by the authors). Statist. Sci. 17 (2002), no. 1, 2-51. doi:10.1214/ss/1023798997. https://projecteuclid.org/euclid.ss/1023798997


