Source: Statist. Sci.
Volume 17, Number 1
In many statistical applications, nonparametric modeling can provide
insights into the features of a dataset that are not obtainable by other means.
One successful approach involves the use of (univariate or multivariate) spline
spaces. As a class, these methods have inherited much from classical tools for
parametric modeling. For example, stepwise variable selection with spline basis
terms is a simple scheme for locating knots (breakpoints) in regions where the
data exhibit strong, local features. Similarly, candidate knot configurations
(generated by this or some other search technique) are routinely evaluated
with traditional selection criteria such as AIC or BIC. In short, strategies
typically applied in parametric model selection have proved useful in
constructing flexible, low-dimensional models for nonparametric problems.
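As a rough illustration of stepwise knot placement scored by BIC, the Python sketch below makes several assumptions that are not taken from the paper: a Gaussian regression setting, a continuous piecewise-linear (truncated power) basis, and a fixed grid of candidate knots. For a least squares fit with p basis functions, BIC up to an additive constant is n log(RSS/n) + p log n, and AIC replaces log n by 2. The function names are hypothetical. The sketch performs forward addition only, whereas practical schemes often follow it with a deletion pass.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Design matrix for a continuous piecewise-linear spline:
    columns 1, x, and (x - t)_+ for each knot t (assumed basis)."""
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - t, 0.0) for t in knots]
    return np.column_stack(cols)


def bic(x, y, knots):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + p*log(n)."""
    X = truncated_linear_basis(x, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    n, p = X.shape
    return n * np.log(rss / n) + p * np.log(n)


def forward_knot_selection(x, y, candidates, max_knots=10):
    """Greedy forward addition (hypothetical helper): at each step add the
    candidate knot that most reduces BIC; stop when no addition lowers BIC
    or when max_knots knots are in the model."""
    knots = []
    best = bic(x, y, knots)
    remaining = list(candidates)
    while remaining and len(knots) < max_knots:
        scores = [bic(x, y, knots + [t]) for t in remaining]
        i = int(np.argmin(scores))
        if scores[i] >= best:
            break  # no single addition improves the criterion
        best = scores[i]
        knots.append(remaining.pop(i))
    return sorted(knots), best
```

For example, on noisy data from a function with a single kink, the first knot selected should land at or next to the kink. Each step is greedy: it never revisits earlier choices, which is the limitation that motivates the stochastic alternatives discussed next.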
Until recently, greedy, stepwise procedures were most frequently
suggested in the literature. Research into Bayesian variable selection,
however, has given rise to a number of new spline-based methods that primarily
rely on some form of Markov chain Monte Carlo to identify promising knot
locations. In this paper, we consider various alternatives to greedy,
deterministic schemes, and present a Bayesian framework for studying adaptation
in the context of an extended linear model (ELM). Our major test cases are
Logspline density estimation and (bivariate) Triogram regression models. We
selected these because they illustrate a number of computational and
methodological issues concerning model adaptation that arise in ELMs.
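Published MCMC approaches to knot search differ in their priors and move types. As a much simpler stand-in, and in the spirit of Bayesian variable selection, the sketch below treats each candidate knot as a basis term that is either in or out of the model. It runs a Metropolis sampler on the inclusion indicators with a target proportional to exp(-BIC/2). The BIC-based target, the linear truncated-power basis, and the function names are assumptions made for illustration. They are not the priors or samplers studied in this paper.

```python
import numpy as np


def design(x, knots):
    # Continuous piecewise-linear basis: 1, x, (x - t)_+ for each active knot t.
    cols = [np.ones_like(x), x] + [np.maximum(x - t, 0.0) for t in knots]
    return np.column_stack(cols)


def log_target(x, y, knots):
    # Assumed target: log of exp(-BIC/2), a crude stand-in for a marginal
    # likelihood under a Gaussian model with least squares fits.
    X = design(x, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    n, p = X.shape
    return -0.5 * (n * np.log(rss / n) + p * np.log(n))


def metropolis_knots(x, y, candidates, n_iter=2000, seed=0):
    """Hypothetical Metropolis sampler on inclusion indicators (one per
    candidate knot). The proposal flips one uniformly chosen indicator, which
    is symmetric, so the acceptance probability is min(1, target ratio)."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    gamma = np.zeros(candidates.size, dtype=bool)
    current = log_target(x, y, candidates[gamma])
    counts = np.zeros(candidates.size)
    for _ in range(n_iter):
        j = rng.integers(candidates.size)
        gamma[j] = not gamma[j]           # propose adding or deleting knot j
        proposed = log_target(x, y, candidates[gamma])
        if np.log(rng.uniform()) < proposed - current:
            current = proposed            # accept the flip
        else:
            gamma[j] = not gamma[j]       # reject: undo the flip
        counts += gamma
    # Fraction of iterations each candidate knot spent in the model.
    return counts / n_iter
```

Unlike the greedy search, the sampler can delete knots it added earlier. Its output is an estimated inclusion frequency for every candidate location rather than a single configuration.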