Bayesian Analysis

A Computational Bayesian Method for Estimating the Number of Knots In Regression Splines

Minjung Kyung

Abstract

To determine the size of the drug-involved offender population that could be served effectively and efficiently by partnerships between courts and treatment in the United States, Bhati and Roman (2009) created a synthetic dataset. Because of hidden structure and the aggregation necessary to create variables, there exists latent variance that cannot be fully explained by a normal random effects model. Semiparametric regression is a well-known and frequently used tool for capturing the functional dependence between variables by combining fixed-effect parametric terms with nonlinear regression. A new Gibbs sampler is developed here for the number and positions of knots in regression splines by expressing semiparametric regression as a linear mixed model with a random effect term for the basis functions. Our Gibbs sampler exploits the properties of the multinomial-Dirichlet distribution and is shown to be an improvement, in terms of operator norm and efficiency, over add/delete-one MCMC algorithms. We find that the Dirichlet distribution with small parameter values results in a smaller number of knots and, in general, a good fit to the data. This approach is shown to reveal previously unseen structures in the synthetic dataset of Bhati and Roman.
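The mixed-model formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's Gibbs sampler: it only shows the standard representation of a regression spline as fixed effects (intercept and slope) plus penalized "random effect" coefficients on a truncated-line basis, with the knots held fixed. The function names, the simulated data, the knot grid, and the penalty value `lam` (playing the role of the ratio of error variance to random-effect variance) are all assumptions made for illustration.

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Return the matrix with columns (x - kappa_k)_+ for each knot kappa_k."""
    return np.maximum(x[:, None] - knots[None, :], 0.0)

def fit_penalized_spline(x, y, knots, lam):
    """Fit y = b0 + b1*x + sum_k u_k (x - kappa_k)_+ with a ridge penalty on u only.

    In the mixed-model reading, (b0, b1) are fixed effects, u are random
    effects, and lam corresponds to sigma_eps^2 / sigma_u^2.
    """
    X = np.column_stack([np.ones_like(x), x])   # fixed-effect design
    Z = truncated_line_basis(x, knots)          # random-effect design
    C = np.hstack([X, Z])
    # Penalize only the spline coefficients, not the intercept and slope.
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return coef, C @ coef

# Simulated data (hypothetical, not the Bhati and Roman dataset).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

knots = np.linspace(0.05, 0.95, 10)             # fixed, equally spaced knots
coef, fitted = fit_penalized_spline(x, y, knots, lam=0.01)
```

In this sketch the number and positions of the knots are chosen by hand; the paper's contribution is to treat them as unknown and sample them, which this example does not attempt.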

Article information

Source
Bayesian Anal. Volume 6, Number 4 (2011), 793-828.

Dates
First available in Project Euclid: 13 June 2012

Permanent link to this document
https://projecteuclid.org/euclid.ba/1339616544

Digital Object Identifier
doi:10.1214/11-BA629

Mathematical Reviews number (MathSciNet)
MR2869965

Zentralblatt MATH identifier
1330.62194

Keywords
Regression Splines Multinomial-Dirichlet distribution Bayesian Semiparametric Regression

Citation

Kyung, Minjung. A Computational Bayesian Method for Estimating the Number of Knots In Regression Splines. Bayesian Anal. 6 (2011), no. 4, 793-828. doi:10.1214/11-BA629. https://projecteuclid.org/euclid.ba/1339616544


References

  • Aronszajn, N., 1950. “Theory of Reproducing Kernels." Transactions of the American Mathematical Society 68, 337-404.
  • Bhati, A. S. and Roman, J., 2009. Empirical Investigation of “Going to Scale” in Drug Interventions in the United States, 1990, 2003 [Computer file]. ICPSR26101-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-08-26. doi:10.3886/ICPSR26101
  • Biller, C., 2000. “Adaptive Bayesian Regression Splines in Semiparametric Generalized Linear Models.” Journal of Computational and Graphical Statistics 9, 122-140.
  • Blei, D. M. and Jordan, M. I., 2006. “Variational Inference for Dirichlet Process Mixtures." Bayesian Analysis 1, 121-144.
  • Breiman, L., 1991. “The $\Pi$ Method for Estimating Multivariate Functions from Noisy Data.” Technometrics 33, 125-160.
  • Brinkman, N. D., 1981. “Ethanol Fuel - A Single Cylinder Engine Study of Efficiency and Exhaust Emissions." SAE Transactions 90, 1410-1424.
  • Carroll, R., 1982. “Adapting for Heteroscedasticity In Linear Models.” The Annals of Statistics 10, 1224-1233.
  • Claeskens, G., Krivobokova, T., and Opsomer, J. D., 2009. “Asymptotic Properties of Penalized Spline Estimators." Biometrika 96, 529-544.
  • Cleveland, W. S. and Devlin, S. J., 1988. “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting." Journal of the American Statistical Association 83, 596-610.
  • Denison, D. G. T., Mallick, B. K., and Smith, A. F. M., 1998. “Automatic Bayesian Curve Fitting." Journal of the Royal Statistical Society. Series B 60, 333-350.
  • DiMatteo, I., Genovese, C. R., and Kass, R. E., 2001. “Bayesian Curve-Fitting with Free-Knot Splines." Biometrika 88, 1055-1071.
  • Eilers, P. H. C. and Marx, B. D., 1996. “Flexible Smoothing with $B$-splines and Penalties." Statistical Science 11, 89-102.
  • Escobar, M. D. and West, M., 1995. “Bayesian Density Estimation and Inference Using Mixtures." Journal of the American Statistical Association 90, 577-588.
  • Fahrmeir, L. and Lang, S., 2001. “Bayesian Inference for Generalized Additive Mixed Models Based on Markov Random Field Priors.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 50, 201-220.
  • French, J. L., Kammann, E. E. and Wand, M.P., 2001. Comment on “Semiparametric Nonlinear Mixed-Effects Models and Their Applications" by Ke and Wang. Journal of the American Statistical Association 96, 1285-1288.
  • Friedman, J. H. and Silverman, B. W., 1989. “Flexible Parsimonious Smoothing and Additive Modeling." Technometrics 31, 3-21.
  • Girón, F. J., Moreno, E., and Casella, G., 2007. “Objective Bayesian Analysis of Multiple Changepoints for Linear Models." Bayesian Statistics 8 (J. M. Bernardo, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) Oxford University Press 1-27.
  • Gramacy, R. B., 2007. “tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models." Journal of Statistical Software 19, Issue 9.
  • Gramacy, R. B. and Lee, H. K. H., 2008. “Bayesian Tree Gaussian Process Models With an Application to Computer Modeling." Journal of the American Statistical Association 103, 1119-1130.
  • Gray, R. J., 1994. “Spline-Based Tests in Survival Analysis.” Biometrics 50, 640-652.
  • Green, P.J., 1995. “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination." Biometrika 82, 711-732.
  • Green, P.J. and Silverman, B.W., 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman & Hall, London.
  • Gu, C., 2002. Smoothing Spline ANOVA Models. Springer.
  • Halpern, E. F., 1973. “Bayesian Spline Regression When the Number of Knots is Unknown." Journal of the Royal Statistical Society, Series B 2, 347-360.
  • Härdle, W., Müller, M., Sperlich, S. and Werwatz, A., 2004. Nonparametric and Semiparametric Models, Springer.
  • Harville, D., 1976. “Extension Of The Gauss-Markov Theorem To Include The Estimation Of Random Effects." The Annals of Statistics 4, 384-395.
  • Hastie, T. J., 1996. “Pseudosplines." Journal of the Royal Statistical Society, Series B 58, 379-396.
  • Hastie, T. J. and Tibshirani, R. J., 1990. Generalized Additive Models, Chapman & Hall/CRC
  • Hobert, J. P. and Marchev, D., 2008. “A Theoretical Comparison of the Data Augmentation, Marginal Augmentation and PX-DA Algorithms." The Annals of Statistics 36, 532-554.
  • Holmes, C. C. and Mallick, B. K., 2001. “Bayesian Regression with Multivariate Linear Splines.” Journal of the Royal Statistical Society, Series B 63, 3-17.
  • Holmes, C. C. and Mallick, B. K., 2003. “Generalized Nonlinear Modeling With Multivariate Free-Knot Regression Splines." Journal of the American Statistical Association 98, 352-368.
  • Huang, S. Y. and Lu, H. H.-S., 2001. “Extended Gauss-Markov theorem for nonparametric mixed-effects models.” Journal of Multivariate Analysis 76, 249-266.
  • Kauermann, G., Krivobokova, T., and Fahrmeir, L., 2009. “Some Asymptotic Results on Generalized Penalized Spline Smoothing." Journal of the Royal Statistical Society, Series B 71, 487-503.
  • Ke, C. and Wang, Y., 2001. “Semiparametric Nonlinear Mixed-Effects Models and Their Applications." Journal of the American Statistical Association 96, 1272-1281.
  • Kelly, C. and Rice, J., 1990. “Monotone Smoothing with Application to Dose-Response Curves and the Assessment of Synergism." Biometrics 46, 1071-1085.
  • Kyung, M., Gill, J., and Casella, G., 2009. “Characterizing the Variance Improvement in Linear Dirichlet Random Effects Models.” Statistics and Probability Letters 79, 2343-2350.
  • Kyung, M., Gill, J., and Casella, G., 2010. “Estimation in Dirichlet Random Effects Models.” Annals of Statistics 38, 979-1009.
  • Leitenstorfer, F. and Tutz, G., 2007. “Knot Selection by Boosting Techniques." Computational Statistics and Data Analysis 51, 4605-4621.
  • Lindley, D. V., 1968. “The Choice of Variables in Multiple Regression." Journal of the Royal Statistical Society, Series B 1, 31-66.
  • Maity, A., Carroll, R. J., Mammen, E., and Chatterjee, N., 2009. “Testing in Semiparametric Models with Interaction, with Applications to Gene-Environment Interaction." Journal of the Royal Statistical Society, Series B 71, 75-96.
  • Moreno, E., Casella, G., and Garcia-Ferrer, A., 2005. “An Objective Bayesian Analysis of the Change Point Problem." Stochastic Environmental Research and Risk Assessment 19, 191-204.
  • O'Sullivan, F., 1986. “A Statistical Perspective on Ill-Posed Inverse Problems.” Statistical Science 1, 502-518.
  • Parker, R. L. and Rice, J. A., 1985. Discussion of “Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting" by Silverman. Journal of the Royal Statistical Society, Series B 47, 40-42.
  • Pfeffermann, D., 1984. “On Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients." Journal of the Royal Statistical Society, Series B 46, 139-148.
  • Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge.
  • Robinson, P. M., 1988. “Root-$N$-Consistent Semiparametric Regression." Econometrica 56, 931-954.
  • Rubin, D. B., 1993. “Discussion: Statistical disclosure limitation." Journal of Official Statistics 2, 461-468.
  • Ruppert, D., 2002. “Selecting the Number of Knots for Penalized Splines." Journal of Computational and Graphical Statistics 11, 735-757.
  • Ruppert, D. and Carroll, R. J., 2000. “Spatially-Adaptive Penalties for Spline Fitting." Australian and New Zealand Journal of Statistics 42, 205-224.
  • Ruppert, D., Wand, M. P. and Carroll, R. J., 2003. Semiparametric Regression, Wiley, New York
  • Silverman, B. W., 1985. “Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting." Journal of the Royal Statistical Society, Series B 47, 1-52.
  • Stone, C. J., 1985. “Additive Regression and Other Nonparametric Models." The Annals of Statistics 13, 689-705.
  • Stone, C. J., Hansen, M. H., Kooperberg, C., and Truong, Y. K., 1997. “Polynomial Splines and Their Tensor Products in Extended Linear Modeling." The Annals of Statistics 25, 1371-1470.
  • United States Department of Health and Human Services., 2006. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2003 [Computer file]. ICPSR04138-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2006-10-17. doi:10.3886/ICPSR04138
  • United States Department of Health and Human Services., 2010. National Institutes of Health. National Institute on Drug Abuse. Drug Abuse Treatment Outcome Study (DATOS), 1991-1994: [United States] [Computer file]. ICPSR02258-v5. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-02-16. doi:10.3886/ICPSR02258
  • U.S. Dept. of Justice, National Institute of Justice., 2004. ARRESTEE DRUG ABUSE MONITORING (ADAM) PROGRAM IN THE UNITED STATES, 2003 [Computer file]. ICPSR version. Washington, DC: U.S. Dept. of Justice, National Institute of Justice [producer], 2004. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2004. doi:10.3886/ICPSR04020
  • Wahba, G., 1977. “Practical Approximate Solutions To Linear Operator Equations When The Data Are Noisy.” SIAM Journal on Numerical Analysis 14, 651-667.
  • Wand, M.P., 2003. “Smoothing and Mixed Models." Computational Statistics 18, 223-249.
  • Wood, S., 2006. Generalized Additive Models: An Introduction with R, Chapman & Hall/CRC.
  • Yin, G., Li, H., and Zeng, D., 2008. “Partially Linear Additive Hazards Regression With Varying Coefficients." Journal of the American Statistical Association 103, 1200-1213.
  • Zeng, D. and Lin, D. Y., 2007. “Maximum Likelihood Estimation in Semiparametric Regression Models with Censored Data." Journal of the Royal Statistical Society, Series B 69, 507-564.
  • Zhang, D. and Davidian, M., 2001. “Linear Mixed Models with Flexible Distributions of Random Effects for Longitudinal Data.” Biometrics 57, 795-802.