A Computational Bayesian Method for Estimating the Number of Knots In Regression Splines

Minjung Kyung

doi:10.1214/11-BA629

Bayesian Anal. 6(4): 793-828 (December 2011). DOI: 10.1214/11-BA629

Abstract

Bhati and Roman (2009) created a synthetic dataset to determine the size of the drug-involved offender population in the United States that could be served effectively and efficiently by partnerships between courts and treatment. Because of hidden structure and the aggregation necessary to create the variables, the data contain latent variance that cannot be fully explained by a normal random effects model. Semiparametric regression is a well-known and frequently used tool for capturing the functional dependence between variables by combining fixed-effect parametric terms with nonlinear regression. A new Gibbs sampler is developed here for the number and positions of knots in regression splines by expressing semiparametric regression as a linear mixed model with a random effect term for the basis functions. Our Gibbs sampler exploits the properties of the multinomial-Dirichlet distribution and is shown to be an improvement, in terms of operator norm and efficiency, over add/delete-one MCMC algorithms. We find that a Dirichlet distribution with small parameter values results in a smaller number of knots and, in general, a good fit to the data. This approach is shown to reveal previously unseen structures in the synthetic dataset of Bhati and Roman.
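The mixed-model view of a regression spline mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's Gibbs sampler: it assumes the knots are fixed and known, uses a truncated power basis, and treats the variance ratio `lam` as a given constant. The function names and the parameters `lam` and `degree` are hypothetical, chosen only for illustration. The polynomial terms play the role of fixed effects, and the knot coefficients are ridge-penalized, which corresponds to normal random effects with `lam` equal to the ratio of error variance to random-effect variance.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=1):
    """Return the matrix with columns (x - kappa_k)_+^degree, one per knot (degree >= 1)."""
    return np.maximum(x[:, None] - knots[None, :], 0.0) ** degree

def fit_mixed_model_spline(x, y, knots, lam=1.0, degree=1):
    """Penalized least squares fit of a spline written as y = X beta + Z u + error.

    X holds the polynomial terms (fixed effects, unpenalized).
    Z holds the truncated power functions (random effects, ridge-penalized).
    lam is an assumed, fixed variance ratio sigma_e^2 / sigma_u^2.
    """
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    Z = truncated_power_basis(x, knots, degree)
    C = np.hstack([X, Z])
    # Penalize only the knot coefficients u, not the polynomial coefficients beta.
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return coef, C @ coef

# Usage: 20 equally spaced knots on a smooth test curve.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)
knots = np.linspace(0.05, 0.95, 20)
coef, fitted = fit_mixed_model_spline(x, y, knots, lam=1e-3)
```

In this sketch the number and positions of the knots are inputs. The method described in the abstract instead treats them as unknowns to be sampled.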

Citation

Minjung Kyung. "A Computational Bayesian Method for Estimating the Number of Knots In Regression Splines." Bayesian Anal. 6(4): 793-828, December 2011. https://doi.org/10.1214/11-BA629