## Bayesian Analysis

### A Computational Bayesian Method for Estimating the Number of Knots In Regression Splines

Minjung Kyung

#### Abstract

To determine the size of the drug-involved offender population that could be served effectively and efficiently by partnerships between courts and treatment in the United States, Bhati and Roman (2009) created a synthetic dataset. Because of hidden structure and the aggregation necessary to create variables, there is latent variance that cannot be fully explained by a normal random effect model. Semiparametric regression is a well-known and frequently used tool for capturing the functional dependence between variables by combining fixed effect parametric and nonlinear regression. A new Gibbs sampler is developed here for the number and positions of knots in regression splines by expressing semiparametric regression as a linear mixed model with a random effect term for the basis functions. Our Gibbs sampler exploits the properties of the multinomial-Dirichlet distribution, and is shown to be an improvement, in terms of operator norm and efficiency, over add/delete one MCMC algorithms. We find that the Dirichlet distribution with small values of the parameters results in a smaller number of knots and, in general, good fit to the data. This approach is shown to reveal previously unseen structures in the synthetic dataset of Bhati and Roman.
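As a rough illustration of the mixed-model representation mentioned in the abstract, the sketch below writes a regression spline as $y = X\beta + Zu + \varepsilon$, where $X$ holds the fixed-effect terms and $Z$ holds the spline basis functions treated as random effects. It is not the paper's Gibbs sampler: the truncated linear basis, the fixed knot locations, the simulated data, and the function names `truncated_linear_basis` and `fit_penalized_spline` are all assumptions made for this example. With the knots held fixed, the fit reduces to ridge-type penalized least squares in which only the spline coefficients $u$ are shrunk; in the mixed-model view the penalty `lam` plays the role of $\sigma_\varepsilon^2 / \sigma_u^2$.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Return Z whose k-th column is (x - knot_k)_+ (illustrative basis choice)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    return np.maximum(x[:, None] - knots[None, :], 0.0)


def fit_penalized_spline(x, y, knots, lam):
    """Fit y = X beta + Z u + error, penalizing only u (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # fixed effects: intercept, slope
    Z = truncated_linear_basis(x, knots)       # random effects: spline basis
    C = np.hstack([X, Z])
    # Penalty matrix: zeros for the fixed effects, ones for the spline terms.
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    beta, u = coef[: X.shape[1]], coef[X.shape[1]:]
    return beta, u, C @ coef


# Simulated data, used only to exercise the sketch.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
knots = np.linspace(0.1, 0.9, 9)  # assumed fixed; the paper samples these
beta, u, fitted = fit_penalized_spline(x, y, knots, lam=0.01)
```

Here the number and positions of the knots are chosen by hand, which is exactly the choice the paper's sampler is designed to estimate from the data.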

#### Article information

Source
Bayesian Anal. Volume 6, Number 4 (2011), 793-828.

Dates
First available in Project Euclid: 13 June 2012

https://projecteuclid.org/euclid.ba/1339616544

Digital Object Identifier
doi:10.1214/11-BA629

Mathematical Reviews number (MathSciNet)
MR2869965

Zentralblatt MATH identifier
1330.62194

#### Citation

Kyung, Minjung. A Computational Bayesian Method for Estimating the Number of Knots In Regression Splines. Bayesian Anal. 6 (2011), no. 4, 793--828. doi:10.1214/11-BA629. https://projecteuclid.org/euclid.ba/1339616544

#### References

• Aronszajn, N., 1950. “Theory of Reproducing Kernels." Transactions of the American Mathematical Society 68, 337-404.
• Bhati, A. S. and Roman, J., 2009. Empirical Investigation of “Going to Scale” in Drug Interventions in the United States, 1990, 2003 [Computer file]. ICPSR26101-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-08-26. doi:10.3886/ICPSR26101
• Biller, C., 2000. “Adaptive Bayesian Regression Splines in Semiparametric Generalized Linear Models.” Journal of Computational and Graphical Statistics 9, 122-140.
• Blei, D. M. and Jordan, M. I., 2006. “Variational Inference for Dirichlet Process Mixtures." Bayesian Analysis 1, 121-144.
• Breiman, L., 1991. “The $\prod$ Method for Estimating Multivariate Functions from Noisy Data." Technometrics 33, 125-160.
• Brinkman, N. D., 1981. “Ethanol Fuel - A Single Cylinder Engine Study of Efficiency and Exhaust Emissions." SAE Transactions 90, 1410-1424.
• Carroll, R., 1982. “Adapting for Heteroscedasticity In Linear Models.” The Annals of Statistics 10, 1224-1233.
• Claeskens, G., Krivobokova, T., and Opsomer, J. D., 2009. “Asymptotic Properties of Penalized Spline Estimators." Biometrika 96, 529-544.
• Cleveland, W. S. and Devlin, S. J., 1988. “Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting." Journal of the American Statistical Association 83, 596-610.
• Denison, D. G. T., Mallick, B. K., and Smith, A. F. M., 1998. “Automatic Bayesian Curve Fitting." Journal of the Royal Statistical Society. Series B 60, 333-350.
• DiMatteo, I., Genovese, C. R., and Kass, R. E., 2001. “Bayesian Curve-Fitting with Free-Knot Splines." Biometrika 88, 1055-1071.
• Eilers, P. H. C. and Marx, B. D., 1996. “Flexible Smoothing with $B$-splines and Penalties." Statistical Science 11, 89-102.
• Escobar, M. D. and West, M., 1995. “Bayesian Density Estimation and Inference Using Mixtures." Journal of the American Statistical Association 90, 577-588.
• Fahrmeir, L. and Lang, S., 2001. “Bayesian Inference for Generalized Additive Mixed Models Based on Markov Random Field Priors.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 50, 201-220.
• French, J. L., Kammann, E. E. and Wand, M.P., 2001. Comment on “Semiparametric Nonlinear Mixed-Effects Models and Their Applications" by Ke and Wang. Journal of the American Statistical Association 96, 1285-1288.
• Friedman, J. H. and Silverman, B. W., 1989. “Flexible Parsimonious Smoothing and Additive Modeling." Technometrics 31, 3-21.
• Girón, F. J., Moreno, E., and Casella, G., 2007. “Objective Bayesian Analysis of Multiple Changepoints for Linear Models." Bayesian Statistics 8 (J. M. Bernardo, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) Oxford University Press 1-27.
• Gramacy, R. B., 2007. “tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models." Journal of Statistical Software 19, Issue 9.
• Gramacy, R. B. and Lee, H. K. H., 2008. “Bayesian Treed Gaussian Process Models With an Application to Computer Modeling.” Journal of the American Statistical Association 103, 1119-1130.
• Gray, R. J., 1994. “Spline-Based Tests in Survival Analysis.” Biometrics 50, 640-652.
• Green, P.J., 1995. “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination." Biometrika 82, 711-732.
• Green, P.J. and Silverman, B.W., 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman & Hall, London.
• Gu, C., 2002. Smoothing Spline ANOVA Models. Springer.
• Halpern, E. F., 1973. “Bayesian Spline Regression When the Number of Knots is Unknown." Journal of the Royal Statistical Society, Series B 2, 347-360.
• Härdle, W., Müller, M., Sperlich, S. and Werwatz, A., 2004. Nonparametric and Semiparametric Models, Springer.
• Harville, D., 1976. “Extension Of The Gauss-Markov Theorem To Include The Estimation Of Random Effects." The Annals of Statistics 4, 384-395.
• Hastie, T. J., 1996. “Pseudosplines." Journal of the Royal Statistical Society, Series B 58, 379-396.
• Hastie, T. J. and Tibshirani, R. J., 1990. Generalized Additive Models, Chapman & Hall/CRC.
• Hobert, J. P. and Marchev, D., 2008. “A Theoretical Comparison of the Data Augmentation, Marginal Augmentation and PX-DA Algorithms." The Annals of Statistics 36, 532-554.
• Holmes, C. C. and Mallick, B. K., 2001. “Bayesian Regression with Multivariate Linear Splines.” Journal of the Royal Statistical Society, Series B 63, 3-17.
• Holmes, C. C. and Mallick, B. K., 2003. “Generalized Nonlinear Modeling With Multivariate Free-Knot Regression Splines." Journal of the American Statistical Association 98, 352-368.
• Huang, S. Y. and Lu, H. H.-S., 2001. “Extended Gauss-Markov Theorem for Nonparametric Mixed-Effects Models.” Journal of Multivariate Analysis 76, 249-266.
• Kauermann, G., Krivobokova, T., and Fahrmeir, L., 2009. “Some Asymptotic Results on Generalized Penalized Spline Smoothing." Journal of the Royal Statistical Society, Series B 71, 487-503.
• Ke, C. and Wang, Y., 2001. “Semiparametric Nonlinear Mixed-Effects Models and Their Applications." Journal of the American Statistical Association 96, 1272-1281.
• Kelly, C. and Rice, J., 1990. “Monotone Smoothing with Application to Dose-Response Curves and the Assessment of Synergism." Biometrics 46, 1071-1085.
• Kyung, M., Gill, J., and Casella, G., 2009. “Characterizing the Variance Improvement in Linear Dirichlet Random Effects Models.” Statistics and Probability Letters 79, 2343-2350.
• Kyung, M., Gill, J., and Casella, G., 2010. “Estimation in Dirichlet Random Effects Models.” Annals of Statistics 38, 979-1009.
• Leitenstorfer, F. and Tutz, G., 2007. “Knot Selection by Boosting Techniques." Computational Statistics and Data Analysis 51, 4605-4621.
• Lindley, D. V., 1968. “The Choice of Variables in Multiple Regression." Journal of the Royal Statistical Society, Series B 1, 31-66.
• Maity, A., Carroll, R. J., Mammen, E., and Chatterjee, N., 2009. “Testing in Semiparametric Models with Interaction, with Applications to Gene-Environment Interaction." Journal of the Royal Statistical Society, Series B 71, 75-96.
• Moreno, E., Casella, G., and Garcia-Ferrer, A., 2005. “An Objective Bayesian Analysis of the Change Point Problem." Stochastic Environmental Research and Risk Assessment 19, 191-204.
• O'Sullivan, F., 1986. “A Statistical Perspective on Ill-Posed Inverse Problems.” Statistical Science 1, 502-518.
• Parker, R. L. and Rice, J. A., 1985. Discussion of “Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting" by Silverman. Journal of the Royal Statistical Society, Series B 47, 40-42.
• Pfeffermann, D., 1984. “On Extensions of the Gauss-Markov Theorem to the Case of Stochastic Regression Coefficients." Journal of the Royal Statistical Society, Series B 46, 139-148.
• Ripley, B. D., 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge.
• Robinson, P. M., 1988. “Root-$N$-Consistent Semiparametric Regression." Econometrica 56, 931-954.
• Rubin, D. B., 1993. “Discussion: Statistical disclosure limitation." Journal of Official Statistics 2, 461-468.
• Ruppert, D., 2002. “Selecting the Number of Knots for Penalized Splines." Journal of Computational and Graphical Statistics 11, 735-757.
• Ruppert, D. and Carroll, R. J., 2000. “Spatially-Adaptive Penalties for Spline Fitting." Australian and New Zealand Journal of Statistics 42, 205-224.
• Ruppert, D., Wand, M. P. and Carroll, R. J., 2003. Semiparametric Regression, Wiley, New York.
• Silverman, B. W., 1985. “Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting." Journal of the Royal Statistical Society, Series B 47, 1-52.
• Stone, C. J., 1985. “Additive Regression and Other Nonparametric Models." The Annals of Statistics 13, 689-705.
• Stone, C. J., Hansen, M. H., Kooperberg, C., and Truong, Y. K., 1997. “Polynomial Splines and Their Tensor Products in Extended Linear Modeling." The Annals of Statistics 25, 1371-1470.
• United States Department of Health and Human Services, 2006. Substance Abuse and Mental Health Services Administration. Office of Applied Studies. National Survey on Drug Use and Health, 2003 [Computer file]. ICPSR04138-v2. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2006-10-17. doi:10.3886/ICPSR04138
• United States Department of Health and Human Services, 2010. National Institutes of Health. National Institute on Drug Abuse. Drug Abuse Treatment Outcome Study (DATOS), 1991-1994: [United States] [Computer file]. ICPSR02258-v5. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-02-16. doi:10.3886/ICPSR02258
• U.S. Dept. of Justice, National Institute of Justice, 2004. ARRESTEE DRUG ABUSE MONITORING (ADAM) PROGRAM IN THE UNITED STATES, 2003 [Computer file]. ICPSR version. Washington, DC: U.S. Dept. of Justice, National Institute of Justice [producer], 2004. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2004. doi:10.3886/ICPSR04020
• Wahba, G., 1977. “Practical Approximate Solutions To Linear Operator Equations When The Data Are Noisy.” SIAM Journal on Numerical Analysis 14, 651-667.
• Wand, M.P., 2003. “Smoothing and Mixed Models." Computational Statistics 18, 223-249.
• Wood, S. N., 2006. Generalized Additive Models: An Introduction with R, Chapman & Hall/CRC.
• Yin, G., Li, H., and Zeng, D., 2008. “Partially Linear Additive Hazards Regression With Varying Coefficients." Journal of the American Statistical Association 103, 1200-1213.
• Zeng, D. and Lin, D. Y., 2007. “Maximum Likelihood Estimation in Semiparametric Regression Models with Censored Data." Journal of the Royal Statistical Society, Series B 69, 507-564.
• Zhang, D. and Davidian, M., 2001. “Linear Mixed Models with Flexible Distributions of Random Effects for Longitudinal Data.” Biometrics 57, 795-802.