The Annals of Statistics
- Ann. Statist.
- Volume 28, Number 6 (2000), 1570-1600.
Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV
We propose the randomized Generalized Approximate Cross Validation (ranGACV) method for choosing multiple smoothing parameters in penalized likelihood estimates for Bernoulli data. The method is intended for application with penalized likelihood smoothing spline ANOVA models. In addition, we propose a class of approximate numerical methods for solving the penalized likelihood variational problem which, in conjunction with the ranGACV method, allows the application of smoothing spline ANOVA models with Bernoulli data to much larger data sets than previously possible. These methods are based on choosing an approximating subset of the natural (representer) basis functions for the variational problem. Simulation studies with synthetic data, including synthetic data mimicking demographic risk factor data sets, are used to examine the properties of the method and to compare the approach with the GRKPACK code of Wang (1997c). Bayesian “confidence intervals” are obtained for the fits and are shown in the simulation studies to have the “across the function” property usually claimed for these confidence intervals. Finally, the method is applied to an observational data set from the Beaver Dam Eye Study, with scientifically interesting results.
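The idea of solving the penalized likelihood problem on a subset of the representer basis functions can be illustrated with a minimal sketch. This is not the authors' code or the GRKPACK algorithm. It assumes a one-dimensional covariate, a Gaussian kernel standing in for the spline reproducing kernels, a randomly chosen subset, and a single fixed smoothing parameter `lam` rather than one selected by ranGACV. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, scale=0.1):
    # Stand-in kernel (an assumption); the paper works with spline reproducing kernels.
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * scale ** 2))

def objective(c, B, Q, y, lam):
    # Mean negative Bernoulli log-likelihood plus a quadratic roughness penalty.
    f = B @ c
    nll = np.mean(np.logaddexp(0.0, f) - y * f)
    return nll + 0.5 * lam * (c @ Q @ c)

def fit_subset_basis(x, y, subset, lam, n_iter=50, tol=1e-8):
    # f(x) = sum_j c_j K(x, x_j), with j running over the chosen subset only.
    B = gaussian_kernel(x, x[subset])          # n x m design matrix
    Q = gaussian_kernel(x[subset], x[subset])  # m x m penalty matrix
    n, m = B.shape
    c = np.zeros(m)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(B @ c)))
        grad = B.T @ (p - y) / n + lam * (Q @ c)
        # Small ridge term keeps the Newton system solvable when Q is ill-conditioned.
        hess = (B.T * (p * (1.0 - p))) @ B / n + lam * Q + 1e-8 * np.eye(m)
        step = np.linalg.solve(hess, grad)
        # Step halving: accept only steps that strictly decrease the objective.
        t, old = 1.0, objective(c, B, Q, y, lam)
        while t > 1e-8 and objective(c - t * step, B, Q, y, lam) >= old:
            t *= 0.5
        if t <= 1e-8:
            break
        c = c - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return c, B, Q

# Synthetic Bernoulli data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
true_p = 1.0 / (1.0 + np.exp(-3.0 * np.sin(2.0 * np.pi * x)))
y = rng.binomial(1, true_p).astype(float)

subset = rng.choice(x.size, size=20, replace=False)  # random subset: an assumption
c, B, Q = fit_subset_basis(x, y, subset, lam=1e-3)
prob = 1.0 / (1.0 + np.exp(-(B @ c)))  # fitted success probabilities
```

With 20 basis functions instead of 500, each Newton step solves a 20 x 20 system rather than a 500 x 500 one, which is the kind of saving that makes larger data sets tractable.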
First available in Project Euclid: 12 March 2002
Primary: 62G07: Density estimation; 92C60: Medical epidemiology; 68T05: Learning and adaptive systems [See also 68Q32, 91E40]; 65D07: Splines; 65D10: Smoothing, curve fitting; 62A99: None of the above, but in this section; 62J07: Ridge regression; shrinkage estimators
Secondary: 41A63: Multidimensional problems (should also be assigned at least one other classification number in this section); 41A15: Spline approximation; 62G07: Density estimation; 62M30: Spatial processes; 65D15: Algorithms for functional approximation; 92H25; 49M15: Newton-type methods
Lin, Xiwu; Wahba, Grace; Xiang, Dong; Gao, Fangyu; Klein, Ronald; Klein, Barbara. Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV. Ann. Statist. 28 (2000), no. 6, 1570--1600. doi:10.1214/aos/1015957471. https://projecteuclid.org/euclid.aos/1015957471