Electronic Journal of Statistics

Penalized wavelets: Embedding wavelets into semiparametric regression

M.P. Wand and J.T. Ormerod


Abstract

We introduce the concept of penalized wavelets to facilitate seamless embedding of wavelets into semiparametric regression models. In particular, we show that penalized wavelets are analogous to penalized splines; the latter being the established approach to function estimation in semiparametric regression. They differ only in the type of penalization that is appropriate. This fact is not borne out by the existing wavelet literature, where the regression modelling and fitting issues are overshadowed by computational issues such as efficiency gains afforded by the Discrete Wavelet Transform and partially obscured by a tendency to work in the wavelet coefficient space. With penalized wavelet structure in place, we then show that fitting and inference can be achieved via the same general approaches used for penalized splines: penalized least squares, maximum likelihood and best prediction within a frequentist mixed model framework, and Markov chain Monte Carlo and mean field variational Bayes within a Bayesian framework. Penalized wavelets are also shown to have a close relationship with wide data (“p ≫ n”) regression and benefit from ongoing research on that topic.
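As a concrete illustration of the penalized least squares and wide data regression connection described in the abstract, the following R sketch fits a penalized wavelet regression by treating wavelet basis functions as columns of a design matrix and shrinking their coefficients with a sparseness-inducing (lasso) penalty. This is an illustrative sketch only, not the authors' implementation: the Haar basis construction, the simulated data, and the use of the glmnet lasso solver are all assumptions made here; the paper itself works with smoother Daubechies-type bases and also covers mixed model, MCMC and mean field variational Bayes fitting.

    # Illustrative sketch: a penalized wavelet fit expressed as wide data
    # regression with a sparseness-inducing (lasso) penalty. The Haar basis
    # and the glmnet solver are stand-ins, not the paper's implementation.
    library(glmnet)

    # Haar mother-wavelet design matrix on [0,1) for resolution levels 0,...,L-1.
    haarDesign <- function(x, L) {
      cols <- list()
      for (j in 0:(L - 1)) {
        for (k in 0:(2^j - 1)) {
          lo <- k / 2^j; mid <- (k + 0.5) / 2^j; hi <- (k + 1) / 2^j
          cols[[length(cols) + 1]] <-
            2^(j / 2) * ((x >= lo & x < mid) - (x >= mid & x < hi))
        }
      }
      do.call(cbind, cols)
    }

    # Simulated data with a spatially heterogeneous signal.
    set.seed(1)
    n <- 256
    x <- (1:n - 0.5) / n
    y <- sin(8 * pi * x^2) + rnorm(n, sd = 0.3)

    Z <- haarDesign(x, L = 7)                  # 256 x 127 wavelet design matrix
    cvFit <- cv.glmnet(Z, y)                   # lasso penalty; lambda chosen by CV
    fHat <- predict(cvFit, newx = Z, s = "lambda.min")

    plot(x, y, col = "grey")
    lines(x, fHat, lwd = 2)

The lasso step stands in for the general sparseness-inducing penalties discussed in the paper; swapping in a different penalty or basis changes only the penalized least squares step, not the overall design-matrix formulation.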

Article information

Source
Electron. J. Statist. Volume 5 (2011), 1654-1717.

Dates
First available in Project Euclid: 13 December 2011

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1323785605

Digital Object Identifier
doi:10.1214/11-EJS652

Mathematical Reviews number (MathSciNet)
MR2870147

Zentralblatt MATH identifier
1271.62089

Keywords
Bayesian inference, best prediction, generalized additive models, Gibbs sampling, maximum likelihood estimation, Markov chain Monte Carlo, mean field variational Bayes, sparseness-inducing penalty, wide data regression

Citation

Wand, M.P.; Ormerod, J.T. Penalized wavelets: Embedding wavelets into semiparametric regression. Electron. J. Statist. 5 (2011), 1654–1717. doi:10.1214/11-EJS652. https://projecteuclid.org/euclid.ejs/1323785605



