The Annals of Applied Statistics

Bayesian shrinkage methods for partially observed data with many predictors

Philip S. Boonstra, Bhramar Mukherjee, and Jeremy M. G. Taylor



Motivated by the increasing use of and rapid changes in array technologies, we consider the prediction problem of fitting a linear regression relating a continuous outcome $Y$ to a large number of covariates $\mathbf{X}$, for example, measurements from current, state-of-the-art technology. For most of the samples, only the outcome $Y$ and surrogate covariates, $\mathbf{W}$, are available. These surrogates may be data from prior studies using older technologies. Owing to the dimension of the problem and the large fraction of missing information, a critical issue is appropriate shrinkage of model parameters for an optimal bias-variance trade-off. We discuss a variety of fully Bayesian and Empirical Bayes algorithms which account for uncertainty in the missing data and adaptively shrink parameter estimates for superior prediction. These methods are evaluated via a comprehensive simulation study. In addition, we apply our methods to a lung cancer data set, predicting survival time ($Y$) using qRT-PCR ($\mathbf{X}$) and microarray ($\mathbf{W}$) measurements.
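To make the setup concrete, the following is a minimal sketch (not the authors' algorithm) of the generic idea the abstract describes: a Gibbs sampler that combines a ridge-type shrinkage prior on the regression coefficients with data augmentation for the missing covariates. The sample sizes, the simple measurement model $\mathbf{X} = \mathbf{W} + \text{noise}$ with known variance `tau2`, and all hyperparameter choices below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy data mimicking the setup: Y ~ X, with X missing for most samples and
# a surrogate W observed for everyone. The measurement model X = W + noise
# with known variance tau2 is an illustrative assumption. ---
n, p, n_obs = 100, 5, 30            # X observed only for the first n_obs samples
beta_true = np.array([1.5, -1.0, 0.5, 0.0, 2.0])
tau2 = 0.25
W = rng.normal(size=(n, p))
X = W + rng.normal(scale=np.sqrt(tau2), size=(n, p))
Y = X @ beta_true + rng.normal(size=n)
X_work = X.copy()
X_work[n_obs:] = W[n_obs:]          # start the missing X rows at their surrogates

# --- Gibbs sampler: ridge prior beta ~ N(0, sigma2/lam * I), Gamma(a0, b0)
# hyperprior on the shrinkage parameter lam, and data augmentation for X. ---
a0 = b0 = 1.0
sigma2, lam = 1.0, 1.0
n_iter, burn = 1000, 500
draws = np.empty((n_iter - burn, p))
for it in range(n_iter):
    # (1) beta | X, Y, sigma2, lam: the usual Gaussian ridge full conditional
    A_inv = np.linalg.inv(X_work.T @ X_work + lam * np.eye(p))
    beta = rng.multivariate_normal(A_inv @ X_work.T @ Y, sigma2 * A_inv)

    # (2) sigma2 | beta, X, Y: inverse-gamma (Jeffreys prior on sigma2)
    resid = Y - X_work @ beta
    sigma2 = 1.0 / rng.gamma((n + p) / 2.0,
                             2.0 / (resid @ resid + lam * beta @ beta))

    # (3) lam | beta, sigma2: Gamma full conditional -> adaptive shrinkage
    lam = rng.gamma(a0 + p / 2.0, 1.0 / (b0 + beta @ beta / (2.0 * sigma2)))

    # (4) missing X rows | W, Y, beta: Gaussian; all rows share one covariance
    cov = np.linalg.inv(np.eye(p) / tau2 + np.outer(beta, beta) / sigma2)
    means = (W[n_obs:] / tau2 + np.outer(Y[n_obs:], beta) / sigma2) @ cov
    L = np.linalg.cholesky(cov)
    X_work[n_obs:] = means + rng.standard_normal((n - n_obs, p)) @ L.T

    if it >= burn:
        draws[it - burn] = beta

beta_hat = draws.mean(axis=0)       # posterior mean of the coefficients
print(np.round(beta_hat, 2))
```

Step (4) is the data-augmentation move (Tanner and Wong, 1987): sampling the missing covariates from their full conditional propagates their uncertainty into the coefficient draws, rather than plugging in a single imputed value.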

Article information

Ann. Appl. Stat., Volume 7, Number 4 (2013), 2272-2292.

First available in Project Euclid: 23 December 2013


Keywords: high-dimensional data; Markov chain Monte Carlo; missing data; measurement error; shrinkage


Boonstra, Philip S.; Mukherjee, Bhramar; Taylor, Jeremy M. G. Bayesian shrinkage methods for partially observed data with many predictors. Ann. Appl. Stat. 7 (2013), no. 4, 2272--2292. doi:10.1214/13-AOAS668.

References


  • Boonstra, P. S., Mukherjee, B. and Taylor, J. M. G. (2013). Supplement to “Bayesian shrinkage methods for partially observed data with many predictors.” DOI:10.1214/13-AOAS668SUPP.
  • Boonstra, P. S., Taylor, J. M. G. and Mukherjee, B. (2013). Incorporating auxiliary information for improved prediction in high-dimensional datasets: An ensemble of shrinkage approaches. Biostatistics 14 259–272.
  • Casella, G. (2001). Empirical Bayes Gibbs sampling. Biostatistics 2 485–500.
  • Chen, G., Kim, S., Taylor, J. M. G., Wang, Z., Lee, O., Ramnath, N., Reddy, R. M., Lin, J., Chang, A. C., Orringer, M. B. and Beer, D. G. (2011). Development and validation of a qRT-PCR-classifier for lung cancer prognosis. Journal of Thoracic Oncology 6 1481–1487.
  • Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions. Numer. Math. 31 377–403.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 39 1–38.
  • Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics 35 109–135.
  • Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398–409.
  • Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge Univ. Press, New York.
  • Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6 721–741.
  • Gentleman, R. (2012). annotate: Annotation for microarrays. R package version 1.36.0.
  • Graf, E., Schmoor, C., Sauerbrei, W. and Schumacher, M. (1999). Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18 2529–2545.
  • Green, P. J. (1990). On use of the EM algorithm for penalized likelihood estimation. J. R. Stat. Soc. Ser. B Stat. Methodol. 52 443–452.
  • Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55–67.
  • Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley, Hoboken, NJ.
  • Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc. 103 681–686.
  • Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82 528–550.
  • Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J. Amer. Statist. Assoc. 85 699–704.
  • Witten, D. M. and Tibshirani, R. (2009). Covariance-regularized regression and classification for high dimensional problems. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 615–636.
  • Yi, N. and Xu, S. (2008). Bayesian Lasso for quantitative trait loci mapping. Genetics 179 1045–1055.

Supplemental materials

  • Supplementary material: Supplemental article. Here we give the full derivation of the Gibbs steps, computational details, and the results from the simulation study. The data from Section 5 and the code for its analysis are available at