Abstract
Motivated by the increasing use of and rapid changes in array technologies, we consider the prediction problem of fitting a linear regression relating a continuous outcome $Y$ to a large number of covariates $\mathbf{X}$, for example, measurements from current, state-of-the-art technology. For most of the samples, only the outcome $Y$ and surrogate covariates, $\mathbf{W}$, are available. These surrogates may be data from prior studies using older technologies. Owing to the dimension of the problem and the large fraction of missing information, a critical issue is appropriate shrinkage of model parameters for an optimal bias-variance trade-off. We discuss a variety of fully Bayesian and Empirical Bayes algorithms which account for uncertainty in the missing data and adaptively shrink parameter estimates for superior prediction. These methods are evaluated via a comprehensive simulation study. In addition, we apply our methods to a lung cancer data set, predicting survival time ($Y$) using qRT-PCR ($\mathbf{X}$) and microarray ($\mathbf{W}$) measurements.
Citation
Philip S. Boonstra, Bhramar Mukherjee, Jeremy M. G. Taylor. "Bayesian shrinkage methods for partially observed data with many predictors." Ann. Appl. Stat. 7 (4) 2272-2292, December 2013. https://doi.org/10.1214/13-AOAS668