Electronic Journal of Statistics

Data enriched linear regression

Aiyou Chen, Art B. Owen, and Minghui Shi

Full-text: Open access


We present a linear regression method for predictions on a small data set making use of a second, possibly biased, data set that may be much larger. Our method fits linear regressions to the two data sets while penalizing the difference between predictions made by those two models. The resulting algorithm is a shrinkage method similar to those used in small area estimation. We find a Stein-type result for Gaussian responses: when the model has $5$ or more coefficients and $10$ or more error degrees of freedom, it becomes inadmissible to use only the small data set, no matter how large the bias is. We also present both plug-in and AICc-based methods to tune our penalty parameter. Most of our results use an $L_{2}$ penalty, but we obtain formulas for $L_{1}$ penalized estimates when the model is specialized to the location setting. Ordinary Stein shrinkage provides an inadmissibility result for only $3$ or more coefficients, but we find that our shrinkage method typically produces much lower squared errors in as few as $5$ or $10$ dimensions when the bias is small, and essentially equivalent squared errors when the bias is large.
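The fitting criterion described in the abstract can be sketched as a single penalized least-squares problem: the small data set is fit with coefficients b, the large data set with b + g, and the penalty shrinks the bias term g toward zero. The sketch below is an illustrative reconstruction, not the paper's exact estimator; in particular, the identity penalty on g is a simplifying assumption (penalizing the difference between the two models' predictions corresponds to other penalty matrices).

```python
import numpy as np

def data_enriched_fit(Xs, ys, Xb, yb, lam):
    """Minimize, over (b, g),
        ||ys - Xs b||^2 + ||yb - Xb (b + g)||^2 + lam * ||g||^2,
    where (Xs, ys) is the small data set and (Xb, yb) the big,
    possibly biased one. The identity penalty on the bias g is an
    illustrative simplification. Returns (b, g)."""
    p = Xs.shape[1]
    # Stack the two regressions into one least-squares problem
    # in theta = [b; g]: small rows see [Xs, 0], big rows [Xb, Xb].
    A = np.block([[Xs, np.zeros(Xs.shape)],
                  [Xb, Xb]])
    y = np.concatenate([ys, yb])
    # Ridge-style penalty applied to the g block only.
    P = np.zeros((2 * p, 2 * p))
    P[p:, p:] = lam * np.eye(p)
    theta = np.linalg.solve(A.T @ A + P, A.T @ y)
    return theta[:p], theta[p:]
```

With lam = 0 the small-set coefficients b reduce to ordinary least squares on the small data set alone; as lam grows, g is driven to zero and b approaches the pooled fit on both data sets, matching the shrinkage interpretation in the abstract.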

Article information

Electron. J. Statist., Volume 9, Number 1 (2015), 1078-1112.

Received: November 2014
First available in Project Euclid: 27 May 2015

Primary: 62J07: Ridge regression; shrinkage estimators 62D05: Sampling theory, sample surveys
Secondary: 62F12: Asymptotic properties of estimators

Keywords: data fusion; small area estimation; Stein shrinkage; transfer learning


Chen, Aiyou; Owen, Art B.; Shi, Minghui. Data enriched linear regression. Electron. J. Statist. 9 (2015), no. 1, 1078--1112. doi:10.1214/15-EJS1027. https://projecteuclid.org/euclid.ejs/1432732305
