Bayesian Analysis

Shrinkage regression for multivariate inference with missing data, and an application to portfolio balancing

Robert B. Gramacy and Ester Pantaleo

Full-text: Open access


Portfolio balancing requires estimates of covariance between asset returns. Returns data have histories that vary greatly in length, since assets begin public trading at different times. This can lead to a huge amount of missing data, too much for the conventional imputation-based approach. Fortunately, a well-known factorization of the MVN likelihood under the prevailing historical missingness pattern leads to a simple algorithm of OLS regressions that is much more reliable. When there are more assets than returns, however, OLS becomes unstable. Gramacy et al. (2008) showed how classical shrinkage regression may be used instead, thus extending the state of the art to much bigger asset collections, with further accuracy and interpretation advantages. In this paper, we detail a fully Bayesian hierarchical formulation that extends the framework further by allowing for heavy-tailed errors, relaxing the historical missingness assumption, and accounting for estimation risk. We illustrate how this approach compares favorably to the classical one using synthetic data and an investment exercise with real returns. An accompanying R package is on CRAN.
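The factorization mentioned in the abstract can be illustrated with a minimal sketch. It is not the accompanying R package's interface, and it covers only the classical plug-in estimate, not the Bayesian hierarchical model. The function name `monotone_mvn_fit`, the `ridge` argument, the NaN encoding of missing returns, and the column ordering (longest history first, each column observed wherever the later ones are) are all assumptions made for illustration. Each column is regressed on the columns before it using only the rows where it is observed, and the fitted coefficients are assembled into a mean vector and covariance matrix. A positive `ridge` value stands in for the shrinkage regression needed when a column has fewer observations than predictors.

```python
import numpy as np

def monotone_mvn_fit(Y, ridge=0.0):
    """Plug-in mean and covariance for monotone-missing data (illustrative).

    Y     : (n, m) array, NaN marks a missing return; columns are assumed
            ordered from longest to shortest history.
    ridge : hypothetical penalty on the slope coefficients; 0.0 gives OLS.
    """
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    mu = np.zeros(m)
    S = np.zeros((m, m))

    first = Y[:, 0]
    if np.isnan(first).any():
        raise ValueError("column 0 must be fully observed")
    mu[0] = first.mean()
    S[0, 0] = first.var()  # divides by n (maximum-likelihood form)

    for j in range(1, m):
        rows = ~np.isnan(Y[:, j])
        X, y = Y[rows, :j], Y[rows, j]
        if np.isnan(X).any():
            raise ValueError("missingness pattern is not monotone")

        # Centered regression of column j on columns 0..j-1.
        xbar, ybar = X.mean(axis=0), y.mean()
        Xc, yc = X - xbar, y - ybar
        b = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(j), Xc.T @ yc)
        resid = yc - Xc @ b
        s2 = resid @ resid / len(y)  # residual variance

        # Map regression parameters back to moments of the joint normal.
        mu[j] = ybar + b @ (mu[:j] - xbar)
        cross = S[:j, :j] @ b
        S[:j, j] = cross
        S[j, :j] = cross
        S[j, j] = s2 + b @ cross
    return mu, S

# Synthetic example: assets 2 and 3 begin trading later than assets 0 and 1.
rng = np.random.default_rng(0)
Y = rng.normal(size=(60, 4))
Y[:, 1] += 0.5 * Y[:, 0]
Y[:20, 2] = np.nan
Y[:35, 3] = np.nan
mu, S = monotone_mvn_fit(Y)
```

With no missing entries and `ridge=0.0`, the sketch reproduces the ordinary sample mean and the divide-by-n sample covariance, which is a convenient sanity check on the factorization.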

Article information

Bayesian Anal. Volume 5, Number 2 (2010), 237-262.

First available in Project Euclid: 20 June 2012

Keywords

multivariate; monotone missing data; data augmentation; ridge regression; double-exponential; heavy tails; factor model; portfolio balancing


Gramacy, Robert B.; Pantaleo, Ester. Shrinkage regression for multivariate inference with missing data, and an application to portfolio balancing. Bayesian Anal. 5 (2010), no. 2, 237--262. doi:10.1214/10-BA602.


  • Anderson, T. (1957). "Maximum Likelihood Estimates for a Multivariate Normal Distribution when Some Observations Are Missing." J. of the American Statistical Association, 52, 200–203.
  • Brown, S. (1979). "The Effect of Estimation Risk on Capital Market Equilibrium." J. of Financial and Quantitative Analysis, 14, 215–220.
  • Carlin, B. P. and Polson, N. G. (1991). "Inference for nonconjugate Bayesian Models using the Gibbs sampler." The Canadian Journal of Statistics, 19, 4, 399–405.
  • Carlin, B. P., Polson, N. G., and Stoffer, D. S. (1992). "A Monte Carlo Approach to Nonnormal and Nonlinear State–Space Modeling." J. of the American Statistical Association, 87, 418, 493–500.
  • Carvalho, C., Polson, N., and Scott, J. (2008). "The horseshoe estimator for sparse signals." Discussion Paper 2008-31, Duke University Department of Statistical Science.
  • Carvalho, C. M. and Scott, J. G. (2009). "Objective Bayesian model selection in Gaussian graphical models." Biometrika, 96, 3, 497–512.
  • Chan, L. K., Karceski, J., and Lakonishok, J. (1999). "On Portfolio Optimization: Forecasting Covariances and Choosing the Risk Model." The Review of Financial Studies, 12, 5, 937–974.
  • Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). "Least Angle Regression (with discussion)." Annals of Statistics, 32, 2.
  • Fama, E. and French, K. (1993). "Common Risk Factors in the Returns on Stocks and Bonds." J. of Financial Economics, 33, 3–56.
  • George, E. and McCulloch, R. (1993). "Variable selection via Gibbs sampling." J. of the American Statistical Association, 88, 881–889.
  • Geweke, J. (1992). "Priors for macroeconomic time series and their application." Tech. Rep. Institute of Empirical Macroeconomics Discussion Paper No. 64, Federal Reserve Bank of Minneapolis.
  • Geweke, J. (1993). "Bayesian Treatment of the Independent Student–$t$ Linear Model." J. of Applied Econometrics, Vol. 8, Supplement: Special Issue on Econometric Inference Using Simulation Techniques, S19–S40.
  • Geweke, J. (1996). "Variable selection and model comparison in regression." In Bayesian Statistics 5, eds. J. Bernardo, J. Berger, A. Dawid, and A. Smith, 609–620. Oxford Press.
  • Godsill, S. (2001). "On the relationship between Markov chain Monte Carlo methods for model uncertainty." J. of Computational and Graphical Statistics, 10, 2, 239–248.
  • Gramacy, R. B. (2009). The monomvn package: Estimation for multivariate normal and Student-t data with monotone missingness. R package version 1.8.
  • Gramacy, R. B., Lee, J. H., and Silva, R. (2008). "On estimating covariances between many assets with histories of highly variable length." Tech. Rep. 0710.5837, arXiv.
  • Green, P. (1995). "Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination." Biometrika, 82, 711–732.
  • Griffin, J. E. and Brown, P. J. (2010). "Inference with Normal–Gamma prior distributions in regression problems." Bayesian Analysis, 5, 1, 171–188.
  • Hans, C. (2008). "Bayesian lasso regression." Tech. Rep. 810, Department of Statistics, The Ohio State University, Columbus, OH 43210.
  • Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
  • Hoerl, A. and Kennard, R. (1970). "Ridge Regression: Biased estimation for non-orthogonal problems." Technometrics, 12, 55–67.
  • Jacquier, E., Polson, N., and Rossi, P. E. (2004). "Bayesian analysis of stochastic volatility models with fat-tails and correlated errors." J. of Econometrics, 122, 185–212.
  • Jagannathan, R. and Ma, T. (2003). "Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps." J. of Finance, 58, 4, 1641–1684.
  • Klein, R. and Bawa, V. (1976). "The Effect of Estimation Risk on Optimal Portfolio Choice." J. of Financial Economics, 3, 215–231.
  • Ledoit, O. and Wolf, M. (2002). "Improved estimation of the covariance matrix of stock returns with an application to portfolio selection." J. of Empirical Finance, 10, 603–621.
  • Levina, E., Rothman, A., and Zhu, J. (2008). "Sparse Estimation of Large Covariance Matrices via a Nested Lasso Penalty." Annals of Applied Statistics, 2, 1, 245–263.
  • Li, K. (1988). "Imputation using Markov chains." J. of Statistical Computation and Simulation, 30, 57–79.
  • Little, R. J. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. 2nd ed. Wiley.
  • Liu, C. (1993). "Bartlett's decomposition of the posterior distribution of the covariance for normal monotone ignorable missing data." J. of Multivariate Analysis, 46, 198–206.
  • Liu, C. (1995). "Monotone Data Augmentation Using the Multivariate $t$ Distribution." J. of Multivariate Analysis, 53, 139–158.
  • Liu, C. (1996). "Bayesian Robust Multivariate Linear Regression With Incomplete Data." J. of the American Statistical Association, 91, 435, 1219–1227.
  • Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investments. New York: John Wiley.
  • Park, T. and Casella, G. (2008). "The Bayesian Lasso." J. of the American Statistical Association, 103, 482, 681–686.
  • Polson, N. and Tew, B. (2000). "Bayesian Portfolio Selection: An Empirical Analysis of the S&P 500 Index 1970–1996." J. of Business & Economic Statistics, 18, 2, 164–173.
  • Schafer, J. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall/CRC.
  • R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
  • Stambaugh, R. F. (1997). "Analyzing Investments Whose Histories Differ in Length." J. of Financial Economics, 45, 285–331.
  • Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso." J. of the Royal Statistical Society, Series B, 58, 267–288.
  • Troughton, P. T. and Godsill, S. J. (1997). "A reversible jump sampler for autoregressive time series, employing full conditionals to achieve efficient model space moves." Tech. Rep. CUED/F-INFENG/TR.304, Cambridge University Engineering Department.
  • West, M. (2003). "Bayesian factor regression models in the “large $p$, small $n$” paradigm." Bayesian Statistics 7, 723–732.
  • Zellner, A. and Chetty, V. (1965). "Prediction and Decision Problems in Regression Models From the Bayesian Point of View." J. of the American Statistical Association, 605–616.