The Annals of Applied Statistics

Regularized brain reading with shrinkage and smoothing

Leila Wehbe, Aaditya Ramdas, Rebecca C. Steorts, and Cosma Rohilla Shalizi



Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high dimensional. Hence, direct estimates are inherently imprecise and call for regularization. We compare a suite of approaches that regularize via shrinkage: ridge regression, the elastic net (a generalization of ridge regression and the lasso), and a hierarchical Bayesian model based on small area estimation (SAE). We contrast regularization with spatial smoothing and with combinations of smoothing and shrinkage. All methods are tested on functional magnetic resonance imaging (fMRI) data from multiple subjects participating in two different reading experiments, both for predicting the neural response to stimuli and for decoding stimuli from responses. Interestingly, when the regularization parameters are chosen by cross-validation independently for every voxel, low regularization is chosen in voxels where classification accuracy is high and high regularization in voxels where it is low, indicating that regularization intensity is a good tool for identifying voxels relevant to the cognitive task. Surprisingly, all the regularization methods work about equally well, suggesting that beating basic smoothing and shrinkage will take not only clever methods but also careful modeling.
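To make per-voxel tuning concrete, here is a minimal Python/NumPy sketch of one of the compared shrinkage methods, ridge regression, with the penalty chosen independently for every voxel. It assumes centered features and responses (no intercept), uses closed-form leave-one-out cross-validation (one choice among several; the paper's exact cross-validation scheme may differ), and runs on simulated data. The function name `loo_ridge_lambda_per_voxel`, the λ grid, and the simulation are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def loo_ridge_lambda_per_voxel(X, Y, lambdas):
    """Pick a ridge penalty independently for each voxel by leave-one-out CV.

    Hypothetical helper for illustration, not the paper's implementation.
    X: (n, p) stimulus features; Y: (n, V) responses, one column per voxel.
    Uses the closed-form ridge LOO residual e_i / (1 - H_ii), where
    H = X (X^T X + lam I)^{-1} X^T, computed via one SVD of X.
    Assumes X and Y are already centered (no intercept is fitted).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    UtY = U.T @ Y                                   # project responses onto left singular vectors
    cv_err = np.empty((len(lambdas), Y.shape[1]))
    for j, lam in enumerate(lambdas):
        d = s**2 / (s**2 + lam)                     # shrinkage factor per singular direction
        fitted = U @ (d[:, None] * UtY)             # H @ Y without forming H explicitly
        h = np.einsum("ik,k,ik->i", U, d, U)        # diagonal of H (leverages)
        loo_resid = (Y - fitted) / (1.0 - h)[:, None]
        cv_err[j] = np.mean(loo_resid**2, axis=0)   # LOO mean squared error, one value per voxel
    best = np.argmin(cv_err, axis=0)                # best lambda index for each voxel
    return np.asarray(lambdas)[best]


# Simulated data (assumption): 20 stimulus-driven voxels and 20 pure-noise voxels.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, 20))
Y_signal = X @ B + 0.5 * rng.standard_normal((n, 20))   # voxels driven by the stimulus
Y_noise = rng.standard_normal((n, 20))                  # voxels unrelated to the stimulus
Y = np.hstack([Y_signal, Y_noise])

lambdas = np.logspace(-3, 4, 15)
lam = loo_ridge_lambda_per_voxel(X, Y, lambdas)
print("median lambda, signal voxels:", np.median(lam[:20]))
print("median lambda, noise voxels: ", np.median(lam[20:]))
```

In this simulation, voxels with no stimulus-driven signal tend to receive much larger penalties than signal-carrying voxels, the same kind of relationship the abstract reports between regularization strength and task relevance. Computing the SVD of `X` once lets every voxel and every λ reuse the same decomposition, which matters when there are tens of thousands of voxels.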

Article information

Ann. Appl. Stat. Volume 9, Number 4 (2015), 1997-2022.

Received: January 2014
Revised: December 2014
First available in Project Euclid: 28 January 2016

Digital Object Identifier: doi:10.1214/15-AOAS837

Keywords: fMRI; small area estimation; regularization; shrinkage; spatial smoothing


Wehbe, Leila; Ramdas, Aaditya; Steorts, Rebecca C.; Shalizi, Cosma Rohilla. Regularized brain reading with shrinkage and smoothing. Ann. Appl. Stat. 9 (2015), no. 4, 1997–2022. doi:10.1214/15-AOAS837.



Supplemental materials

  • Supplementary Article: Appendix for “Regularized brain reading with shrinkage and smoothing”. This supplement consists of six parts, giving more detail on: (A) our small area model and Gibbs sampler, (B) the marginal prior of the SAE model, (C) model checking, (D) the effect of regularization on variability, (E) the effect of smoothing and regularization on single-voxel accuracy and (F) whole-brain plots of the experimental results that Figures 5 and 6 show for a single slice.
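The SAE approach shrinks noisy per-voxel estimates toward a shared value. The supplement describes a Gibbs sampler for the authors' model; as a simpler stand-in, the Python sketch below uses a generic normal-normal model with known sampling variances and estimates the hyperparameters by moment matching (empirical Bayes) instead of sampling. The function `sae_shrink`, the model form, and the simulated inputs are assumptions for illustration and may differ from the model in the supplement.

```python
import numpy as np


def sae_shrink(y, se2):
    """Shrink noisy per-voxel estimates toward a shared mean.

    Hypothetical stand-in: empirical Bayes on a generic normal-normal model,
    not the paper's Gibbs sampler or its exact SAE model.
    Model (assumed): y_v ~ N(theta_v, se2_v), theta_v ~ N(mu, tau2).
    y:   direct estimates, one per voxel in a region
    se2: known, strictly positive sampling variances sigma_v^2
    Returns (posterior means of theta_v, shrinkage weights B_v, estimated mu).
    """
    y = np.asarray(y, dtype=float)
    se2 = np.asarray(se2, dtype=float)
    # Moment estimate: Var(y) is roughly tau2 + mean(se2); truncate at zero.
    tau2 = max(np.var(y, ddof=1) - np.mean(se2), 0.0)
    # Precision-weighted estimate of the shared mean.
    w = 1.0 / (se2 + tau2)
    mu = np.sum(w * y) / np.sum(w)
    # Shrinkage weight: voxels with larger sampling variance shrink more.
    B = se2 / (se2 + tau2)
    return B * mu + (1.0 - B) * y, B, mu


# Simulated inputs (assumption), not data from the paper.
rng = np.random.default_rng(1)
V = 50
theta = rng.normal(0.3, 0.3, size=V)          # true voxel effects
se2 = rng.uniform(0.005, 0.05, size=V)        # per-voxel sampling variances
y = theta + rng.normal(0.0, np.sqrt(se2))     # noisy direct estimates
theta_hat, B, mu = sae_shrink(y, se2)
```

Each shrunken estimate lies between its direct estimate and the shared mean, and noisier voxels are pulled further toward that mean. A full Bayesian treatment would also propagate the uncertainty in `mu` and `tau2`, which is what a Gibbs sampler provides and this plug-in version does not.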