December 2022
Hierarchical resampling for bagging in multistudy prediction with applications to human neurochemical sensing
Gabriel Loewinger, Prasad Patil, Kenneth T. Kishida, Giovanni Parmigiani
Ann. Appl. Stat. 16(4): 2145-2165 (December 2022). DOI: 10.1214/21-AOAS1574

Abstract

We propose the “study strap ensemble,” which combines advantages of two common approaches to fitting prediction models when multiple training datasets (“studies”) are available: pooling the studies and fitting one model, and averaging predictions from multiple models, each fit to an individual study. The study strap ensemble fits models to bootstrapped datasets, or “pseudo-studies.” These are generated by resampling from multiple studies with a hierarchical resampling scheme that generalizes the randomized cluster bootstrap. The study strap is controlled by a tuning parameter that determines the proportion of observations to draw from each study. When the parameter is set to its lowest value, each pseudo-study is resampled from only a single study. When it is high, the study strap ignores the multistudy structure and generates pseudo-studies by merging the datasets and drawing observations as in a standard bootstrap. We show empirically that the optimal tuning value often lies between these extremes, and we prove that special cases of the study strap draw the merged dataset and the set of original studies as pseudo-studies. We extend the study strap approach with an ensemble weighting scheme that uses information in the distribution of the covariates of the test dataset.
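
The hierarchical resampling idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm or the studyStrap package's API. It assumes one plausible scheme consistent with the description above: a hypothetical tuning parameter `bag_size` sets how many study labels are drawn with replacement, and each study contributes a share of its observations equal to its share of the bag. The function names, the least-squares base learner, and the equal-weight averaging are all illustrative assumptions.

```python
import numpy as np

def make_pseudo_study(studies, bag_size, rng):
    """Build one pseudo-study from a list of (X, y) studies.

    Assumed scheme (illustrative only): draw `bag_size` study labels with
    replacement, then take from each study a fraction of its rows equal to
    its share of the bag.
    """
    k = len(studies)
    bag = rng.integers(0, k, size=bag_size)      # study labels, with replacement
    counts = np.bincount(bag, minlength=k)       # times each study was drawn
    xs, ys = [], []
    for idx, count in enumerate(counts):
        if count == 0:
            continue                             # study not in this bag
        X, y = studies[idx]
        n_draw = max(1, int(round(len(y) * count / bag_size)))
        rows = rng.choice(len(y), size=n_draw, replace=False)
        xs.append(X[rows])
        ys.append(y[rows])
    return np.concatenate(xs), np.concatenate(ys)

def study_strap_predict(studies, X_test, bag_size, n_pseudo=50, seed=0):
    """Average predictions of least-squares models fit to pseudo-studies."""
    rng = np.random.default_rng(seed)
    test_design = np.column_stack([np.ones(len(X_test)), X_test])
    preds = []
    for _ in range(n_pseudo):
        X, y = make_pseudo_study(studies, bag_size, rng)
        design = np.column_stack([np.ones(len(y)), X])   # add intercept column
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        preds.append(test_design @ coef)
    return np.mean(preds, axis=0)                # equal-weight ensemble
```

Under these assumptions, `bag_size=1` makes every pseudo-study a copy of a single original study, while a large `bag_size` draws a roughly equal fraction from every study, so each pseudo-study resembles the merged data. Intermediate values mix a few studies per pseudo-study.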

Our work is motivated by neuroscience experiments using real-time neurochemical sensing during awake behavior in humans. Current techniques to perform this kind of research require measurements from an electrode placed in the brain during awake neurosurgery and rely on prediction models to estimate neurotransmitter concentrations from the electrical measurements recorded by the electrode. These models are trained by combining multiple datasets that are collected in vitro under heterogeneous conditions in order to promote accuracy of the models when applied to data collected in the brain. A prevailing challenge is deciding how to combine studies or ensemble models trained on different studies to enhance model generalizability.

Our methods produce marked improvements in simulations and in this application. All methods are available in the studyStrap CRAN package.

Funding Statement

GCL was supported by the NIH, F31DA052153; T32 AI 007358. GP and PP received support from NSF Grant DMS-1810829. KK received support from the NIH, R01 DA048096; R01 MH121099; R01 NS092701; 5KL2TR00142; WFSOM, Phys/Pharm & Neurosurgery.

Acknowledgments

The authors would like to thank the reviewers, the Associate Editor, and the Editor for their feedback that substantially improved the quality of this paper.

Citation

Gabriel Loewinger, Prasad Patil, Kenneth T. Kishida, Giovanni Parmigiani. "Hierarchical resampling for bagging in multistudy prediction with applications to human neurochemical sensing." Ann. Appl. Stat. 16 (4) 2145-2165, December 2022. https://doi.org/10.1214/21-AOAS1574

Information

Received: 1 January 2021; Revised: 1 November 2021; Published: December 2022
First available in Project Euclid: 26 September 2022

MathSciNet: MR4489203
zbMATH: 1498.62240
Digital Object Identifier: 10.1214/21-AOAS1574

Keywords: domain adaptation, domain generalization, neuroscience, transfer learning

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
21 PAGES
