August 2024 Efficient and multiply robust risk estimation under general forms of dataset shift
Hongxiang Qiu, Eric Tchetgen Tchetgen, Edgar Dobriban
Ann. Statist. 52(4): 1796-1824 (August 2024). DOI: 10.1214/24-AOS2422


Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions with the target domain or are linked to it in other ways. Techniques leveraging such dataset shift conditions are known as domain adaptation or transfer learning. Despite an extensive literature on dataset shift, few works address how to use the auxiliary populations efficiently to improve the accuracy of risk evaluation for a given machine learning task in the target population.

In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions (covariate, label and concept shift) as special cases. We allow for partially nonoverlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains from leveraging plausible dataset shift conditions.
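To make the idea of "efficiently using auxiliary populations" concrete, here is a minimal sketch for the covariate shift special case only, assuming that labels (hence losses) are observed in both the source and target samples and that the conditional law of the loss given X is the same in both. It is a standard one-step (AIPW-style) construction based on the influence function (A(m(X) - ψ) + e(X)(ℓ - m(X)))/π, where m(x) = E[ℓ | X = x], e(x) = P(A = 1 | X = x) and π = P(A = 1). This influence function is our own derivation for this special case. It is not quoted from the paper, and the sketch does not reproduce the paper's estimators for the general class of shift conditions or their multiply robust property. The helper name `one_step_target_risk` and the simulation are hypothetical. The nuisances are supplied at their true values for illustration. In practice they would be estimated, typically with cross-fitting, which is omitted here.

```python
import numpy as np


def one_step_target_risk(loss, in_target, m_hat, e_hat):
    """Hypothetical helper: one-step estimate of the target risk E[loss | A=1]
    under covariate shift, i.e. loss | X has the same law in both populations.

    loss      -- per-unit loss l(f(X_i), Y_i), observed for every pooled unit
    in_target -- indicator A_i (1 = target sample, 0 = source sample)
    m_hat     -- fitted m(x) = E[loss | X = x], shared by both populations
    e_hat     -- fitted e(x) = P(A = 1 | X = x) in the pooled data
    Returns (estimate, standard error).
    """
    loss = np.asarray(loss, dtype=float)
    a = np.asarray(in_target, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    e_hat = np.asarray(e_hat, dtype=float)
    # Solve sum_i [A_i (m(X_i) - psi) + e(X_i) (loss_i - m(X_i))] = 0 for psi:
    # a plug-in average of m over target covariates, plus a residual
    # correction that uses labeled units from BOTH samples, weighted by e(X).
    psi = (np.sum(a * m_hat) + np.sum(e_hat * (loss - m_hat))) / np.sum(a)
    # Estimated influence function, used for a Wald-type standard error.
    pi_hat = a.mean()
    eif = (a * (m_hat - psi) + e_hat * (loss - m_hat)) / pi_hat
    se = eif.std(ddof=1) / np.sqrt(loss.size)
    return psi, se


# Toy simulation (assumed data-generating process, for illustration only).
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-(x - 1.0)))  # target units sit at larger x
a = rng.binomial(1, e_true)                 # A = 1 marks the target sample
m_true = x**2                               # E[loss | X], same in both samples
loss = m_true + rng.normal(scale=0.5, size=n)

psi, se = one_step_target_risk(loss, a, m_true, e_true)
# Double robustness: either nuisance may be misspecified (but not both).
psi_bad_m, _ = one_step_target_risk(loss, a, np.zeros(n), e_true)
psi_bad_e, _ = one_step_target_risk(loss, a, m_true, np.full(n, 0.5))
target_plugin = np.mean(m_true[a == 1])  # average of m over target covariates
print(f"psi={psi:.3f} (se {se:.3f}), bad m: {psi_bad_m:.3f}, bad e: {psi_bad_e:.3f}")
```

Replacing e(X) by the indicator A in the residual term reduces the estimate to the plain average loss over target units, which ignores source labels. Under the covariate shift assumption, the e(X) weighting is where the auxiliary source data contribute.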

Funding Statement

This work was supported in part by NSF Grant DMS 2046874, NIH Grants R01AI27271, R01CA222147, R01AG065276, R01GM139926, and Analytics at Wharton.
The Africa Health Research Institute’s Demographic Surveillance Information System and Population Intervention Programme is funded by the Wellcome Trust (201433/Z/16/Z) and the South Africa Population Research Infrastructure Network (funded by the South African Department of Science and Technology and hosted by the South African Medical Research Council).


Hongxiang Qiu. Eric Tchetgen Tchetgen. Edgar Dobriban. "Efficient and multiply robust risk estimation under general forms of dataset shift." Ann. Statist. 52 (4) 1796-1824, August 2024.


Received: 1 March 2024; Revised: 1 June 2024; Published: August 2024
Keywords: dataset shift, domain adaptation, efficiency, multiple robustness, semiparametric model, transfer learning

