Abstract
The “sample amplification” problem formalizes the following question: Given n i.i.d. samples drawn from an unknown distribution P, when is it possible to produce a larger set of samples that cannot be distinguished from i.i.d. samples drawn from P? In this work, we provide a firm statistical foundation for this problem by deriving generally applicable amplification procedures, lower bound techniques, and connections to existing statistical notions. Our techniques apply to a large class of distributions including the exponential family, and establish a rigorous connection between sample amplification and distribution learning.
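For concreteness, one common way to make the question precise (a sketch only; the paper's exact definition may differ in constants and in how indistinguishability is measured) is in terms of total variation distance: an (n, m) amplifier for a class of distributions D is a possibly randomized map T from n samples to m > n samples whose output distribution is close to m genuinely i.i.d. draws from P, uniformly over P in D. In symbols, with epsilon a small constant error tolerance,

\[
\sup_{P \in \mathcal{D}} \; d_{\mathrm{TV}}\!\left( \mathcal{L}\bigl(T(X_1, \dots, X_n)\bigr), \; P^{\otimes m} \right) \le \varepsilon,
\qquad X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P .
\]

Here \(\mathcal{L}(\cdot)\) denotes the law of the output and \(P^{\otimes m}\) the m-fold product distribution; the notation and the specific choice of total variation as the distinguishability criterion are assumptions of this sketch.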
Funding Statement
Shivam Garg conducted this research while affiliated with Stanford University and was supported by a Stanford Interdisciplinary Graduate Fellowship.
Yanjun Han was supported by a Simons-Berkeley research fellowship and the Norbert Wiener postdoctoral fellowship in statistics at MIT IDSS.
Vatsal Sharan was supported by NSF CAREER Award CCF-2239265 and an Amazon Research Award.
Gregory Valiant was supported by NSF Awards AF-2341890, CCF-1704417, CCF-1813049, UT Austin’s Foundation of ML NSF AI Institute, and a Simons Foundation Investigator Award.
Acknowledgments
We thank the anonymous reviewers for helpful feedback on earlier drafts of this paper.
Citation
Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant. "On the statistical complexity of sample amplification." Ann. Statist. 52(6): 2767–2790, December 2024. https://doi.org/10.1214/24-AOS2444