Open Access
August 2017
Principles of Experimental Design for Big Data Analysis
Christopher C. Drovandi, Christopher C. Holmes, James M. McGree, Kerrie Mengersen, Sylvia Richardson, Elizabeth G. Ryan
Statist. Sci. 32(3): 385-404 (August 2017). DOI: 10.1214/16-STS604


Big Datasets are endemic, but they are often notoriously difficult to analyse because of their size, heterogeneity and quality. The purpose of this paper is to open a discourse on the potential for modern decision-theoretic optimal experimental design methods, which by their very nature have traditionally been applied prospectively, to improve the analysis of Big Data through retrospective designed sampling that answers particular questions of interest. Through a range of examples, it is suggested that this perspective on Big Data modelling and analysis has the potential for wide generality and advantageous inferential and computational properties. We highlight current hurdles and open research questions surrounding efficient computational optimisation when using retrospective designs, and in part this paper is a call to the optimisation and experimental design communities to work together on Big Data analysis.
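To illustrate the flavour of retrospective designed sub-sampling discussed in the abstract (this is a minimal sketch under standard D-optimality for a linear model, not the authors' specific method), one can greedily select the subsample that maximises the determinant of the information matrix, rather than sampling rows at random:

```python
# Sketch of retrospective designed sub-sampling: greedily choose a
# D-optimal subsample of an existing "big" dataset for a linear model,
# i.e. pick rows that maximise det(X_s' X_s). Hypothetical data and sizes.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical big dataset: N observed points, p covariates plus intercept.
N, p, n_sub = 5000, 3, 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])

def d_optimal_subsample(X, n_sub, ridge=1e-6):
    """Greedy D-optimal selection: at each step add the point that most
    increases det(X_s' X_s), tracked via the inverse information matrix."""
    d = X.shape[1]
    M_inv = np.eye(d) / ridge          # inverse of a tiny ridge prior, ridge * I
    chosen = []
    available = np.ones(len(X), dtype=bool)
    for _ in range(n_sub):
        # Gain in log-det from adding row x is log(1 + x' M_inv x),
        # so ranking by x' M_inv x suffices.
        gains = np.einsum('ij,jk,ik->i', X, M_inv, X)
        gains[~available] = -np.inf
        i = int(np.argmax(gains))
        chosen.append(i)
        available[i] = False
        x = X[i]
        # Sherman-Morrison rank-one update of the inverse information.
        Mx = M_inv @ x
        M_inv -= np.outer(Mx, Mx) / (1.0 + x @ Mx)
    return np.array(chosen)

idx = d_optimal_subsample(X, n_sub)
info_design = np.linalg.slogdet(X[idx].T @ X[idx])[1]
info_random = np.linalg.slogdet(X[:n_sub].T @ X[:n_sub])[1]
# The designed subsample is typically far more informative (larger
# log-determinant) than an arbitrary subsample of the same size.
```

The greedy rank-one update keeps each selection step at O(N d^2), which is the kind of computational consideration the paper raises when scaling optimal design criteria to Big Data.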




Published: August 2017
First available in Project Euclid: 1 September 2017

zbMATH: 06870252
MathSciNet: MR3696002
Digital Object Identifier: 10.1214/16-STS604

Keywords: active learning, big data, dimension reduction, experimental design, sub-sampling

Rights: Copyright © 2017 Institute of Mathematical Statistics
