The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 5, Number 3 (2011), 1752-1779.
Measuring reproducibility of high-throughput experiments
Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and to identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the “irreproducible discovery rate” (IDR), analogous to the false discovery rate (FDR). This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and for combining replicates.
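A minimal sketch of a rank-based reproducibility curve of the kind described above follows. It assumes an empirical definition in which Ψ(t) is the fraction of the n paired items that fall in the top round(t·n) of both replicates. Under this definition Ψ(t) = t when the two replicates rank items identically and Ψ(t) is close to t² when they are independent. Because only within-replicate ranks are used, the curve does not change under any increasing transformation of either replicate's scores. The function name `correspondence_curve` is hypothetical and is not taken from the paper's software.

```python
def correspondence_curve(x, y, ts):
    """Empirical correspondence curve for two paired replicates.

    x[i] and y[i] are the scores of item i in replicates 1 and 2 (larger means
    more significant). For each t in ts, return the fraction of the n items
    that fall in the top round(t * n) of BOTH replicates.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")

    def ranks(scores):
        # Rank 0 = most significant; ties are broken by item index (stable sort).
        order = sorted(range(n), key=lambda i: -scores[i])
        r = [0] * n
        for position, i in enumerate(order):
            r[i] = position
        return r

    rx, ry = ranks(x), ranks(y)
    curve = []
    for t in ts:
        k = round(t * n)  # size of the "top t" list in each replicate
        in_both = sum(1 for i in range(n) if rx[i] < k and ry[i] < k)
        curve.append(in_both / n)
    return curve


scores = list(range(1, 11))
print(correspondence_curve(scores, scores, [0.1, 0.5, 1.0]))        # identical rankings
print(correspondence_curve(scores, scores[::-1], [0.1, 0.5, 1.0]))  # reversed rankings
```

With identical rankings the curve tracks the diagonal Ψ(t) = t. With reversed rankings no item appears in both top lists until t passes one half. Plotting Ψ(t) over a fine grid of t shows where agreement between replicates breaks down.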
Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment.
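The IDR described in the abstract can be turned into a selection threshold. The sketch below assumes that each item already has a local irreproducibility score, meaning the posterior probability that the item belongs to the irreproducible component. By analogy with the relation between the local fdr and the FDR, it also assumes that the IDR of a selected set is the average of the local scores in that set. Items are taken in increasing order of local score, and the largest set whose IDR stays at or below a target level `alpha` is returned. The helper `select_by_idr` is hypothetical.

```python
def select_by_idr(local_idr, alpha):
    """Return (selected indices, estimated IDR of the selection).

    local_idr[i] is an (assumed, precomputed) posterior probability that item i
    is irreproducible. The IDR of a set is taken to be the mean of its local
    scores; the largest prefix of the sorted items with IDR <= alpha is kept.
    """
    # Visit items from most to least reproducible (smallest local score first).
    order = sorted(range(len(local_idr)), key=lambda i: local_idr[i])
    running_sum = 0.0
    selected = []
    for i in order:
        # Scores arrive in increasing order, so the running mean never
        # decreases and we can stop at the first item that would exceed alpha.
        if (running_sum + local_idr[i]) / (len(selected) + 1) > alpha:
            break
        running_sum += local_idr[i]
        selected.append(i)
    idr = running_sum / len(selected) if selected else 0.0
    return selected, idr


print(select_by_idr([0.01, 0.5, 0.02, 0.9, 0.03], alpha=0.05))
```

Averaging local scores allows the selected set to contain some individually less reproducible items, provided the set as a whole stays under the target level. This mirrors how a threshold on an FDR differs from a threshold on each item's local false discovery rate.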
First available in Project Euclid: 13 October 2011
Li, Qunhua; Brown, James B.; Huang, Haiyan; Bickel, Peter J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5 (2011), no. 3, 1752--1779. doi:10.1214/11-AOAS466. https://projecteuclid.org/euclid.aoas/1318514284
- Supplementary material: Supplementary materials for “Measuring reproducibility of high-throughput experiments.” This supplement consists of four parts. Part 1 describes the algorithm for estimating parameters in our copula mixture model. Part 2 provides a theoretical justification for the efficiency of our estimator for the proposed copula mixture model when n is large. Part 3 derives the properties of the correspondence curves in Section 2.1.1. Part 4 provides an extension of our model to the case with multiple replicates.
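The estimation algorithm of Part 1 is not reproduced here. As an illustration only, the sketch below computes the posterior step for a two-component bivariate Gaussian mixture on latent values. The first component is irreproducible, with independent standard normal coordinates. The second is reproducible, with common mean `mu`, common standard deviation `sigma` and correlation `rho`, and it has mixing proportion `pi1`. The parameter names are assumptions made for this sketch. So is the premise that the latent pseudo-values `z1`, `z2` have already been derived from the replicate ranks; that step is part of the estimation procedure and is omitted. The returned local score is the posterior probability of the irreproducible component, which is the kind of input the IDR threshold uses.

```python
import math


def bvn_pdf(z1, z2, mu, sigma, rho):
    """Bivariate normal density with common mean mu, common sd sigma, correlation rho."""
    if sigma <= 0 or abs(rho) >= 1:
        raise ValueError("need sigma > 0 and |rho| < 1")
    a = (z1 - mu) / sigma
    b = (z2 - mu) / sigma
    one_minus_rho2 = 1.0 - rho * rho
    q = (a * a - 2.0 * rho * a * b + b * b) / one_minus_rho2
    return math.exp(-0.5 * q) / (2.0 * math.pi * sigma * sigma * math.sqrt(one_minus_rho2))


def local_idr(z1, z2, pi1, mu, sigma, rho):
    """Posterior probability that the pair (z1, z2) comes from the irreproducible
    component, under the hypothetical two-component mixture described above."""
    h0 = bvn_pdf(z1, z2, 0.0, 1.0, 0.0)     # irreproducible: independent N(0, 1)
    h1 = bvn_pdf(z1, z2, mu, sigma, rho)    # reproducible: shifted, correlated
    weighted_h0 = (1.0 - pi1) * h0
    return weighted_h0 / (weighted_h0 + pi1 * h1)


params = dict(pi1=0.5, mu=3.0, sigma=1.0, rho=0.8)
print(local_idr(3.0, 3.0, **params))  # near the reproducible mean: small score
print(local_idr(0.0, 0.0, **params))  # near the origin: score close to 1
```

In an EM-style procedure, scores like these would serve as the E-step weights for updating `pi1`, `mu`, `sigma` and `rho`. The sketch only evaluates them for fixed parameter values.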