The Annals of Applied Statistics

Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology

Keith A. Baggerly and Kevin R. Coombes

Abstract

High-throughput biological assays such as microarrays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for exact reproduction of the results, leading to exercises in “forensic bioinformatics” where aspects of raw data and reported results are used to infer what methods must have been employed. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors. In this report we examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. We then discuss steps we are taking to avoid such errors in our own investigations.

Article information

Source
Ann. Appl. Stat. Volume 3, Number 4 (2009), 1309-1334.

Dates
First available in Project Euclid: 1 March 2010

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1267453942

Digital Object Identifier
doi:10.1214/09-AOAS291

Zentralblatt MATH identifier
05696880

Mathematical Reviews number (MathSciNet)
MR2752136

Citation

Baggerly, Keith A.; Coombes, Kevin R. Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. The Annals of Applied Statistics 3 (2009), no. 4, 1309–1334. doi:10.1214/09-AOAS291. http://projecteuclid.org/euclid.aoas/1267453942.

