Statistical Science

Applying the Bootstrap in Phylogeny Reconstruction

Pamela S. Soltis and Douglas E. Soltis


Abstract

With the increasing emphasis in biology on reconstruction of phylogenetic trees, questions have arisen as to how confident one should be in a given phylogenetic tree and how support for phylogenetic trees should be measured. Felsenstein suggested that bootstrapping be applied across characters of a taxon-by-character data matrix to produce replicate "bootstrap data sets," each of which is then analyzed phylogenetically, with a consensus tree constructed to summarize the results of all replicates. The proportion of trees/replicates in which a grouping is recovered is presented as a measure of support for that group. Bootstrapping has become a common feature of phylogenetic analysis. However, the interpretation of bootstrap values remains open to discussion, and phylogeneticists have used these values in multiple ways. The usefulness of phylogenetic bootstrapping is potentially limited by a number of features, such as the size of the data matrix and the underlying assumptions of the phylogeny reconstruction program. Recent studies have explored the application of bootstrapping to large data sets and the relative performance of bootstrapping and jackknifing.
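The resampling procedure described above can be illustrated with a minimal sketch. It assumes a taxon-by-character matrix stored as a dictionary of equal-length state strings, resamples characters (columns) with replacement, and reports the proportion of replicates in which each grouping is recovered. The function names and the toy data are hypothetical, and `toy_clades` is only a stand-in tree builder (average-linkage clustering on Hamming distance), not parsimony or any method used by the authors; a real analysis would call a phylogeny reconstruction program at that step.

```python
import random
from collections import Counter
from itertools import combinations


def resample_characters(matrix, rng):
    """Draw characters (columns) with replacement; taxa (rows) are unchanged."""
    n_chars = len(next(iter(matrix.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [states[c] for c in cols] for taxon, states in matrix.items()}


def toy_clades(matrix):
    """Stand-in tree builder (an assumption, not the paper's method):
    average-linkage clustering on Hamming distance.
    Returns the set of non-trivial groups formed while clustering."""
    def dist(a, b):
        # Mean number of mismatching characters between members of a and b.
        pairs = [(x, y) for x in a for y in b]
        total = sum(
            sum(p != q for p, q in zip(matrix[x], matrix[y])) for x, y in pairs
        )
        return total / len(pairs)

    clusters = [frozenset([t]) for t in sorted(matrix)]
    clades = set()
    while len(clusters) > 2:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        clades.add(merged)
    return clades


def bootstrap_support(matrix, infer_clades, n_replicates=200, seed=0):
    """Proportion of bootstrap replicates in which each grouping is recovered."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        counts.update(infer_clades(resample_characters(matrix, rng)))
    return {clade: n / n_replicates for clade, n in counts.items()}


# Hypothetical four-taxon data set with binary characters.
toy_matrix = {
    "A": "1111100000",
    "B": "1111100001",
    "C": "0000011110",
    "D": "0000011111",
}

support = bootstrap_support(toy_matrix, toy_clades)
for clade, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), round(value, 2))
```

Passing the tree builder as an argument keeps the resampling logic separate from the reconstruction method, which reflects the point in the abstract that bootstrap values depend on the assumptions of the program used to analyze each replicate.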

Article information

Source
Statist. Sci. Volume 18, Issue 2 (2003), 256-267.

Dates
First available in Project Euclid: 19 September 2003

Permanent link to this document
http://projecteuclid.org/euclid.ss/1063994980

Digital Object Identifier
doi:10.1214/ss/1063994980

Mathematical Reviews number (MathSciNet)
MR2026084

Keywords
Bootstrap; phylogeny; support; jackknife

Citation

Soltis, Pamela S.; Soltis, Douglas E. Applying the Bootstrap in Phylogeny Reconstruction. Statist. Sci. 18 (2003), no. 2, 256--267. doi:10.1214/ss/1063994980. http://projecteuclid.org/euclid.ss/1063994980.


References

  • Bull, J. J., Cunningham, C. W., Molineux, I. J., Badgett, M. R. and Hillis, D. M. (1993). Experimental molecular evolution of bacteriophage T7. Evolution 47 993--1007.
  • Carpenter, J. M. (1992). Random cladistics. Cladistics 8 147--153.
  • Carpenter, J. M. (1996). Uninformative bootstrapping. Cladistics 12 177--181.
  • Cavender, J. A. (1978). Taxonomy with confidence. Math. Biosci. 40 271--280.
  • Cavender, J. A. (1981). Tests of phylogenetic hypotheses under generalized models. Math. Biosci. 54 217--229.
  • Chase, M. W., Soltis, D. E., Olmstead, R. G., Morgan, D., Les, D. H., Mishler, B. D., Duvall, M. R., Price, R. A., Hills, H. G., Qiu, Y.-L., Kron, K. A., Rettig, J. H., Conti, E., Palmer, J. D., Manhart, J. R., Sytsma, K. J., Michaels, H. J., Kress, W. J., Karol, K. G., Clark, W. D., Hedrén, M., Gaut, B. S., Jansen, R. K., Kim, K. J., Wimpee, C. F., Smith, J. F., Furnier, G. R., Strauss, S. H., Xiang, Q.-Y., Plunkett, G. M., Soltis, P. S., Swensen, S. M., Williams, S. E., Gadek, P. A., Quinn, C. J., Eguiarte, L. E., Golenberg, E., Learn, G. H., Jr., Graham, S. W., Barrett, S. C. H., Dayanandan, S. and Albert, V. A. (1993). Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80 528--580.
  • Darwin, C. (1859). On the Origin of Species by Means of Natural Selection. J. Murray, London.
  • DeBry, R. W. and Olmstead, R. G. (2000). A simulation study of reduced tree-search effort in bootstrap resampling analysis. Systematic Biology 49 171--179.
  • Diaconis, P. and Efron, B. (1983). Computer-intensive methods in statistics. Scientific American 249 116--130.
  • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1--26.
  • Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia.
  • Efron, B. (1985). Bootstrap confidence intervals for a class of parametric problems. Biometrika 72 45--58.
  • Efron, B. and Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statist. 37 36--48.
  • Efron, B., Halloran, E. and Holmes, S. (1996). Bootstrap confidence levels for phylogenetic trees. Proc. Nat. Acad. Sci. U.S.A. 93 13,429--13,434.
  • Faith, D. P. and Cranston, P. S. (1991). Could a cladogram this short have arisen by chance alone? On permutation tests for cladistic structure. Cladistics 7 1--28.
  • Farris, J. S. (1983). The logical basis of phylogenetic analysis. In Advances in Cladistics 2 (N. I. Platnick and V. A. Funk, eds.) 7--36. Columbia Univ. Press.
  • Farris, J. S., Albert, V. A., Källersjö, M., Lipscomb, D. and Kluge, A. G. (1996). Parsimony jackknifing outperforms neighbor-joining. Cladistics 12 99--124.
  • Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27 401--410.
  • Felsenstein, J. (1985). Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39 783--791.
  • Felsenstein, J. (1988). Phylogenies from molecular sequences: Inference and reliability. Annual Review of Genetics 22 521--565.
  • Felsenstein, J. and Kishino, H. (1993). Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology 42 193--200.
  • Graybeal, A. (1998). Is it better to add taxa or characters to a difficult phylogenetic problem? Systematic Biology 47 9--17.
  • Harshman, J. (1994). The effect of irrelevant characters on bootstrap values. Systematic Biology 43 419--424.
  • Hedges, S. B. (1992). The number of replications needed for accurate estimation of the bootstrap $p$-value in phylogenetic studies. Molecular Biology and Evolution 9 366--369.
  • Hennig, W. (1966). Phylogenetic Systematics. Univ. Illinois Press, Urbana.
  • Hillis, D. M. (1996). Inferring complex phylogenies. Nature 383 130--131.
  • Hillis, D. M. and Bull, J. J. (1993). An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42 182--192.
  • Hillis, D. M. and Dixon, M. T. (1989). Vertebrate phylogeny: Evidence from 28S ribosomal DNA sequences. In The Hierarchy of Life (B. Fernholm, K. Bremer and H. Jörnvall, eds.) 355--367. Elsevier, Amsterdam.
  • Huelsenbeck, J. P. (1995). Performance of phylogenetic methods in simulation. Systematic Biology 44 17--48.
  • Huelsenbeck, J. P. and Crandall, K. A. (1997). Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics 28 437--466.
  • Huelsenbeck, J. P. and Hillis, D. M. (1993). Success of phylogenetic methods in the four-taxon case. Systematic Biology 42 247--264.
  • Källersjö, M., Farris, J. S., Chase, M. W., Bremer, B., Fay, M. F., Humphries, C. J., Petersen, G., Seberg, O. and Bremer, K. (1998). Simultaneous parsimony jackknife analysis of 2538 rbcL DNA sequences reveals support for major clades of green plants, land plants, seed plants, and flowering plants. Plant Systematics and Evolution 213 259--287.
  • Kluge, A. G. (1997). Testability and the refutation and corroboration of cladistic hypotheses. Cladistics 13 81--96.
  • Kluge, A. G. (1999). The science of phylogenetic systematics: Explanation, prediction, and test. Cladistics 15 429--436.
  • Kluge, A. G. and Wolf, A. J. (1993). Cladistics: What's in a word? Cladistics 9 183--199.
  • Lanyon, S. (1985). Detecting internal inconsistencies in distance data. Systematic Zoology 34 397--403.
  • Miller, R. G. (1974). The jackknife---a review. Biometrika 61 1--15.
  • Mort, M. E., Soltis, P. S., Soltis, D. E. and Mabry, M. (2000). Comparison of three methods for estimating internal support on phylogenetic trees. Systematic Biology 49 160--171.
  • Mueller, L. D. and Ayala, F. J. (1982). Estimation and interpretation of genetic distance in empirical studies. Genetical Research 40 127--137.
  • Newton, M. A. (1996). Bootstrapping phylogenies: Large deviations and dispersion effects. Biometrika 83 315--328.
  • Penny, D., Foulds, L. R. and Hendy, M. D. (1982). Testing the theory of evolution by comparing phylogenetic trees constructed from 5 different protein sequences. Nature 297 197--200.
  • Penny, D. and Hendy, M. D. (1985). Testing methods of evolutionary tree construction. Cladistics 1 266--278.
  • Platnick, N. I. and Gaffney, E. S. (1977). Review of The Logic of Scientific Discovery and Conjectures and Refutations, by K. R. Popper. Systematic Zoology 26 361--365.
  • Platnick, N. I. and Gaffney, E. S. (1978). Evolutionary biology: A Popperian perspective. Systematic Zoology 27 138--141.
  • Rodrigo, A. (1993). Calibrating the bootstrap test of monophyly. International Journal for Parasitology 23 507--514.
  • Sanderson, M. J. (1989). Confidence limits on phylogenies: The bootstrap revisited. Cladistics 5 113--129.
  • Sanderson, M. J. (1995). Objections to bootstrapping phylogenies: A critique. Systematic Biology 44 299--320.
  • Sanderson, M. J. and Wojciechowski, M. F. (2000). Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae). Systematic Biology 49 671--685.
  • Savolainen, V., Chase, M. W., Morton, C. M., Hoot, S. B., Soltis, D. E., Bayer, C., Fay, M. F., De Bruijn, A., Sullivan, S. and Qiu, Y.-L. (2000). Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Systematic Biology 49 306--362.
  • Soltis, D. E., Soltis, P. S., Mort, M. E., Chase, M. W., Savolainen, V., Hoot, S. B. and Morton, C. M. (1998). Inferring complex phylogenies using parsimony: An empirical approach using three large DNA data sets for angiosperms. Systematic Biology 47 32--42.
  • Soltis, D. E., Soltis, P. S., Chase, M. W., Mort, M. E., Albach, D. C., Zanis, M., Savolainen, V., Hahn, W. H., Hoot, S. B., Fay, M. F., Axtell, M., Swensen, S. M., Prince, L. M., Kress, W. J., Nixon, K. C. and Farris, J. S. (2000). Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Botanical Journal of the Linnean Society 133 381--461.
  • Soltis, P. S. and Novak, S. J. (1997). Polyphyly of the tuberous Lomatiums (Apiaceae): cpDNA evidence for morphological convergence. Systematic Botany 22 99--112.
  • Soltis, P. S., Soltis, D. E. and Chase, M. W. (1999). Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402 402--404.
  • Swofford, D. L. (1998). PAUP* 4.0: Phylogenetic analysis using parsimony (and other methods), Beta version 4.0. Sinauer, Sunderland, MA.
  • Templeton, A. R. (1983). Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37 221--244.
  • Wendel, J. F. and Albert, V. A. (1992). Phylogenetics of the cotton genus (Gossypium): Character-state weighted parsimony analysis of chloroplast-DNA restriction site data and its systematic and biogeographic implications. Systematic Botany 17 115--143.
  • Wiley, E. O. (1975). Karl R. Popper, systematics, and classification: A reply to Walter Bock and other evolutionary taxonomists. Systematic Zoology 24 233--243.
  • Zanis, M. J., Soltis, D. E., Soltis, P. S., Mathews, S. and Donoghue, M. J. (2002). The root of the angiosperms revisited. Proc. Nat. Acad. Sci. U.S.A. 99 6848--6853.
  • Zharkikh, A. and Li, W.-H. (1992a). Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Molecular Biology and Evolution 9 1119--1147.
  • Zharkikh, A. and Li, W.-H. (1992b). Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J. Molecular Evolution 35 356--366.
  • Zharkikh, A. and Li, W.-H. (1995). Estimation of confidence in phylogeny: The complete-and-partial bootstrap technique. Molecular Phylogenetics and Evolution 4 44--63.