The Annals of Applied Statistics

Analysis of comparative data with hierarchical autocorrelation

Cécile Ané

Full-text: Open access


The asymptotic behavior of estimates and information criteria in linear models is studied in the context of hierarchically correlated sampling units. The work is motivated by biological data collected on species where autocorrelation is based on the species’ genealogical tree. Hierarchical autocorrelation is also found in many other kinds of data, such as from microarray experiments or human languages. Similar correlation also arises in ANOVA models with nested effects. I show that the best linear unbiased estimators are almost surely convergent but may not be consistent for some parameters such as the intercept and lineage effects, in the context of Brownian motion evolution on the genealogical tree. For the purpose of model selection I show that the usual BIC does not provide an appropriate approximation to the posterior probability of a model. To correct for this, an effective sample size is introduced for parameters that are inconsistently estimated. For biological studies, this work implies that tree-aware sampling design is desirable; adding more sampling units may not help ancestral reconstruction and only strong lineage effects may be detected with high power.
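The inconsistency of the intercept estimate can be illustrated numerically. The sketch below is not taken from the article: it assumes a hypothetical toy tree in which the root splits into two clades, each with a stem of length `t0` followed by `m` tips on branches of length `1 - t0`, and it assumes the standard Brownian motion covariance in which two tips covary in proportion to their shared path length from the root. The function names and the tree shape are illustrative assumptions. Under these assumptions the variance of the generalized least squares intercept is `1 / (1' V^{-1} 1)`, which here equals `(t0 * m + 1 - t0) / (2 * m)` and approaches `t0 / 2` rather than zero as tips are added.

```python
import numpy as np


def bm_covariance_two_clades(m, t0, sigma2=1.0):
    """Tip covariance under Brownian motion on a toy two-clade tree.

    Assumed (hypothetical) tree: the root splits into two clades; each
    clade has a stem of length t0, then m tips on branches of length
    1 - t0. Tips in the same clade share path length t0; tips in
    different clades share nothing.
    """
    block = t0 * np.ones((m, m)) + (1.0 - t0) * np.eye(m)
    zeros = np.zeros((m, m))
    return sigma2 * np.block([[block, zeros], [zeros, block]])


def intercept_variance(V):
    """Variance of the GLS (BLUE) intercept estimate: 1 / (1' V^{-1} 1)."""
    ones = np.ones(V.shape[0])
    return 1.0 / (ones @ np.linalg.solve(V, ones))


if __name__ == "__main__":
    t0 = 0.5
    for m in (2, 20, 200):
        var = intercept_variance(bm_covariance_two_clades(m, t0))
        # The variance stays above t0 / 2 however many tips are sampled.
        print(f"tips per clade = {m:4d}, intercept variance = {var:.5f}")
```

In this toy setting, adding tips within the two existing clades cannot push the variance below `t0 / 2`, which is consistent with the abstract's remark that adding more sampling units may not help ancestral reconstruction.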

Article information

Ann. Appl. Stat., Volume 2, Number 3 (2008), 1078-1102.

First available in Project Euclid: 13 October 2008


Keywords: Asymptotic convergence, consistency, linear model, dependence, comparative method, phylogenetic tree, Brownian motion evolution


Ané, Cécile. Analysis of comparative data with hierarchical autocorrelation. Ann. Appl. Stat. 2 (2008), no. 3, 1078--1102. doi:10.1214/08-AOAS173.


