We introduce a dependent Bayesian nonparametric model for the probabilistic modeling of membership of subgroups in a community based on partially replicated data. The focus here is on species-by-site data, that is, community data where observations at different sites are classified into distinct species. Our aim is to study the impact of additional covariates, for instance environmental variables, on the data structure and, in particular, on community diversity. To this end, we introduce dependence a priori across the covariates and show that it improves posterior inference. We use a dependent version of the Griffiths–Engen–McCloskey distribution defined via the stick-breaking construction. This distribution is obtained by transforming a Gaussian process whose covariance function controls the desired dependence. The resulting posterior distribution is sampled by Markov chain Monte Carlo. We illustrate the application of our model to a soil microbial data set acquired across a hydrocarbon contamination gradient at the site of a fuel spill in Antarctica. The method allows inference on a number of quantities of interest in ecotoxicology, such as diversity or effective concentrations, and is broadly applicable to the general problem of community response to environmental variables.
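The prior construction described above can be sketched in a few lines. This is a minimal, stdlib-only illustration, not the authors' implementation. The squared-exponential kernel, the truncation level `n_sticks`, the particular Gaussian-to-Beta transform, and all function names are assumptions made for this example.

The transform relies on a standard fact. If ξ1 and ξ2 are independent standard normals, then (ξ1² + ξ2²)/2 follows an Exp(1) distribution, so 1 − exp(−(ξ1² + ξ2²)/(2M)) follows a Beta(1, M) distribution, which is the stick-breaking law of GEM(M). Applying this transform pointwise to two independent Gaussian process paths gives sticks with Beta(1, M) marginals at every covariate value. Meanwhile, the covariance function controls how similar the weights are at nearby covariate values. Shannon diversity is included as one example of a diversity functional of the weights.

```python
import math
import random


def sq_exp_cov(xs, length_scale, jitter=1e-6):
    """Squared-exponential covariance over covariate values (assumed kernel),
    with a small diagonal jitter for numerical stability."""
    n = len(xs)
    return [[math.exp(-((xs[i] - xs[j]) ** 2) / (2.0 * length_scale ** 2))
             + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gp_draw(chol, rng):
    """One zero-mean GP path at the covariate values: L @ z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(chol))]
    return [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(len(chol))]


def dependent_gem(xs, n_sticks, mass, length_scale, seed=0):
    """Truncated dependent GEM(mass) weights: one weight vector per covariate value.

    V_j(x) = 1 - exp(-(xi1_j(x)^2 + xi2_j(x)^2) / (2 * mass)) is (up to the jitter)
    Beta(1, mass) at every x; dependence across x comes from the GP covariance.
    """
    rng = random.Random(seed)
    chol = cholesky(sq_exp_cov(xs, length_scale))
    sticks = []
    for _ in range(n_sticks):
        xi1, xi2 = gp_draw(chol, rng), gp_draw(chol, rng)
        sticks.append([1.0 - math.exp(-(a * a + b * b) / (2.0 * mass))
                       for a, b in zip(xi1, xi2)])
    weights = []
    for i in range(len(xs)):
        remaining, w = 1.0, []
        for j in range(n_sticks):
            # Stick-breaking: take a fraction V_j of what is left of the unit stick.
            w.append(sticks[j][i] * remaining)
            remaining *= 1.0 - sticks[j][i]
        weights.append(w)
    return weights


def shannon(p):
    """Shannon diversity -sum p log p; zero weights are skipped. Applied here to the
    truncated weights, whose leftover mass is negligible for large n_sticks."""
    return -sum(q * math.log(q) for q in p if q > 0)


# Usage with hypothetical covariate values (e.g., a scaled contamination level).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
w = dependent_gem(xs, n_sticks=50, mass=1.0, length_scale=1.0)
print([round(shannon(wi), 2) for wi in w])
```

A longer `length_scale` makes the weights at neighbouring covariate values more alike. As `length_scale` shrinks toward zero, the off-diagonal covariances vanish and the sticks become independent across covariate values. For closely spaced covariates combined with a long length scale, the covariance matrix can become nearly singular, and a larger jitter may be needed for the Cholesky step.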
Ann. Appl. Stat. 10(3): 1496–1516 (September 2016). DOI: 10.1214/16-AOAS944
Arbel, J., Mengersen, K. and Rousseau, J. (2016). Supplement to “Bayesian nonparametric dependent model for partially replicated data: The influence of fuel spills on species diversity.” DOI:10.1214/16-AOAS944SUPP.