Open Access
September 2013
Bayesian Nonparametric ROC Regression Modeling
Vanda Inácio de Carvalho, Alejandro Jara, Timothy E. Hanson, Miguel de Carvalho
Bayesian Anal. 8(3): 623-646 (September 2013). DOI: 10.1214/13-BA825
Abstract

The receiver operating characteristic (ROC) curve is the most widely used measure for evaluating the discriminatory performance of a continuous biomarker. Incorporating covariates in the analysis can potentially enhance information gathered from the biomarker, as its discriminatory ability may depend on these. In this paper we propose a dependent Bayesian nonparametric model for conditional ROC estimation. Our model is based on dependent Dirichlet processes, where the covariate-dependent ROC curves are indirectly modeled using probability models for related probability distributions in the diseased and healthy groups. Our approach allows for the entire distribution in each group to change as a function of the covariates, provides exact posterior inference up to a Monte Carlo error, and can easily accommodate multiple continuous and categorical predictors. Simulation results suggest that, regarding the mean squared error, our approach performs better than its competitors for small sample sizes and nonlinear scenarios. The proposed model is applied to data concerning diagnosis of diabetes.
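
As a minimal illustration of what a covariate-dependent ROC curve is, the sketch below uses the standard definition ROC(p | x) = 1 - F_D(F_H^{-1}(1 - p | x) | x), where F_D and F_H denote the conditional distribution functions of the biomarker in the diseased and healthy groups. It assumes, purely for illustration, a normal linear model in each group with made-up coefficients. This is a simple parametric stand-in and not the dependent Dirichlet process model proposed in the paper. The function names and parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def conditional_roc(p, x, beta_h, sigma_h, beta_d, sigma_d):
    """ROC(p | x) = 1 - F_D(F_H^{-1}(1 - p | x) | x), for 0 < p < 1.

    Assumption (illustration only): in each group the biomarker is normal
    with mean beta[0] + beta[1] * x and a constant standard deviation.
    """
    healthy = NormalDist(beta_h[0] + beta_h[1] * x, sigma_h)
    diseased = NormalDist(beta_d[0] + beta_d[1] * x, sigma_d)
    # Threshold giving false positive rate p in the healthy group at x.
    threshold = healthy.inv_cdf(1.0 - p)
    # True positive rate in the diseased group at that threshold.
    return 1.0 - diseased.cdf(threshold)


def conditional_auc(x, beta_h, sigma_h, beta_d, sigma_d, n_grid=2000):
    """Area under ROC(. | x), approximated with the midpoint rule."""
    total = 0.0
    for i in range(n_grid):
        p = (i + 0.5) / n_grid  # midpoints stay strictly inside (0, 1)
        total += conditional_roc(p, x, beta_h, sigma_h, beta_d, sigma_d)
    return total / n_grid


def binormal_auc(x, beta_h, sigma_h, beta_d, sigma_d):
    """Closed-form AUC for two normal groups, used to check the numerics."""
    mean_gap = (beta_d[0] + beta_d[1] * x) - (beta_h[0] + beta_h[1] * x)
    return NormalDist().cdf(mean_gap / sqrt(sigma_h ** 2 + sigma_d ** 2))


# Hypothetical parameter values, chosen only so that the gap between the
# groups widens with the covariate x.
BETA_H, SIGMA_H = (0.0, 0.1), 1.0
BETA_D, SIGMA_D = (1.0, 0.5), 1.0

if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0):
        auc = conditional_auc(x, BETA_H, SIGMA_H, BETA_D, SIGMA_D)
        print(f"x = {x:.1f}: AUC(x) approx. {auc:.4f}")
```

In this toy setting only the group means change with x. The model described in the abstract is more flexible, since it allows the entire distribution in each group to change as a function of the covariates.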

Copyright © 2013 International Society for Bayesian Analysis
Vanda Inácio de Carvalho, Alejandro Jara, Timothy E. Hanson, and Miguel de Carvalho "Bayesian Nonparametric ROC Regression Modeling," Bayesian Analysis 8(3), 623-646, (September 2013). https://doi.org/10.1214/13-BA825