Bayesian Analysis

Bayesian Nonparametric ROC Regression Modeling

Vanda Inácio de Carvalho, Alejandro Jara, Timothy E. Hanson, and Miguel de Carvalho


Abstract

The receiver operating characteristic (ROC) curve is the most widely used measure for evaluating the discriminatory performance of a continuous biomarker. Incorporating covariates in the analysis can potentially enhance information gathered from the biomarker, as its discriminatory ability may depend on these. In this paper we propose a dependent Bayesian nonparametric model for conditional ROC estimation. Our model is based on dependent Dirichlet processes, where the covariate-dependent ROC curves are indirectly modeled using probability models for related probability distributions in the diseased and healthy groups. Our approach allows for the entire distribution in each group to change as a function of the covariates, provides exact posterior inference up to a Monte Carlo error, and can easily accommodate multiple continuous and categorical predictors. Simulation results suggest that, regarding the mean squared error, our approach performs better than its competitors for small sample sizes and nonlinear scenarios. The proposed model is applied to data concerning diagnosis of diabetes.
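To make the "indirect" modeling concrete: once conditional distribution functions are available for the healthy and diseased groups, the covariate-specific ROC curve follows from the standard identity ROC(p | x) = 1 - F_D(F_H^{-1}(1 - p | x) | x), where p is the false positive rate. The sketch below illustrates that identity only. It is not the dependent Dirichlet process model proposed in the paper: the two groups are given simple normal distributions with made-up, covariate-dependent means, the names `conditional_roc`, `healthy`, and `diseased` are hypothetical, and larger biomarker values are assumed to indicate disease.

```python
from statistics import NormalDist


def conditional_roc(p, x, healthy, diseased):
    """Induced ROC at false positive rate p (0 < p < 1) and covariate value x.

    `healthy` and `diseased` are hypothetical callables mapping x to a
    NormalDist; they stand in for the group-specific conditional
    distributions of the biomarker.
    """
    # Biomarker cutoff at which the healthy group has false positive rate p.
    threshold = healthy(x).inv_cdf(1.0 - p)
    # True positive rate in the diseased group at that same cutoff.
    return 1.0 - diseased(x).cdf(threshold)


# Illustrative (made-up) conditional distributions: means shift linearly in x.
def healthy(x):
    return NormalDist(mu=1.0 + 0.2 * x, sigma=1.0)


def diseased(x):
    return NormalDist(mu=2.0 + 0.5 * x, sigma=1.2)


for x in (0.0, 2.0):
    curve = [round(conditional_roc(p, x, healthy, diseased), 3) for p in (0.1, 0.5, 0.9)]
    print(f"x = {x}: ROC at p = 0.1, 0.5, 0.9 -> {curve}")
```

In a Bayesian analysis of the kind described above, the same transformation would be applied to each posterior draw of the two conditional distributions, so that uncertainty in the distributions carries over to the ROC curve.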

Article information

Source
Bayesian Anal., Volume 8, Number 3 (2013), 623-646.

Dates
First available in Project Euclid: 9 September 2013

Permanent link to this document
https://projecteuclid.org/euclid.ba/1378729922

Digital Object Identifier
doi:10.1214/13-BA825

Mathematical Reviews number (MathSciNet)
MR3102228

Zentralblatt MATH identifier
1329.62154

Keywords
Conditional area under the curve; related probability distributions; dependent Dirichlet process; Markov chain Monte Carlo

Citation

Inácio de Carvalho, Vanda; Jara, Alejandro; Hanson, Timothy E.; de Carvalho, Miguel. Bayesian Nonparametric ROC Regression Modeling. Bayesian Anal. 8 (2013), no. 3, 623–646. doi:10.1214/13-BA825. https://projecteuclid.org/euclid.ba/1378729922


