The Annals of Applied Statistics

Logistic regression analysis with standardized markers

Ying Huang, Margaret S. Pepe, and Ziding Feng

Full-text: Open access

Abstract

Two different approaches to the analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for the probability of disease given marker values, while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes both tasks. The key step is to standardize markers relative to the nondiseased population before including them in the logistic regression model. Among the advantages of this method are the following: (i) ensuring that results from regression and performance assessments are consistent with each other; (ii) allowing covariate adjustment and covariate effects on ROC curves to be handled in a familiar way; and (iii) providing a mechanism to incorporate important assumptions about structure in the ROC curve into the fitted risk model. We develop the method in detail for the problem of combining biomarker data sets derived from multiple studies, populations or biomarker measurement platforms, when ROC curves are similar across data sources. The methods are applicable to both cohort and case–control sampling designs. The data set motivating this application concerns Prostate Cancer Antigen 3 (PCA3) for diagnosis of prostate cancer in patients with or without previous negative biopsy, where the ROC curves for PCA3 are found to be the same in the two populations. Constrained maximum likelihood and empirical likelihood estimators are derived. The estimators are compared in simulation studies and the methods are illustrated with the PCA3 data set.
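
As a rough illustration of the key step described in the abstract, the following Python sketch standardizes a marker by its empirical percentile among nondiseased subjects and then fits a one-predictor logistic model to the standardized values. Everything in it is an assumption made for illustration: the function names (placement_values, fit_logistic, demo), the simulated normal data, the use of the empirical distribution function as the standardization, and the plain Newton-Raphson fit. It is not the constrained maximum likelihood or empirical likelihood estimator derived in the paper.

```python
import bisect
import math
import random


def placement_values(markers, controls):
    """Standardize markers as the fraction of control values <= each marker."""
    ref = sorted(controls)
    n = len(ref)
    return [bisect.bisect_right(ref, y) / n for y in markers]


def fit_logistic(x, d, n_iter=25):
    """Newton-Raphson fit of logit P(D=1 | x) = b0 + b1 * x (one predictor)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += di - p            # score for the intercept
            g1 += (di - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (information matrix)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def demo(seed=0, n=200):
    """Hypothetical simulated study: controls ~ N(0, 1), cases ~ N(1, 1)."""
    rng = random.Random(seed)
    controls = [rng.gauss(0.0, 1.0) for _ in range(n)]
    cases = [rng.gauss(1.0, 1.0) for _ in range(n)]
    markers = controls + cases
    disease = [0] * n + [1] * n
    # Standardize every marker relative to the nondiseased (control) sample.
    u = placement_values(markers, controls)
    return fit_logistic(u, disease)


if __name__ == "__main__":
    b0, b1 = demo()
    print("intercept:", b0, "slope on standardized marker:", b1)
```

Because the standardized value depends only on a marker's rank among controls, it is the same quantity that indexes the false positive rate axis of the ROC curve, which is why regressing on it ties the risk model to classification performance. Under case-control sampling the fitted intercept would additionally need the usual adjustment for the sampling ratio, which this sketch omits.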

Article information

Source
Ann. Appl. Stat., Volume 7, Number 3 (2013), 1640-1662.

Dates
First available in Project Euclid: 3 October 2013

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1380804810

Digital Object Identifier
doi:10.1214/13-AOAS634

Mathematical Reviews number (MathSciNet)
MR3127962

Zentralblatt MATH identifier
06237191

Keywords
Constrained likelihood; empirical likelihood; logistic regression; predictiveness curve; ROC curve

Citation

Huang, Ying; Pepe, Margaret S.; Feng, Ziding. Logistic regression analysis with standardized markers. Ann. Appl. Stat. 7 (2013), no. 3, 1640–1662. doi:10.1214/13-AOAS634. https://projecteuclid.org/euclid.aoas/1380804810


Supplemental materials

  • Supplementary Appendix (doi:10.1214/13-AOAS634SUPP): proof of Theorem 1, a simulated example referred to in Section 2.1, the steps for constructing a concave ROC curve based on the pseudoempirical likelihood estimators, and additional simulation results.