The Annals of Applied Statistics

Comparing healthcare utilization patterns via global differences in the endorsement of current procedural terminology codes

Xu Shi, Hristina Pashova, and Patrick J. Heagerty


The linkage of electronic medical records (EMR) across clinics, hospitals, and healthcare systems is opening new opportunities to evaluate factors associated with both individual treatment benefit and potential harm. For example, the FDA Sentinel initiative seeks to create a surveillance network with over 100 million patient lives (Behrman et al. [N. Engl. J. Med. 364 (2011) 498–499]), while PCORnet has created multiple networks that include linked electronic medical records from geographic regions such as entire cities or states, with the ultimate goal of facilitating comparative effectiveness research (Collins et al. [Journal of the American Medical Informatics Association 4 (2014) 576–577]). However, one key challenge to the use of electronically assembled cohorts is the potential for variation in both the choice of specific healthcare procedures and coding practices due to differences in patient populations and/or financial incentives within care delivery networks. To explore variation in patient care or procedure coding, we review and develop statistical methods that permit testing and estimation of subgroup differences in code assignments. We focus on Current Procedural Terminology (CPT) codes, which are used in a standardized fashion to capture patient treatment details and to record medical histories, but the methods we develop can be used for any structured EMR data. We specifically study testing procedures that remain valid for comparing both rare and common counts, as routinely encountered with medical procedure codes, and we transfer methods from studies of genetic association. Hierarchical structure, in terms of both thematically grouped medical codes and provider-level clustering, adds unique complexity to the analysis of EMR data. We detail penalized regression methods that unify estimation and inference to leverage the hierarchical structure and stabilize rate ratio estimates for rare procedures. We also expand inference methods to account for potential within-provider correlation of patient utilization. We illustrate the methods by comparing the endorsement of CPT codes for subjects enrolled in a back pain cohort study, where interest lies in differences across recruitment centers in the use of CPT codes (Jarvik [BMC Musculoskelet Disord. 13 (2012)]).
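
As a rough illustration of what a code-wise comparison of a rare count can look like, the sketch below applies the classical conditional test for two Poisson counts, in the spirit of Przyborowski and Wilenski (1940) from the reference list. The abstract does not state which tests the article adopts, so the choice of test, the function names, and the use of cohort size as the exposure are assumptions made purely for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_rate_test(x1, n1, x2, n2):
    """Two-sided exact p-value for equal Poisson rates in two cohorts.

    x1, x2: number of times a code was endorsed in each cohort (hypothetical inputs).
    n1, n2: exposure in each cohort, here taken to be the cohort size (an assumption).
    Under equal rates, x1 given x1 + x2 is Binomial(x1 + x2, n1 / (n1 + n2)).
    """
    total = x1 + x2
    if total == 0:
        return 1.0  # no events in either cohort, so no evidence of a difference
    p0 = n1 / (n1 + n2)
    observed = binom_pmf(x1, total, p0)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    pval = 0.0
    for k in range(total + 1):
        prob = binom_pmf(k, total, p0)
        if prob <= observed * (1 + 1e-9):  # small tolerance for floating-point ties
            pval += prob
    return min(1.0, pval)


def rate_ratio(x1, n1, x2, n2):
    """Crude rate ratio (cohort 1 over cohort 2); None when cohort 2 has no events."""
    if x2 == 0:
        return None
    return (x1 / n1) / (x2 / n2)


if __name__ == "__main__":
    # Hypothetical counts: a code endorsed 10 times among 100 patients at one
    # recruitment center and never among 100 patients at another.
    print(conditional_rate_test(10, 100, 0, 100))
    print(rate_ratio(6, 100, 3, 100))
```

Because the test conditions on the total count, it stays valid when a code is endorsed only a handful of times, which is the setting where normal approximations tend to break down. It does not account for provider-level clustering or for the grouping of related codes.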

Article information

Ann. Appl. Stat. Volume 11, Number 3 (2017), 1349-1374.

Received: November 2016
Revised: January 2017
First available in Project Euclid: 5 October 2017

Keywords: Electronic medical records; hierarchical structure; dynamic graphics


Shi, Xu; Pashova, Hristina; Heagerty, Patrick J. Comparing healthcare utilization patterns via global differences in the endorsement of current procedural terminology codes. Ann. Appl. Stat. 11 (2017), no. 3, 1349–1374. doi:10.1214/17-AOAS1028.



  • Basu, S. and Pan, W. (2011). Comparison of statistical tests for disease association with rare variants. Genetic Epidemiology 35 606–619.
  • Behrman, R. E., Benner, J. S., Brown, J. S., McClellan, M., Woodcock, J. and Platt, R. (2011). Developing the Sentinel System—A national resource for evidence development. N. Engl. J. Med. 364 498–499.
  • Bentley, P. N., Wilson, A. G., Derwin, M. E., Scodellaro, R. and Jackson, R. E. (2002). Reliability of assigning correct current procedural terminology—4 E/M codes. Ann. Emerg. Med. 40 269–274.
  • Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19 1212–1242.
  • Bull, S. B. (1998). Regression models for multiple outcomes in large epidemiologic studies. Stat. Med. 17 2179–2197.
  • Chapman, J. and Whittaker, J. (2008). Analysis of multiple SNPs in a candidate gene or region. Genetic Epidemiology 32 560–566.
  • Collins, F. S., Hudson, K. L., Briggs, J. P. and Lauer, M. S. (2014). PCORnet: Turning a dream into reality. Journal of the American Medical Informatics Association 4 576–577.
  • Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data, 2nd ed. Oxford Statistical Science Series 25. Oxford Univ. Press, Oxford.
  • Hoerl, A. and Kennard, R. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55–67.
  • Holt, J., Warsy, A. and Wright, P. (2010). Medical decision making: Guide to improved CPT coding. South. Med. J. 103 316–322.
  • Jarvik, J. G. (2012). Study protocol: The back pain outcomes using longitudinal data (BOLD) registry. BMC Musculoskelet Disord. 13 64.
  • King, M. S., Lipsky, M. S. and Sharp, L. (2002). Expert agreement in Current Procedural Terminology evaluation and management coding. Arch. Intern. Med. 162 316–320.
  • King, M. S., Sharp, L. and Lipsky, M. S. (2001). Accuracy of CPT evaluation and management coding by family physicians. J. Am. Board Fam. Pract. 14 184–192.
  • Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., NHLBI GO Exome Sequencing Project—ESP Lung Project Team, Christiani, D. C., Wurfel, M. M. and Lin, X. (2012). Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91 224–237.
  • Madsen, B. E. and Browning, S. R. (2009). A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 5 e1000384.
  • Miglioretti, D. L. and Heagerty, P. J. (2004). Marginal modeling of multilevel binary data with time-varying covariates. Biostatistics 5 381–398.
  • Miglioretti, D. L. and Heagerty, P. J. (2007). Marginal modeling of nonnested multilevel data using standard software. Am. J. Epidemiol. 165 453–463.
  • Morgenthaler, S. and Thilly, W. G. (2007). A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: A cohort allelic sums test (CAST). Mutat. Res. 615 28–56.
  • Morris, A. P. and Zeggini, E. (2010). An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genetic Epidemiology 34 188–193.
  • Pan, W. (2009). Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic Epidemiology 33 487–507.
  • Przyborowski, J. and Wilenski, H. (1940). Homogeneity of results in testing samples from Poisson series with an application to testing clover seed for dodder. Biometrika 31 313–323.
  • Qi, Y., Weeks, D. E., Tiwari, H. K., Yi, N., Zhang, K., Gao, G., Lin, W., Lou, X., Chen, W. and Liu, W. (2015). Rare-variant kernel machine test for longitudinal data from population and family samples. Hum. Hered. 80 126–138.
  • Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010). edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26 139–140.
  • Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41–55.
  • Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. J. Amer. Statist. Assoc. 79 515–524.
  • Shi, X., Pashova, H. and Heagerty, P. J. (2017). Supplement to “Comparing healthcare utilization patterns via global differences in the endorsement of current procedural terminology codes.” DOI:10.1214/17-AOAS1028SUPPA, DOI:10.1214/17-AOAS1028SUPPB, DOI:10.1214/17-AOAS1028SUPPC, DOI:10.1214/17-AOAS1028SUPPD.
  • Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89 82–93.

Supplemental materials

  • Supplement A: Comprehensive discussion on code-wise two-sample testing options. We provide a detailed review of testing strategies that are candidates for the evaluation of variation in code endorsement rates across cohorts.
  • Supplement B: Proof of Lemma 3.1. We provide a proof of Lemma 3.1.
  • Supplement C: Comprehensive review of simulation results comparing group-wise association tests. We provide a review of relevant results in previous research comparing group-wise association tests.
  • Supplement D: Comprehensive plots of type I error and power. We provide additional supporting plots that show the type I error and power of all tests with equal/unequal sample sizes using generated data of independent observations or under provider-level clustering.