Electronic Journal of Statistics

Dimension reduction and estimation in the secondary analysis of case-control studies

Liang Liang, Raymond Carroll, and Yanyuan Ma

Full-text: Open access

Abstract

Studying the relationship between covariates based on retrospective data is the main purpose of secondary analysis, an area of increasing interest. We examine the secondary analysis problem when multiple covariates are available but only a regression mean model is specified. Although the regression mean function is modeled fully parametrically, the case-control nature of the data requires special treatment, and semiparametric efficient estimation gives rise to several nonparametric estimation problems involving multivariate covariates. We devise a dimension reduction approach that is compatible with the specified primary and secondary models in the original problem setting, and we use reweighting to adjust for the case-control sampling, even when the disease rate in the source population is unknown. The resulting estimator is both locally efficient and robust to misspecification of the regression error distribution, which can be heteroscedastic as well as non-Gaussian. We demonstrate the advantages of our method over several existing methods, both analytically and numerically.
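The reweighting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator, which is semiparametric, locally efficient, and also handles an unknown disease rate; it is only the basic inverse-probability-weighting baseline, under the assumptions that the population disease rate π is known, there is a single covariate X, and the secondary mean model is linear, E(Y | X) = a + bX. Each case is weighted by π/(n1/n) and each control by (1 - π)/(n0/n), so the weighted case-control sample mimics the source population. The function names and the toy simulation are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def case_control_weights(d: Sequence[int], prevalence: float) -> List[float]:
    """Weights mapping a case-control sample back to the source population.

    Assumption: the population disease rate `prevalence` is known.
    Cases get prevalence / (n1 / n), controls get (1 - prevalence) / (n0 / n),
    so the weighted case fraction equals `prevalence` and the weights sum to n.
    """
    n = len(d)
    n1 = sum(d)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one control")
    w_case = prevalence * n / n1
    w_control = (1.0 - prevalence) * n / n0
    return [w_case if di == 1 else w_control for di in d]


def weighted_linear_fit(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least squares for the (assumed linear) mean model E(Y | X) = a + b X."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def simulate(seed: int = 0, n_pop: int = 100_000, n_each: int = 1000):
    """Hypothetical toy example: Y = 1 + 2X + e, and disease risk rises with Y.

    Draws n_each cases and n_each controls from a simulated population and
    returns ((a, b) unweighted, (a, b) weighted) for the secondary mean model.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(n_pop):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-5.0 + 1.5 * y)))  # P(D = 1 | Y), logistic
        if rng.random() < p:
            cases.append((x, y))
        else:
            controls.append((x, y))
    prevalence = len(cases) / n_pop  # treated as known in this sketch
    sample = [(x, y, 1) for x, y in rng.sample(cases, n_each)]
    sample += [(x, y, 0) for x, y in rng.sample(controls, n_each)]
    xs, ys, ds = zip(*sample)
    naive = weighted_linear_fit(xs, ys, [1.0] * len(xs))
    weighted = weighted_linear_fit(xs, ys, case_control_weights(ds, prevalence))
    return naive, weighted
```

Calling `simulate()` fits the mean model to the same case-control sample with and without the weights. Because disease risk depends on Y, the unweighted fit is generally distorted by the oversampling of cases, while the weighted fit should land near the true slope of 2 up to sampling error.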

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 1782-1821.

Dates
Received: February 2017
First available in Project Euclid: 12 June 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1528769120

Digital Object Identifier
doi:10.1214/18-EJS1446

Mathematical Reviews number (MathSciNet)
MR3813597

Zentralblatt MATH identifier
06886385

Subjects
Primary: 62G05: Estimation

Keywords
Biased samples; case-control study; dimension reduction; heteroscedastic error; secondary analysis; semiparametric estimation

Rights
Creative Commons Attribution 4.0 International License.

Citation

Liang, Liang; Carroll, Raymond; Ma, Yanyuan. Dimension reduction and estimation in the secondary analysis of case-control studies. Electron. J. Statist. 12 (2018), no. 1, 1782-1821. doi:10.1214/18-EJS1446. https://projecteuclid.org/euclid.ejs/1528769120

