Open Access
March 2019
Capturing heterogeneity of covariate effects in hidden subpopulations in the presence of censoring and large number of covariates
Farhad Shokoohi, Abbas Khalili, Masoud Asgharian, Shili Lin
Ann. Appl. Stat. 13(1): 444-465 (March 2019). DOI: 10.1214/18-AOAS1198

Abstract

The advent of modern technology has led to a surge of high-dimensional data in biology and health sciences such as genomics, epigenomics and medicine. The high-grade serous ovarian cancer (HGS-OvCa) data reported by The Cancer Genome Atlas (TCGA) Research Network is one example. The TCGA and other research groups have analyzed several aspects of these data. Here we study the relationship between Disease Free Time (DFT) after surgery among ovarian cancer patients and their DNA methylation profiles of genomic features. Such studies pose additional challenges beyond the typical big data problem due to population substructure and censoring. Despite the availability of several methods for analyzing time-to-event data with a large number of covariates but a small sample size, there is no method available to date that accommodates the additional feature of heterogeneity. To this end, we propose a regularized framework based on the finite mixture of accelerated failure time model to capture intangible heterogeneity due to population substructure and to account for censoring simultaneously. We study the properties of the proposed framework both theoretically and numerically. Our data analysis indicates the existence of heterogeneity in the HGS-OvCa data, with one component of the mixture capturing a more aggressive form of the disease, and the second component capturing a less aggressive form. In particular, the second component portrays a significant positive relationship between methylation and DFT for BRCA1. By further unearthing the negative relationship between expression and methylation for this gene, one may provide a biologically reasonable explanation that sheds light on the relationship between DNA methylation, gene expression and mutation.
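To make the phrase "regularized framework based on the finite mixture of accelerated failure time model" concrete, the following is a minimal sketch of a penalized, right-censored mixture log-likelihood. It is an illustration under stated assumptions, not the authors' implementation: the log-normal error distribution, the plain L1 penalty, the absence of an intercept, and every name in the code (`penalized_loglik`, `log_t`, `delta`, `lam`, and so on) are hypothetical choices made here. An observed event contributes the mixture density of its log-time, while a censored observation contributes the mixture survival probability.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logsf(z):
    """Log survival function, log(1 - Phi(z)), of the standard normal.

    Not numerically robust for very large z (erfc underflows to 0).
    """
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def penalized_loglik(log_t, delta, X, pis, betas, sigmas, lam):
    """Penalized log-likelihood of a K-component mixture of AFT models.

    Assumptions (illustrative only): log-normal errors, no intercept,
    and an L1 penalty lam * sum(|beta|) over all components.

    log_t  : list of log follow-up times
    delta  : list of event indicators (1 = event observed, 0 = right-censored)
    X      : list of covariate vectors, one per subject
    pis    : mixing proportions, one per component (sum to 1)
    betas  : list of coefficient vectors, one per component
    sigmas : list of scale parameters, one per component
    lam    : non-negative penalty weight
    """
    total = 0.0
    for y, d, x in zip(log_t, delta, X):
        terms = []
        for pi_k, beta_k, sigma_k in zip(pis, betas, sigmas):
            linear = sum(b * xj for b, xj in zip(beta_k, x))
            z = (y - linear) / sigma_k
            if d == 1:
                # Observed event: component density of the log-time.
                log_term = norm_logpdf(z) - math.log(sigma_k)
            else:
                # Right-censored: component survival probability.
                log_term = norm_logsf(z)
            terms.append(math.log(pi_k) + log_term)
        # Log-sum-exp over components for numerical stability.
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    penalty = lam * sum(abs(b) for beta_k in betas for b in beta_k)
    return total - penalty
```

In practice such an objective would be maximized over the mixing proportions, coefficients, and scales (for example with an EM-type algorithm), and the penalty weight would be chosen by a data-driven criterion; the sketch only evaluates the objective at given parameter values.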

Citation


Farhad Shokoohi, Abbas Khalili, Masoud Asgharian, Shili Lin. "Capturing heterogeneity of covariate effects in hidden subpopulations in the presence of censoring and large number of covariates." Ann. Appl. Stat. 13 (1) 444-465, March 2019. https://doi.org/10.1214/18-AOAS1198

Information

Received: 1 May 2017; Revised: 1 March 2018; Published: March 2019
First available in Project Euclid: 10 April 2019

zbMATH: 07057435
MathSciNet: MR3937436
Digital Object Identifier: 10.1214/18-AOAS1198

Keywords: DNA methylation, finite mixture of AFT model, ovarian cancer, penalized regression, right censoring

Rights: Copyright © 2019 Institute of Mathematical Statistics
