Electronic Journal of Statistics

Estimation of a non-parametric variable importance measure of a continuous exposure

Antoine Chambaz, Pierre Neuvial, and Mark J. van der Laan

We define a new measure of variable importance of an exposure on a continuous outcome, accounting for potential confounders. The exposure features a reference level $x_{0}$ with positive mass and a continuum of other levels. To estimate this measure, we fully develop the semi-parametric estimation methodology called targeted minimum loss estimation (TMLE) [23,22]. We cover the whole spectrum of its study: theory (convergence of the iterative procedure at the core of the TMLE methodology; consistency and asymptotic normality of the estimator), practical implementation, a simulation study, and an application to the genomic example that originally motivated this article. In that example, the exposure $X$ and the response $Y$ are, respectively, the DNA copy number and the expression level of a given gene in a cancer cell. The reference level is $x_{0}=2$, that is, the expected DNA copy number in a normal cell. The confounder is a measure of the methylation of the gene. The fact that there is no clear biological indication that $X$ and $Y$ can be interpreted as an exposure and a response, respectively, is not problematic.
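To make the observed-data structure concrete, here is a toy Python simulation of triples $(W, X, Y)$ in which the exposure $X$ equals the reference level $x_{0}=2$ with positive probability and otherwise takes values on a continuum, with the confounder $W$ influencing both $X$ and $Y$. Every distributional choice below (uniform confounder, Gaussian exposure levels, linear outcome, the function name `simulate`) is a hypothetical assumption made for illustration only; it is not the data-generating mechanism of the article's simulation study, and it does not implement the TMLE.

```python
import random


def simulate(n, x0=2.0, seed=0):
    """Draw n toy observations (W, X, Y) whose exposure X has an atom at x0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Confounder W in [0, 1), e.g. a methylation score (hypothetical).
        w = rng.random()
        # X equals the reference level x0 with probability 0.6 - 0.4 * W > 0,
        # so x0 has positive mass and that mass depends on W (confounding).
        if rng.random() < 0.6 - 0.4 * w:
            x = x0
        else:
            # Otherwise X is drawn from a continuum of levels (Gaussian, assumed).
            x = rng.gauss(x0 + w, 1.0)
        # Outcome depends on both the exposure and the confounder (assumed model).
        y = 0.8 * (x - x0) + 0.5 * w + rng.gauss(0.0, 0.3)
        data.append((w, x, y))
    return data


sample = simulate(1000)
share_at_ref = sum(1 for _, x, _ in sample if x == 2.0) / len(sample)
print(f"empirical P(X = x0): {share_at_ref:.2f}")  # near E[0.6 - 0.4 W] = 0.4
```

Because the reference level carries positive mass, a sample contains many observations exactly at $x_{0}$ alongside a spread of other exposure values, which is the mixed setting the abstract describes.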

Article information

Electron. J. Statist., Volume 6 (2012), 1059-1099.

First available in Project Euclid: 22 June 2012

Mathematics Subject Classification: Primary 62G05 (Estimation); 62G20 (Asymptotic properties); 62G35 (Robustness); 62P10 (Applications to biology and medical sciences).

Keywords: variable importance measure; non-parametric estimation; targeted minimum loss estimation; robustness; asymptotics.


Chambaz, Antoine; Neuvial, Pierre; van der Laan, Mark J. Estimation of a non-parametric variable importance measure of a continuous exposure. Electron. J. Statist. 6 (2012), 1059–1099. doi:10.1214/12-EJS703. https://projecteuclid.org/euclid.ejs/1340369355

  • [1] J. Andrews, W. Kennette, J. Pilon, A. Hodgson, A. B. Tuck, A. F. Chambers, and D. I. Rodenhiser. Multi-platform whole-genome microarray analyses refine the epigenetic signature of breast cancer metastasis with gene expression and copy number. PLoS ONE, 5(1):e8665, Jan 2010.
  • [2] F. S. Collins and A. D. Barker. Mapping the cancer genome. Scientific American, 296(3):50–57, Mar 2007.
  • [3] E. Dimitriadou, K. Hornik, F. Leisch, D. Meyer, and A. Weingessel. Misc Functions of the Department of Statistics (e1071), TU Wien, 2011. URL http://cran.r-project.org/web/packages/e1071/index.html. R package version 1.6.
  • [4] T. Hastie. Generalized additive models, 2011. URL http://cran.r-project.org/web/packages/gam/index.html. R package version 1.04.1.
  • [5] P. A. Jones and S. B. Baylin. The epigenomics of cancer. Cell, 128(4):683–692, Feb 2007.
  • [6] C. Kooperberg. Polynomial spline routines, 2010. URL http://cran.r-project.org/web/packages/polspline/index.html. R package version 1.1.5.
  • [7] C. L. Lawson and R. J. Hanson. Solving least squares problems, volume 15. Society for Industrial and Applied Mathematics, 1995.
  • [8] L. M. Le Cam. Théorie asymptotique de la décision statistique. Séminaire de Mathématiques Supérieures, No. 33 (Été, 1968). Les Presses de l’Université de Montréal, Montreal, Que., 1969.
  • [9] A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2(3):18–22, 2002. URL http://CRAN.R-project.org/doc/Rnews/.
  • [10] R. Louhimo and S. Hautaniemi. CNAmet: an R package for integrating copy number, methylation and expression data. Bioinformatics, 27(6):887, 2011.
  • [11] J. R. Pollack, T. Sørlie, C. M. Perou, C. A. Rees, S. S. Jeffrey, P. E. Lonning, R. Tibshirani, D. Botstein, A.-L. Børresen-Dale, and P. O. Brown. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A, 99(20):12963–12968, Oct 2002.
  • [12] E. Polley and M. J. van der Laan. SuperLearner, 2011. URL http://CRAN.R-project.org/package=SuperLearner. R package version 2.0-4.
  • [13] R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2010. URL http://www.R-project.org. ISBN 3-900051-07-0.
  • [14] J. M. Robins and A. Rotnitzky. Comment on "Inference for semiparametric models: some questions and an answer" by P. J. Bickel and J. Kwon. Statistica Sinica, 11:920–935, 2001.
  • [15] J. M. Robins, S. D. Mark, and W. K. Newey. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics, 48(2):479–495, 1992.
  • [16] T. P. Speed. From expression profiling to putative master regulators. UC Berkeley Statistics and Genomics Seminar, February 5, 2009.
  • [17] Z. Sun, Y. W. Asmann, K. R. Kalari, B. Bot, J. E. Eckel-Passow, T. R. Baker, J. M. Carr, I. Khrebtukova, S. Luo, L. Zhang, et al. Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing. PLoS ONE, 6(2):e17490, 2011.
  • [18] The Cancer Genome Atlas (TCGA) Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455:1061–1068, 2008.
  • [19] The Cancer Genome Atlas (TCGA) Research Network. Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609–615, 2011.
  • [20] C. Tuglus and M. J. van der Laan. Targeted methods for biomarker discovery. In Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Verlag, 2011.
  • [21] M. J. van der Laan. Statistical inference for variable importance. Int. J. Biostat., 2:Article 2, 2006.
  • [22] M. J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Verlag, 2011.
  • [23] M. J. van der Laan and D. Rubin. Targeted maximum likelihood learning. Int. J. Biostat., 2:Article 11, 2006.
  • [24] M. J. van der Laan, E. C. Polley, and A. E. Hubbard. Super learner. Stat. Appl. Genet. Mol. Biol., 6:Article 25, 2007.
  • [25] A. W. van der Vaart. Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 1998.
  • [26] W. N. van Wieringen and M. A. van de Wiel. Nonparametric testing for DNA copy number induced differential mRNA gene expression. Biometrics, 5(1):19–29, Mar 2008.
  • [27] X. V. Wang, R. G. W. Verhaak, E. Purdom, P. T. Spellman, and T. P. Speed. Unifying gene expression measures from multiple platforms using factor analysis. PLoS ONE, 6(3):e17691, 2011.
  • [28] Z. Yu and M. J. van der Laan. Measuring treatment effects using semiparametric models. Technical report, Division of Biostatistics, University of California, Berkeley, 2003.