Electronic Journal of Statistics

The function-on-scalar LASSO with applications to longitudinal GWAS

Rina Foygel Barber, Matthew Reimherr, and Thomas Schill

Full-text: Open access

Abstract

We present a new methodology for simultaneous variable selection and parameter estimation in function-on-scalar regression with an ultra-high dimensional predictor vector. We extend the LASSO to functional data in both the dense functional setting and the sparse functional setting. We provide theoretical guarantees which allow for an exponential number of predictor variables. Simulations are carried out which illustrate the methodology and compare the sparse/functional methods. Using the Framingham Heart Study, we demonstrate how our tools can be used in genome-wide association studies, finding a number of genetic mutations which affect blood pressure and are therefore important for cardiovascular health.

Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 1351-1389.

Dates
Received: April 2016
First available in Project Euclid: 19 April 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1492567399

Digital Object Identifier
doi:10.1214/17-EJS1260

Mathematical Reviews number (MathSciNet)
MR3635916

Zentralblatt MATH identifier
1362.62084

Subjects
Primary: 62G08 (Nonparametric regression), 62G20 (Asymptotic properties)
Secondary: 62J07 (Ridge regression; shrinkage estimators)

Keywords
Functional data analysis; high-dimensional regression; variable selection; functional regression

Rights
Creative Commons Attribution 4.0 International License.

Citation

Barber, Rina Foygel; Reimherr, Matthew; Schill, Thomas. The function-on-scalar LASSO with applications to longitudinal GWAS. Electron. J. Statist. 11 (2017), no. 1, 1351–1389. doi:10.1214/17-EJS1260. https://projecteuclid.org/euclid.ejs/1492567399


