Electronic Journal of Statistics

Variable selection for partially linear models via learning gradients

Lei Yang, Yixin Fang, Junhui Wang, and Yongzhao Shao

Full-text: Open access


Partially linear models (PLMs) are important generalizations of linear models and are very useful for analyzing high-dimensional data. Compared to linear models, PLMs enjoy much of the flexibility of non-parametric regression models because they contain both a linear and a non-linear component. Variable selection for the linear component of PLMs plays an important role in practical applications and has been studied extensively. For the non-linear component, however, variable selection has been well developed only for PLMs with extra structural assumptions, such as additive PLMs and generalized additive PLMs; there remains an unmet need for variable selection methods applicable to general PLMs with no structural assumptions on the non-linear component. In this paper, we propose a new variable selection method based on learning gradients for general PLMs without any assumption on the structure of the non-linear component. The proposed method learns the gradients in a reproducing kernel Hilbert space (RKHS) and selects variables with a group-lasso penalty. In addition, a block-coordinate descent algorithm is suggested, and theoretical properties are established, including selection consistency and estimation consistency. The performance of the proposed method is further evaluated via simulation studies and illustrated using real data.
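To make the recipe in the abstract concrete, the sketch below shows group-lasso-penalized gradient learning solved by block-coordinate descent. It is a minimal illustration, not the authors' estimator: it parametrizes the gradient pointwise as an n-by-p matrix G instead of through an RKHS expansion, uses plain Gaussian pair weights, updates each coordinate block with a single proximal gradient step, and is shown on a generic regression rather than the full PLM (where the linear component would first be handled separately). The function name and all tuning choices are illustrative.

    import numpy as np

    def gradient_learning_group_lasso(X, y, lam=0.1, bandwidth=1.0, n_iters=200):
        """Estimate pointwise gradients G (n x p), where G[i] approximates the
        gradient of the regression function at X[i], by minimizing a locally
        weighted first-order Taylor loss plus a group-lasso penalty on the
        columns of G (one group per predictor)."""
        n, p = X.shape
        D = X[None, :, :] - X[:, None, :]                # D[i, j] = x_j - x_i
        W = np.exp(-np.sum(D**2, axis=2) / (2 * bandwidth**2))  # pair weights
        Ydiff = y[None, :] - y[:, None]                  # Ydiff[i, j] = y_j - y_i
        G = np.zeros((n, p))
        for _ in range(n_iters):
            for k in range(p):                           # block = k-th column of G
                R = Ydiff - np.einsum('ik,ijk->ij', G, D)    # full Taylor residuals
                Rk = R + G[:, [k]] * D[:, :, k]          # residuals with block k removed
                a = np.sum(W * D[:, :, k]**2, axis=1) / n**2 # per-row curvature
                b = np.sum(W * Rk * D[:, :, k], axis=1) / n**2
                L = max(2.0 * a.max(), 1e-12)            # Lipschitz bound for this block
                v = G[:, k] - 2.0 * (a * G[:, k] - b) / L    # gradient step on smooth part
                nv = np.linalg.norm(v)
                # group soft-thresholding: shrink the whole column toward zero
                G[:, k] = (max(0.0, 1.0 - lam / (L * nv)) * v) if nv > 0 else 0.0
        return G

Columns of G driven exactly to zero flag predictors as irrelevant; for example, selected = np.where(np.linalg.norm(G, axis=0) > 1e-8)[0] recovers the retained variables. The group penalty is what zeroes a predictor's entire column at once rather than individual entries, which is the selection mechanism the abstract describes.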

Article information

Electron. J. Statist. Volume 11, Number 2 (2017), 2907-2930.

Received: August 2016
First available in Project Euclid: 8 August 2017

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1502157627

Digital Object Identifier: doi:10.1214/17-EJS1300

Keywords: PLM, group Lasso, gradient learning, variable selection, high-dimensional data, reproducing kernel Hilbert space

This article is distributed under the Creative Commons Attribution 4.0 International License.


Yang, Lei; Fang, Yixin; Wang, Junhui; Shao, Yongzhao. Variable selection for partially linear models via learning gradients. Electron. J. Statist. 11 (2017), no. 2, 2907--2930. doi:10.1214/17-EJS1300. https://projecteuclid.org/euclid.ejs/1502157627
