Electronic Journal of Statistics

Variable selection for partially linear models via learning gradients

Lei Yang, Yixin Fang, Junhui Wang, and Yongzhao Shao

Full-text: Open access


Partially linear models (PLMs) are important generalizations of linear models and are very useful for analyzing high-dimensional data. Compared with linear models, PLMs retain the desirable flexibility of non-parametric regression models because they have both linear and non-linear components. Variable selection for PLMs plays an important role in practical applications and has been extensively studied with respect to the linear component. For the non-linear component, however, variable selection has been well developed only for PLMs with extra structural assumptions, such as additive PLMs and generalized additive PLMs. There is currently an unmet need for variable selection methods applicable to general PLMs without structural assumptions on the non-linear component. In this paper, we propose a new variable selection method based on learning gradients for general PLMs without any assumption on the structure of the non-linear component. The proposed method uses the reproducing-kernel-Hilbert-space tool to learn the gradients and the group-lasso penalty to select variables. In addition, a block-coordinate descent algorithm is suggested, and theoretical properties are established, including selection consistency and estimation consistency. The performance of the proposed method is further evaluated via simulation studies and illustrated using real data.
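
As a rough illustration of the two computational ingredients named in the abstract (a group-lasso penalty and block-coordinate descent), the following Python sketch solves a plain group-lasso least-squares problem. It is not the authors' algorithm: the paper applies the penalty to gradient functions learned in a reproducing kernel Hilbert space, whereas this sketch assumes an ordinary linear design matrix. The function names, the step-size rule, and the scaling of the objective are illustrative assumptions.

```python
import numpy as np


def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2: shrink the whole block toward zero."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        # The entire block is set to zero, i.e. the group is deselected.
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v


def block_coordinate_descent(X, y, groups, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * sum_g ||b_g||_2.

    `groups` is a list of index lists; each block gets one proximal
    gradient update per sweep (an assumed, simple update rule).
    """
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for g in groups:
            Xg = X[:, g]
            # Gradient of the smooth part with respect to block g.
            grad = -Xg.T @ (y - X @ b) / n
            # Block Lipschitz constant: squared spectral norm of Xg over n.
            L = np.linalg.norm(Xg, 2) ** 2 / n
            b[g] = group_soft_threshold(b[g] - grad / L, lam / L)
    return b
```

A block of coefficients that is shrunk exactly to zero corresponds to a deselected group, which is why a group penalty can serve as a variable selection device.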

Article information

Electron. J. Statist. Volume 11, Number 2 (2017), 2907-2930.

Received: August 2016
First available in Project Euclid: 8 August 2017

Keywords: PLM; group Lasso; gradient learning; variable selection; high-dimensional data; reproducing kernel Hilbert space

Creative Commons Attribution 4.0 International License.


Yang, Lei; Fang, Yixin; Wang, Junhui; Shao, Yongzhao. Variable selection for partially linear models via learning gradients. Electron. J. Statist. 11 (2017), no. 2, 2907-2930. doi:10.1214/17-EJS1300. https://projecteuclid.org/euclid.ejs/1502157627


