Electronic Journal of Statistics

Model selection in semiparametric expectile regression

Elmar Spiegel, Fabian Sobotka, and Thomas Kneib

Full-text: Open access


Ordinary least squares regression focuses on the expected response and strongly depends on the assumption of normally distributed errors for inferences. An approach to overcome these restrictions is expectile regression, where no distributional assumption is made but rather the whole distribution of the response is described in terms of covariates. This is similar to quantile regression, but expectiles provide a convenient generalization of the arithmetic mean while quantiles are a generalization of the median. To analyze more complex data structures where purely linear predictors are no longer sufficient, semiparametric regression methods have been introduced for both ordinary least squares and expectile regression. However, with increasing complexity of the data and the regression structure, the selection of the true covariates and their effects becomes even more important than in standard regression models. Therefore we introduce several approaches depending on selection criteria and shrinkage methods to perform model selection in semiparametric expectile regression. Moreover, we propose a joint approach for model selection based on several asymmetries simultaneously to deal with the special feature that expectile regression estimates the complete distribution of the response. Furthermore, to distinguish between linear and smooth predictors, we split nonlinear effects into the purely linear trend and the deviation from this trend. All selection methods are compared with the benchmark of functional gradient descent boosting in a simulation study and applied to determine the relevant covariates when studying childhood malnutrition in Peru.
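The sense in which expectiles generalize the arithmetic mean can be illustrated with a small sketch that is not taken from the article. It assumes the standard definition of the sample τ-expectile as the minimizer of an asymmetrically weighted sum of squared deviations, and computes it by iteratively reweighted averaging, the intercept-only case of least asymmetrically weighted squares. The function name `expectile` and its parameters are hypothetical and chosen for illustration only.

```python
def expectile(values, tau, tol=1e-12, max_iter=200):
    """Sample tau-expectile of `values` (illustrative sketch, not the authors' code).

    Minimizes sum_i w_i * (y_i - m)**2 over m, where w_i = tau if y_i > m
    and w_i = 1 - tau otherwise, by iteratively reweighted averaging.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(values) == 0:
        raise ValueError("values must not be empty")

    # Start from the arithmetic mean, which is the 0.5-expectile.
    m = sum(values) / len(values)
    for _ in range(max_iter):
        # Asymmetric weights: tau above the current estimate, 1 - tau at or below it.
        weights = [tau if y > m else 1.0 - tau for y in values]
        # The weighted mean solves the weighted least squares problem for fixed weights.
        new_m = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    for tau in (0.1, 0.5, 0.9):
        print(tau, expectile(data, tau))
```

With τ = 0.5 all weights are equal and the result is the ordinary mean; moving τ toward 0 or 1 shifts the estimate toward the lower or upper tail. In a regression setting the same reweighting is applied to a (penalized) least squares fit rather than to a single location parameter.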

Article information

Electron. J. Statist., Volume 11, Number 2 (2017), 3008-3038.

Received: April 2016
First available in Project Euclid: 11 August 2017


Keywords: Expectiles; semiparametric regression; model selection; least asymmetrically weighted squares; boosting; non-negative garrote

Creative Commons Attribution 4.0 International License.


Spiegel, Elmar; Sobotka, Fabian; Kneib, Thomas. Model selection in semiparametric expectile regression. Electron. J. Statist. 11 (2017), no. 2, 3008–3038. doi:10.1214/17-EJS1307. https://projecteuclid.org/euclid.ejs/1502416822


