Electronic Journal of Statistics

Model selection in semiparametric expectile regression

Elmar Spiegel, Fabian Sobotka, and Thomas Kneib

Full-text: Open access

Abstract

Ordinary least squares regression focuses on the expected response and strongly depends on the assumption of normally distributed errors for inference. An approach to overcome these restrictions is expectile regression, where no distributional assumption is made but rather the whole distribution of the response is described in terms of covariates. This is similar to quantile regression, but expectiles provide a convenient generalization of the arithmetic mean while quantiles are a generalization of the median. To analyze more complex data structures where purely linear predictors are no longer sufficient, semiparametric regression methods have been introduced for both ordinary least squares and expectile regression. However, with increasing complexity of the data and the regression structure, the selection of the true covariates and their effects becomes even more important than in standard regression models. Therefore, we introduce several approaches depending on selection criteria and shrinkage methods to perform model selection in semiparametric expectile regression. Moreover, we propose a joint approach for model selection based on several asymmetries simultaneously to deal with the special feature that expectile regression estimates the complete distribution of the response. Furthermore, to distinguish between linear and smooth predictors, we split nonlinear effects into the purely linear trend and the deviation from this trend. All selection methods are compared with the benchmark of functional gradient descent boosting in a simulation study and applied to determine the relevant covariates when studying childhood malnutrition in Peru.
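
To make the idea of expectiles as a generalization of the mean concrete, the following is a minimal sketch of least asymmetrically weighted squares for a purely linear predictor, fitted by iteratively reweighted least squares. It is not the authors' implementation: the function name `expectile_regression`, its parameters, and the toy data are hypothetical, and the sketch omits the penalized smooth effects and the selection methods discussed in the article.

```python
import numpy as np


def expectile_regression(X, y, tau=0.5, max_iter=100, tol=1e-10):
    """Linear expectile regression via iteratively reweighted least squares.

    Hypothetical helper for illustration only. Minimizes
    sum_i w_i(tau) * (y_i - x_i' beta)^2, where w_i = tau if the residual
    is positive and 1 - tau otherwise.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the ordinary least squares fit (the tau = 0.5 expectile).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        resid = y - X @ beta
        # Asymmetric weights: tau above the current fit, 1 - tau otherwise.
        w = np.where(resid > 0, tau, 1.0 - tau)
        XtW = X.T * w  # scales each observation's column by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Toy data (made up): an intercept-only model returns the sample expectile.
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
X = np.ones((y.size, 1))
mean_fit = expectile_regression(X, y, tau=0.5)[0]
upper_fit = expectile_regression(X, y, tau=0.9)[0]
```

With `tau=0.5` all weights are equal and the fit reduces to ordinary least squares, so the intercept-only model recovers the arithmetic mean. Larger values of `tau` weight positive residuals more heavily and move the fit toward the upper tail of the response distribution.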

Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 3008-3038.

Dates
Received: April 2016
First available in Project Euclid: 11 August 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1502416822

Digital Object Identifier
doi:10.1214/17-EJS1307

Zentralblatt MATH identifier
06790052

Keywords
Expectiles; semiparametric regression; model selection; least asymmetrically weighted squares; boosting; non-negative garrote

Rights
Creative Commons Attribution 4.0 International License.

Citation

Spiegel, Elmar; Sobotka, Fabian; Kneib, Thomas. Model selection in semiparametric expectile regression. Electron. J. Statist. 11 (2017), no. 2, 3008–3038. doi:10.1214/17-EJS1307. https://projecteuclid.org/euclid.ejs/1502416822


