The Annals of Statistics

High-dimensional additive modeling

Lukas Meier, Sara van de Geer, and Peter Bühlmann


Abstract

We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial both for the mathematical theory and for finite-sample performance. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results that yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
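
For concreteness, a sparsity-smoothness penalty of the kind described above, applied to an additive fit f(x) = \sum_{j=1}^p f_j(x_j), can be sketched as

\[
  \mathrm{pen}(f) \;=\; \lambda_1 \sum_{j=1}^{p} \sqrt{\,\|f_j\|_n^2 \;+\; \lambda_2\, I^2(f_j)\,},
  \qquad
  \|f_j\|_n^2 = \frac{1}{n}\sum_{i=1}^{n} f_j^2(x_{ij}),
  \quad
  I^2(f_j) = \int \bigl(f_j''(x)\bigr)^2\,dx,
\]

where \lambda_1 controls sparsity (entire components f_j can be set exactly to zero, as in the group lasso) and \lambda_2 controls the smoothness of the selected components. This display is only a sketch of the penalty's general shape; consult the paper for the exact scaling and tuning-parameter placement.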

Article information

Source
Ann. Statist. Volume 37, Number 6B (2009), 3779-3821.

Dates
First available in Project Euclid: 23 October 2009

Permanent link to this document
http://projecteuclid.org/euclid.aos/1256303527

Digital Object Identifier
doi:10.1214/09-AOS692

Mathematical Reviews number (MathSciNet)
MR2572443

Zentralblatt MATH identifier
05644256

Subjects
Primary: 62G08: Nonparametric regression; 62F12: Asymptotic properties of estimators
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords
Group lasso; model selection; nonparametric regression; oracle inequality; penalized likelihood; sparsity

Citation

Meier, Lukas; van de Geer, Sara; Bühlmann, Peter. High-dimensional additive modeling. Ann. Statist. 37 (2009), no. 6B, 3779--3821. doi:10.1214/09-AOS692. http://projecteuclid.org/euclid.aos/1256303527.


