The Annals of Statistics

High-dimensional additive modeling

Lukas Meier, Sara van de Geer, and Peter Bühlmann

Source: Ann. Statist. Volume 37, Number 6B (2009), 3779–3821.

Abstract

We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial both for the mathematical theory and for performance on finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.
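The abstract names a sparsity-smoothness penalty and an efficient algorithm but gives no formulas. The Python code below is a minimal sketch that assumes a penalty of the form J(f_j) = λ1 · sqrt(||f_j||_n^2 + λ2 · ∫ f_j''(x)^2 dx) for each additive component f_j. This form is an assumption here and is not stated on this page. Once each f_j is expanded in a basis, f_j = H_j β_j, the penalty equals λ1 · sqrt(β_j' M_j β_j) with M_j = H_j'H_j/n + λ2 Ω_j. A Cholesky reparametrization γ_j = R_j β_j then turns the problem into a group lasso. The cubic polynomial basis, all function names, and the proximal-gradient (ISTA) solver are hypothetical choices made for illustration. ISTA is swapped in for simplicity and is not necessarily the algorithm the authors propose. The sketch also covers only squared-error loss, not a general likelihood.

```python
import numpy as np


def sparsity_smoothness_penalty(beta, H, Omega, lam1, lam2):
    """Assumed penalty: lam1 * sqrt(||H beta||_n^2 + lam2 * beta' Omega beta)."""
    n = H.shape[0]
    empirical_sq_norm = beta @ (H.T @ H / n) @ beta  # ||f_j||_n^2
    roughness = beta @ Omega @ beta  # stands in for the integral of f_j''(x)^2
    return lam1 * np.sqrt(empirical_sq_norm + lam2 * roughness)


def cubic_basis(x):
    """Hypothetical basis: centered (x, x^2, x^3) on [0, 1] and its exact Omega."""
    H = np.column_stack([x, x**2, x**3])
    H = H - H.mean(axis=0)  # centering subtracts constants, so f'' is unchanged
    # Omega[k, l] = integral over [0, 1] of h_k''(t) h_l''(t), with h'' = (0, 2, 6t)
    Omega = np.array([[0.0, 0.0, 0.0], [0.0, 4.0, 6.0], [0.0, 6.0, 12.0]])
    return H, Omega


def build_design(X, lam2):
    """Reparametrize each block so that its penalty becomes lam1 * ||gamma_j||_2."""
    n, p = X.shape
    Z_blocks, R_blocks = [], []
    for j in range(p):
        H, Omega = cubic_basis(X[:, j])
        M = H.T @ H / n + lam2 * Omega  # beta' M beta = ||f_j||_n^2 + lam2 * roughness
        R = np.linalg.cholesky(M).T  # M = R' R with R upper triangular
        Z_blocks.append(np.linalg.solve(R.T, H.T).T)  # Z_j = H_j R^{-1}
        R_blocks.append(R)
    return np.hstack(Z_blocks), R_blocks


def objective(Z, gamma, y, lam1, p, k):
    """(1/(2n)) ||y - Z gamma||^2 + lam1 * sum_j ||gamma_j||_2."""
    resid = y - Z @ gamma
    groups = gamma.reshape(p, k)
    return resid @ resid / (2 * len(y)) + lam1 * np.linalg.norm(groups, axis=1).sum()


def fit_group_ista(Z, y, lam1, p, k, n_iter=500):
    """Proximal gradient (ISTA) with group soft-thresholding; not the paper's solver."""
    n = len(y)
    step = n / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    gamma = np.zeros(p * k)
    for _ in range(n_iter):
        v = gamma + step * Z.T @ (y - Z @ gamma) / n  # gradient step
        groups = v.reshape(p, k)
        norms = np.linalg.norm(groups, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam1 / np.maximum(norms, 1e-12))
        gamma = (shrink * groups).ravel()  # whole blocks are set exactly to zero
    return gamma


# Toy data: only the first two of p covariates enter the true additive model.
rng = np.random.default_rng(0)
n, p, k = 200, 10, 3
X = rng.uniform(0.0, 1.0, size=(n, p))
y = np.sin(2 * np.pi * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2 + 0.3 * rng.standard_normal(n)
y = y - y.mean()  # centered response; the intercept is handled separately

Z, R_blocks = build_design(X, lam2=0.01)
# Smallest lam1 at which the all-zero fit satisfies the optimality conditions.
lam_max = np.max(np.linalg.norm((Z.T @ y / n).reshape(p, k), axis=1))
gamma = fit_group_ista(Z, y, lam1=0.1 * lam_max, p=p, k=k)
betas = [np.linalg.solve(R_blocks[j], gamma.reshape(p, k)[j]) for j in range(p)]
print("selected components:", [j for j in range(p) if np.linalg.norm(betas[j]) > 0])
```

Because λ1 multiplies the Euclidean norm of an entire coefficient block, whole components are removed from the model at once, which gives sparsity. λ2 controls how strongly curvature is penalized, which gives smoothness. Any λ1 at or above `lam_max` returns the empty model.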

Primary Subjects: 62G08, 62F12
Secondary Subjects: 62J07
Keywords: Group lasso; model selection; nonparametric regression; oracle inequality; penalized likelihood; sparsity

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1256303527
Digital Object Identifier: doi:10.1214/09-AOS692

2009 © Institute of Mathematical Statistics