The Annals of Statistics

Doubly penalized estimation in additive regression with high-dimensional data

Zhiqiang Tan and Cun-Hui Zhang


Additive regression provides an extension of linear regression by modeling the signal of a response as a sum of functions of covariates of relatively low complexity. We study penalized estimation in high-dimensional nonparametric additive regression where functional semi-norms are used to induce smoothness of component functions and the empirical $L_{2}$ norm is used to induce sparsity. The functional semi-norms can be of Sobolev or bounded variation types and are allowed to be different amongst individual component functions. We establish oracle inequalities for the predictive performance of such methods under three simple technical conditions: a sub-Gaussian condition on the noise, a compatibility condition on the design and the functional classes under consideration and an entropy condition on the functional classes. For random designs, the sample compatibility condition can be replaced by its population version under an additional condition to ensure suitable convergence of empirical norms. In homogeneous settings where the complexities of the component functions are of the same order, our results provide a spectrum of minimax convergence rates, from the so-called slow rate without requiring the compatibility condition to the fast rate under the hard sparsity or certain $L_{q}$ sparsity to allow many small components in the true regression function. These results significantly broaden and sharpen existing ones in the literature.
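
As a rough sketch of the kind of criterion described above (the notation here is assumed for illustration and is not taken from the abstract): given responses $Y_{i}$ and covariates $X_{ij}$ for $i=1,\ldots,n$ and $j=1,\ldots,p$, a doubly penalized estimator of an additive fit $g=\sum_{j=1}^{p} g_{j}$ might minimize

$$
\frac{1}{2n}\sum_{i=1}^{n}\Bigl(Y_{i}-\sum_{j=1}^{p} g_{j}(X_{ij})\Bigr)^{2}
+\sum_{j=1}^{p}\bigl(\rho_{j}\,\|g_{j}\|_{F,j}+\lambda_{j}\,\|g_{j}\|_{n}\bigr),
\qquad
\|g_{j}\|_{n}=\Bigl(\frac{1}{n}\sum_{i=1}^{n} g_{j}(X_{ij})^{2}\Bigr)^{1/2}.
$$

Here $\|g_{j}\|_{F,j}$ stands for the functional semi-norm attached to the $j$th component (for example, of Sobolev or bounded variation type), $\|g_{j}\|_{n}$ is the empirical $L_{2}$ norm, and $\rho_{j},\lambda_{j}\ge 0$ are hypothetical tuning parameters. The first penalty encourages smooth components, while the second can set entire components to zero, which is the sense in which the estimation is doubly penalized.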

Article information

Ann. Statist., Volume 47, Number 5 (2019), 2567-2600.

Received: April 2017
Revised: July 2018
First available in Project Euclid: 3 August 2019

Primary: 62E20: Asymptotic distribution theory; 62F25: Tolerance and confidence regions; 62F35: Robustness and adaptive procedures
Secondary: 62J05: Linear regression; 62J12: Generalized linear models

Keywords: Additive model; bounded variation space; ANOVA model; high-dimensional data; metric entropy; penalized estimation; reproducing kernel Hilbert space; Sobolev space; total variation; trend filtering


Tan, Zhiqiang; Zhang, Cun-Hui. Doubly penalized estimation in additive regression with high-dimensional data. Ann. Statist. 47 (2019), no. 5, 2567--2600. doi:10.1214/18-AOS1757.


Supplemental materials

  • Supplement to “Doubly penalized estimation in additive regression with high-dimensional data”. We provide proofs and technical tools.