Annals of Statistics

Approximation and learning by greedy algorithms

Andrew R. Barron, Albert Cohen, Wolfgang Dahmen, and Ronald A. DeVore


We consider the problem of approximating a given element $f$ from a Hilbert space $\mathcal{H}$ by means of greedy algorithms and the application of such procedures to the regression problem in statistical learning theory. We improve on the existing theory of convergence rates for both the orthogonal greedy algorithm and the relaxed greedy algorithm, as well as for the forward stepwise projection algorithm. For all these algorithms, we prove convergence results for a variety of function classes and not simply those that are related to the convex hull of the dictionary. We then show how these bounds for convergence rates lead to a new theory for the performance of greedy algorithms in learning. In particular, we build upon the results in [IEEE Trans. Inform. Theory 42 (1996) 2118–2132] to construct learning algorithms based on greedy approximations which are universally consistent and provide provable convergence rates for large classes of functions. The use of greedy algorithms in the context of learning is very appealing since it greatly reduces the computational burden when compared with standard model selection using general dictionaries.
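
The abstract names the orthogonal greedy algorithm without spelling it out. As a rough illustration only, the following is a minimal sketch of its textbook form in the finite-dimensional Hilbert space $\mathbb{R}^n$: at each step, pick the dictionary element most correlated with the current residual, then replace the approximation by the orthogonal projection of $f$ onto the span of all elements selected so far. The function names `dot` and `orthogonal_greedy`, the tolerance `tol`, the unit-norm assumption on the dictionary, and the toy data are assumptions made for this sketch and are not taken from the paper, which may analyze a differently stated variant.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_greedy(f, dictionary, steps, tol=1e-12):
    """Sketch of the orthogonal greedy algorithm in R^n.

    Assumes (for illustration) that every dictionary element has unit norm.
    Returns the approximation of f and the indices of the selected elements.
    """
    residual = list(f)
    basis = []      # orthonormal basis of the span of the selected elements
    selected = []   # indices of the selected dictionary elements
    for _ in range(steps):
        # Greedy step: element with the largest |<residual, g>|.
        scores = [abs(dot(residual, g)) for g in dictionary]
        best = max(range(len(dictionary)), key=scores.__getitem__)
        if scores[best] <= tol:
            break  # residual is orthogonal to the whole dictionary
        # Gram-Schmidt: orthogonalize the new element against the basis.
        q = list(dictionary[best])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm <= tol:
            break  # new element adds nothing to the current span
        q = [qi / norm for qi in q]
        basis.append(q)
        selected.append(best)
        # Projection step: the residual is already orthogonal to the earlier
        # basis vectors, so removing its component along q yields
        # f minus the projection of f onto the enlarged span.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    approximation = [fi - ri for fi, ri in zip(f, residual)]
    return approximation, selected


# Toy example (hypothetical data): the standard basis of R^3 plus one
# extra unit vector along the diagonal of the first two coordinates.
s = 1.0 / math.sqrt(2.0)
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
f = [3.0, 4.0, 0.0]
approx, selected = orthogonal_greedy(f, dictionary, steps=2)
```

In this toy example the diagonal element is selected first because it has the largest inner product with $f$, and after two steps the selected elements span the plane containing $f$, so the residual vanishes up to rounding. Maintaining an orthonormal basis by Gram-Schmidt is one simple way to carry out the projection; a least-squares solve over the selected elements would serve equally well.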

Article information

Ann. Statist., Volume 36, Number 1 (2008), 64-94.

First available in Project Euclid: 1 February 2008

Primary:
  • 62G07: Density estimation
  • 41A46: Approximation by arbitrary nonlinear expressions; widths and entropy
  • 41A63: Multidimensional problems (should also be assigned at least one other classification number in this section)
  • 46N30: Applications in probability theory and statistics

Keywords: Nonparametric regression, statistical learning, convergence rates for greedy algorithms, interpolation spaces, neural networks


Barron, Andrew R.; Cohen, Albert; Dahmen, Wolfgang; DeVore, Ronald A. Approximation and learning by greedy algorithms. Ann. Statist. 36 (2008), no. 1, 64–94. doi:10.1214/009053607000000631.

References

  • Avellaneda, M., Davis, G. and Mallat, S. (1997). Adaptive greedy approximations. Constr. Approx. 13 57–98.
  • Barron, A. R. (1990). Complexity regularization with application to artificial neural networks. In Nonparametric Functional Estimation and Related Topics (G. Roussas, ed.) 561–576. Kluwer Academic Publishers, Dordrecht.
  • Barron, A. R. (1992). Neural net approximation. Proc. 7th Yale Workshop on Adaptive and Learning Systems (K. S. Narendra, ed.) 1 69–72. New Haven, CT.
  • Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39 930–945.
  • Barron, A. and Cheang, G. H. L. (2001). Penalized least squares, model selection, convex hull classes, and neural nets. In Proceedings of the 9th ESANN, Brugge, Belgium (M. Verleysen, ed.) 371–376. De-Facto Press.
  • Bennett, C. and Sharpley, R. (1988). Interpolation of Operators. Academic Press, Boston.
  • Bergh, J. and Löfström, J. (1976). Interpolation Spaces. Springer, Berlin.
  • DeVore, R. (1998). Nonlinear approximation. In Acta Numerica 7 51–150. Cambridge Univ. Press.
  • DeVore, R. and Temlyakov, V. (1996). Some remarks on greedy algorithms. Adv. Comput. Math. 5 173–187.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499.
  • Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, Berlin.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.
  • Huang, C., Cheang, G. H. L. and Barron, A. R. Risk of penalized least squares, greedy term selection, and L1-penalized estimators from flexible function libraries. Yale Department of Statistics Report.
  • Jones, L. K. (1992). A simple lemma on greedy approximation in Hilbert spaces and convergence rates for projection pursuit regression and neural network training. Ann. Statist. 20 608–613.
  • Konyagin, S. V. and Temlyakov, V. N. (1999). Rate of convergence of pure greedy algorithm. East J. Approx. 5 493–499.
  • Kůrková, V. and Sanguineti, M. (2001). Bounds on rates of variable-basis and neural-network approximation. IEEE Trans. Inform. Theory 47 2659–2665.
  • Kůrková, V. and Sanguineti, M. (2002). Comparison of worst case errors in linear and neural network approximation. IEEE Trans. Inform. Theory 48 264–275.
  • Lee, W. S., Bartlett, P. and Williamson, R. C. (1996). Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans. Inform. Theory 42 2118–2132.
  • Livshitz, E. D. and Temlyakov, V. N. (2003). Two lower estimates in greedy approximation. Constr. Approx. 19 509–524.
  • Petrushev, P. P. (1998). Approximation by ridge functions and neural networks. SIAM J. Math. Anal. 30 155–189.
  • Temlyakov, V. (2003). Nonlinear methods of approximation. J. FOCM 3 33–107.
  • Temlyakov, V. (2005). Greedy algorithms with restricted depth search. Proc. of the Steklov Inst. Math. 248 255–267.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. J. Roy. Statist. Soc. Ser. B 58 267–288.