References
[1] Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9 1545–1588.
[2] Audrino, F. and Barone-Adesi, G. (2005). Functional gradient descent for financial time series with an application to the measurement of market risk. J. Banking and Finance 29 959–977.
[3] Audrino, F. and Barone-Adesi, G. (2005). A multivariate FGD technique to improve VaR computation in equity markets. Comput. Management Sci. 2 87–106.
[4] Audrino, F. and Bühlmann, P. (2003). Volatility estimation with functional gradient descent for very high-dimensional financial time series. J. Comput. Finance 6 65–89.
[5] Bartlett, P. (2003). Prediction algorithms: Complexity, concentration and convexity. In Proceedings of the 13th IFAC Symp. on System Identification.
[6] Bartlett, P. L., Jordan, M. and McAuliffe, J. (2006). Convexity, classification, and risk bounds. J. Amer. Statist. Assoc. 101 138–156.
[7] Bartlett, P. and Traskin, M. (2007). AdaBoost is consistent. J. Mach. Learn. Res. 8 2347–2368.
[8] Benner, A. (2002). Application of “aggregated classifiers” in survival time studies. In Proceedings in Computational Statistics (COMPSTAT) (W. Härdle and B. Rönz, eds.) 171–176. Physica-Verlag, Heidelberg.
[9] Binder, H. (2006). GAMBoost: Generalized additive models by likelihood based boosting. R package version 0.9-3. Available at http://CRAN.R-project.org.
[10] Bissantz, N., Hohage, T., Munk, A. and Ruymgaart, F. (2007). Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J. Numer. Anal. 45 2610–2636.
[11] Blake, C. L. and Merz, C. J. (1998). UCI repository of machine learning databases. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
[12] Blanchard, G., Lugosi, G. and Vayatis, N. (2003). On the rate of convergence of regularized boosting classifiers. J. Machine Learning Research 4 861–894.
[13] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373–384.
[14] Breiman, L. (1996). Bagging predictors. Machine Learning 24 123–140.
[15] Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801–849.
[16] Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493–1517.
[17] Breiman, L. (2001). Random forests. Machine Learning 45 5–32.
[18] Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559–583.
[19] Bühlmann, P. (2007). Twin boosting: Improved feature selection and prediction. Technical report, ETH Zürich. Available at ftp://ftp.stat.math.ethz.ch/Research-Reports/Other-Manuscripts/buhlmann/TwinBoosting1.pdf.
[20] Bühlmann, P. and Lutz, R. (2006). Boosting algorithms: With an application to bootstrapping multivariate time series. In The Frontiers in Statistics (J. Fan and H. Koul, eds.) 209–230. Imperial College Press, London.
[21] Bühlmann, P. and Yu, B. (2000). Discussion on “Additive logistic regression: A statistical view,” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 377–386.
[22] Bühlmann, P. and Yu, B. (2003). Boosting with the L2 loss: Regression and classification. J. Amer. Statist. Assoc. 98 324–339.
[23] Bühlmann, P. and Yu, B. (2006). Sparse boosting. J. Machine Learning Research 7 1001–1024.
[24] Buja, A., Stuetzle, W. and Shen, Y. (2005). Loss functions for binary class probability estimation: Structure and applications. Technical report, Univ. Washington. Available at http://www.stat.washington.edu/wxs/Learning-papers/paper-proper-scoring.pdf.
[25] Dettling, M. (2004). BagBoosting for tumor classification with gene expression data. Bioinformatics 20 3583–3593.
[26] Dettling, M. and Bühlmann, P. (2003). Boosting for tumor classification with gene expression data. Bioinformatics 19 1061–1069.
[27] DiMarzio, M. and Taylor, C. (2008). On boosting kernel regression. J. Statist. Plann. Inference. To appear.
[28] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499.
[29] Freund, Y. and Schapire, R. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory. Springer, Berlin.
[30] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.
[31] Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139.
[32] Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189–1232.
[33] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337–407.
[34] Garcia, A. L., Wagner, K., Hothorn, T., Koebnick, C., Zunft, H. J. and Trippo, U. (2005). Improved prediction of body fat by measuring skinfold thickness, circumferences, and bone breadths. Obesity Research 13 626–634.
[35] Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, M., Iacus, S., Irizarry, R., Leisch, F., Li, C., Mächler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. and Zhang, J. (2004). Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5 R80.
[36] Green, P. and Silverman, B. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman and Hall, New York.
[37] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional predictor selection and the virtue of over-parametrization. Bernoulli 10 971–988.
[38] Hansen, M. and Yu, B. (2001). Model selection and minimum description length principle. J. Amer. Statist. Assoc. 96 746–774.
[39] Hastie, T. and Efron, B. (2004). Lars: Least angle regression, lasso and forward stagewise. R package version 0.9-7. Available at http://CRAN.R-project.org.
[40] Hastie, T. and Tibshirani, R. (1986). Generalized additive models (with discussion). Statist. Sci. 1 297–318.
Mathematical Reviews (MathSciNet):
MR858512
[41] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
[42] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning; Data Mining, Inference and Prediction. Springer, New York.
[43] Hothorn, T. and Bühlmann, P. (2007). Mboost: Model-based boosting. R package version 0.5-8. Available at http://CRAN.R-project.org/.
[44] Hothorn, T. and Bühlmann, P. (2006). Model-based boosting in high dimensions. Bioinformatics 22 2828–2829.
[45] Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A. and van der Laan, M. (2006). Survival ensembles. Biostatistics 7 355–373.
[46] Hothorn, T., Hornik, K. and Zeileis, A. (2006). Party: A laboratory for recursive part(y)itioning. R package version 0.9-11. Available at http://CRAN.R-project.org/.
[47] Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Statist. 15 651–674.
[48] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression. Statist. Sinica. To appear.
[49] Hurvich, C., Simonoff, J. and Tsai, C.-L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. Roy. Statist. Soc. Ser. B 60 271–293.
[50] Iyer, R., Lewis, D., Schapire, R., Singer, Y. and Singhal, A. (2000). Boosting for document routing. In Proceedings of CIKM-00, 9th ACM Int. Conf. on Information and Knowledge Management (A. Agah, J. Callan and E. Rundensteiner, eds.). ACM Press, New York.
[51] Jiang, W. (2004). Process consistency for AdaBoost (with discussion). Ann. Statist. 32 13–29, 85–134.
[52] Kearns, M. and Valiant, L. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. J. Assoc. Comput. Machinery 41 67–95.
[53] Koltchinskii, V. and Panchenko, D. (2002). Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30 1–50.
[54] Leitenstorfer, F. and Tutz, G. (2006). Smoothing with curvature constraints based on boosting techniques. In Proceedings in Computational Statistics (COMPSTAT) (A. Rizzi and M. Vichi, eds.). Physica-Verlag, Heidelberg.
[55] Leitenstorfer, F. and Tutz, G. (2007). Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics 8 654–673.
[56] Leitenstorfer, F. and Tutz, G. (2007). Knot selection by boosting techniques. Comput. Statist. Data Anal. 51 4605–4621.
[57] Lozano, A., Kulkarni, S. and Schapire, R. (2006). Convergence and consistency of regularized boosting algorithms with stationary β-mixing observations. In Advances in Neural Information Processing Systems (Y. Weiss, B. Schölkopf and J. Platt, eds.) 18. MIT Press.
[58] Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods (with discussion). Ann. Statist. 32 30–55, 85–134.
[59] Lutz, R. and Bühlmann, P. (2006). Boosting for high-multivariate responses in high-dimensional linear regression. Statist. Sinica 16 471–494.
[60] Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 41 3397–3415.
[61] Mannor, S., Meir, R. and Zhang, T. (2003). Greedy algorithms for classification–consistency, convergence rates, and adaptivity. J. Machine Learning Research 4 713–741.
[62] Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Functional gradient techniques for combining hypotheses. In Advances in Large Margin Classifiers (A. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans, eds.) 221–246. MIT Press, Cambridge.
[63] McCaffrey, D. F., Ridgeway, G. and Morral, A. R. G. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods 9 403–425.
[64] Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. J. Machine Learning Research 8 409–439.
[65] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
[66] Meir, R. and Rätsch, G. (2003). An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (S. Mendelson and A. Smola, eds.). Springer, Berlin.
[67] Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–403.
[68] Park, M.-Y. and Hastie, T. (2007). An L1 regularization-path algorithm for generalized linear models. J. Roy. Statist. Soc. Ser. B 69 659–677.
[69] R Development Core Team (2006). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org.
[70] Rätsch, G., Onoda, T. and Müller, K. (2001). Soft margins for AdaBoost. Machine Learning 42 287–320.
[71] Ridgeway, G. (1999). The state of boosting. Comput. Sci. Statistics 31 172–181.
[72] Ridgeway, G. (2000). Discussion on “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie, R. Tibshirani. Ann. Statist. 28 393–400.
[73] Ridgeway, G. (2002). Looking for lumps: Boosting and bagging for density estimation. Comput. Statist. Data Anal. 38 379–392.
[74] Ridgeway, G. (2006). Gbm: Generalized boosted regression models. R package version 1.5-7. Available at http://www.i-pensieri.com/gregr/gbm.shtml.
[75] Schapire, R. (1990). The strength of weak learnability. Machine Learning 5 197–227.
[76] Schapire, R. (2002). The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification. Lecture Notes in Statist. 171 149–171. Springer, New York.
[77] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651–1686.
[78] Schapire, R. and Singer, Y. (2000). Boostexter: A boosting-based system for text categorization. Machine Learning 39 135–168.
[79] Southwell, R. (1946). Relaxation Methods in Theoretical Physics. Oxford, at the Clarendon Press.
Mathematical Reviews (MathSciNet):
MR18983
[80] Street, W. N., Mangasarian, O. L., and Wolberg, W. H. (1995). An inductive learning approach to prognostic prediction. In Proceedings of the Twelfth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA.
[81] Temlyakov, V. (2000). Weak greedy algorithms. Adv. Comput. Math. 12 213–227.
[82] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
[83] Tukey, J. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
[84] Tutz, G. and Binder, H. (2006). Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics 62 961–971.
[85] Tutz, G. and Binder, H. (2007). Boosting Ridge regression. Comput. Statist. Data Anal. 51 6044–6059.
[86] Tutz, G. and Hechenbichler, K. (2005). Aggregating classifiers with ordinal response structure. J. Statist. Comput. Simul. 75 391–408.
[87] Tutz, G. and Leitenstorfer, F. (2007). Generalized smooth monotonic regression in additive modelling. J. Comput. Graph. Statist. 16 165–188.
[88] Tutz, G. and Reithinger, F. (2007). Flexible semiparametric mixed models. Statistics in Medicine 26 2872–2900.
[89] van der Laan, M. and Robins, J. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer, New York.
[90] West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J., Marks, J. and Nevins, J. (2001). Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. USA 98 11462–11467.
[91] Yao, Y., Rosasco, L. and Caponnetto, A. (2007). On early stopping in gradient descent learning. Constr. Approx. 26 289–315.
[92] Zhang, T. and Yu, B. (2005). Boosting with early stopping: Convergence and consistency. Ann. Statist. 33 1538–1579.
[93] Zhao, P. and Yu, B. (2007). Stagewise Lasso. J. Mach. Learn. Res. 8 2701–2726.
[94] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Machine Learning Research 7 2541–2563.
[95] Zhu, J., Rosset, S., Zou, H. and Hastie, T. (2005). Multiclass AdaBoost. Technical report, Stanford Univ. Available at http://www-stat.stanford.edu/~hastie/Papers/samme.pdf.
[96] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.