Source: Statist. Sci.
Volume 16, Issue 3
There are two cultures in the use of statistical modeling to reach
conclusions from data. One assumes that the data are generated by a given
stochastic data model. The other uses algorithmic models and treats the data
mechanism as unknown. The statistical community has been committed to the
almost exclusive use of data models. This commitment has led to irrelevant
theory and questionable conclusions, and it has kept statisticians from working
on a large range of interesting current problems. Algorithmic modeling, both in
theory and practice, has developed rapidly in fields outside statistics. It can
be used on large, complex data sets and also as a more accurate and informative
alternative to data modeling on smaller data sets. If our goal as a field is to
use data to solve problems, then we need to move away from exclusive dependence
on data models and adopt a more diverse set of tools.