Open Access
February 2006
Classifier Technology and the Illusion of Progress
David J. Hand
Statist. Sci. 21(1): 1-14 (February 2006). DOI: 10.1214/088342306000000060
Abstract

A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.

Copyright © 2006 Institute of Mathematical Statistics
David J. Hand "Classifier Technology and the Illusion of Progress," Statistical Science 21(1), 1-14, (February 2006). https://doi.org/10.1214/088342306000000060