The Annals of Statistics

An overtraining-resistant stochastic modeling method for pattern recognition

E. M. Kleinberg



We introduce a generic approach for solving problems in pattern recognition based on the synthesis of accurate multiclass discriminators from large numbers of very inaccurate "weak" models through the use of discrete stochastic processes. Contrary to the standard expectation held for the many statistical and heuristic techniques normally associated with the field, a significant feature of this method of "stochastic modeling" is its resistance to so-called "overtraining." The drop in performance of any stochastic model in going from training to test data remains comparable to that of the component weak models from which it is synthesized; and since these component models are very simple, their performance drop is small, resulting in a stochastic model whose performance drop is also small despite its high level of accuracy.
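The abstract describes the method only at a high level. The Python sketch below illustrates the general idea in a simplified two-class setting, in the spirit of the stochastic-discrimination construction: weak models are random axis-aligned boxes, a box is kept only if it covers a noticeably larger fraction of one class's training points than of the other's, and each kept box contributes a coverage indicator normalized by those two fractions. The box generator, the enrichment threshold `min_enrichment`, the per-class score comparison in `classify`, and the synthetic Gaussian data are all assumptions made for illustration. The sketch omits the uniformity conditions the paper relies on and should not be read as the paper's algorithm.

```python
import random


def random_box(points, rng, max_half_width):
    """Weak model (assumed form): a random axis-aligned box around a random point."""
    center = rng.choice(points)
    half = [rng.uniform(0.0, max_half_width) for _ in center]
    lows = tuple(c - h for c, h in zip(center, half))
    highs = tuple(c + h for c, h in zip(center, half))
    return lows, highs


def covers(box, q):
    """True if point q lies inside the box."""
    lows, highs = box
    return all(lo <= x <= hi for x, lo, hi in zip(q, lows, highs))


def coverage(box, points):
    """Fraction of `points` that fall inside `box`."""
    return sum(covers(box, p) for p in points) / len(points)


def train_models(target, other, n_models, max_half_width=1.0,
                 min_enrichment=0.1, seed=0, max_tries=100_000):
    """Keep random boxes that cover `target` noticeably more often than `other`.

    Hypothetical enrichment rule: r_t - r_o >= min_enrichment, where r_t and
    r_o are the fractions of target / other training points inside the box.
    """
    rng = random.Random(seed)
    pool = target + other
    models = []
    for _ in range(max_tries):
        box = random_box(pool, rng, max_half_width)
        r_t, r_o = coverage(box, target), coverage(box, other)
        if r_t - r_o >= min_enrichment:
            models.append((box, r_t, r_o))
            if len(models) == n_models:
                return models
    raise RuntimeError("too few enriched boxes; lower min_enrichment")


def score(models, q):
    """Average of (1[q in box] - r_o) / (r_t - r_o) over all kept boxes.

    Averaged over the target training points this is exactly 1, and over
    the other class's training points exactly 0 (up to rounding).
    """
    total = 0.0
    for box, r_t, r_o in models:
        indicator = 1.0 if covers(box, q) else 0.0
        total += (indicator - r_o) / (r_t - r_o)
    return total / len(models)


def classify(models1, models2, q):
    """Assumed decision rule: pick the class whose score is larger."""
    return 1 if score(models1, q) >= score(models2, q) else 2


def gaussian_cloud(rng, center, n, sigma=0.5):
    """Synthetic data: n points drawn around `center`."""
    return [tuple(rng.gauss(c, sigma) for c in center) for _ in range(n)]


def accuracy(models1, models2, points1, points2):
    """Fraction of points assigned to their true class."""
    hits = sum(classify(models1, models2, q) == 1 for q in points1)
    hits += sum(classify(models1, models2, q) == 2 for q in points2)
    return hits / (len(points1) + len(points2))


rng = random.Random(42)
train1 = gaussian_cloud(rng, (0.0, 0.0), 100)
train2 = gaussian_cloud(rng, (2.0, 2.0), 100)
test1 = gaussian_cloud(rng, (0.0, 0.0), 100)
test2 = gaussian_cloud(rng, (2.0, 2.0), 100)

models1 = train_models(train1, train2, n_models=300, seed=1)
models2 = train_models(train2, train1, n_models=300, seed=2)

train_acc = accuracy(models1, models2, train1, train2)
test_acc = accuracy(models1, models2, test1, test2)
print(f"train accuracy {train_acc:.3f}, test accuracy {test_acc:.3f}")
```

Because each kept box's normalized indicator averages to exactly 1 over its target class's training points and to exactly 0 over the other class's, the aggregate score inherits both properties by linearity. Comparing the printed training and test accuracies gives only an informal look at the train-to-test drop the abstract discusses.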

Article information

Ann. Statist., Volume 24, Number 6 (1996), 2319-2349.

First available in Project Euclid: 16 September 2002


Mathematics Subject Classification. Primary: 68T10 (Pattern recognition, speech recognition {For cluster analysis, see 62H30}); 68T05 (Learning and adaptive systems [See also 68Q32, 91E40]).

Keywords: pattern recognition, machine learning.


Kleinberg, E. M. An overtraining-resistant stochastic modeling method for pattern recognition. Ann. Statist. 24 (1996), no. 6, 2319--2349. doi:10.1214/aos/1032181157.


