Bernoulli

Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations

Peter J. Bickel and Elizaveta Levina

Full-text: Open access

Abstract

We show that the `naive Bayes' classifier, which assumes independent covariates, greatly outperforms the Fisher linear discriminant rule under broad conditions when the number of variables grows faster than the number of observations, in the classical problem of discriminating between two normal populations. We also introduce a class of rules spanning the range between independence and arbitrary dependence. These rules are shown to achieve Bayes consistency for the Gaussian `coloured noise' model and to adapt to a spectrum of convergence rates, which we conjecture to be minimax.
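
The contrast described above can be illustrated with a small simulation. The sketch below is not from the paper; the dimensions, the AR(1) `coloured noise' covariance, and the size of the mean shift are all illustrative assumptions. It compares Fisher's rule, which must fall back on the Moore-Penrose pseudoinverse because the pooled sample covariance is singular when the number of variables exceeds the number of observations, with the independence (`naive Bayes') rule, which uses only the diagonal of that matrix (Python/NumPy):

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: many more variables than observations (p > 2n).
    p, n = 200, 50
    mu0 = np.zeros(p)
    mu1 = np.full(p, 0.2)          # modest mean shift in every coordinate

    # AR(1) "coloured noise" covariance, Sigma[i, j] = rho^|i - j| (an
    # assumed dependence structure, chosen only for illustration).
    rho = 0.5
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(Sigma)

    def sample(mu, m):
        """Draw m observations from N(mu, Sigma)."""
        return mu + rng.standard_normal((m, p)) @ L.T

    X0, X1 = sample(mu0, n), sample(mu1, n)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

    # Pooled sample covariance: rank <= 2n - 2 < p, hence singular.
    Xc = np.vstack([X0 - m0, X1 - m1])
    S = Xc.T @ Xc / (2 * n - 2)

    # Fisher's rule via the Moore-Penrose pseudoinverse of S.
    w_fisher = np.linalg.pinv(S) @ (m1 - m0)
    # Independence ("naive Bayes") rule: uses only diag(S).
    w_nb = (m1 - m0) / np.diag(S)

    def error_rate(w, m_test=5000):
        """Misclassification rate of the linear rule w on fresh test data."""
        thr = w @ (m0 + m1) / 2.0
        Y0, Y1 = sample(mu0, m_test), sample(mu1, m_test)
        return 0.5 * ((Y0 @ w > thr).mean() + (Y1 @ w <= thr).mean())

    print("Fisher + pseudoinverse:", error_rate(w_fisher))
    print("independence rule:     ", error_rate(w_nb))

In this p > n regime the independence rule typically attains a markedly lower test error, while the pseudoinverse-based Fisher rule degrades toward random guessing, which is the behaviour the abstract describes.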

Article information

Source
Bernoulli Volume 10, Number 6 (2004), 989-1010.

Dates
First available in Project Euclid: 21 January 2005

Permanent link to this document
https://projecteuclid.org/euclid.bj/1106314847

Digital Object Identifier
doi:10.3150/bj/1106314847

Mathematical Reviews number (MathSciNet)
MR2108040

Zentralblatt MATH identifier
1064.62073

Keywords
Fisher's linear discriminant; Gaussian coloured noise; minimax regret; naive Bayes

Citation

Bickel, Peter J.; Levina, Elizaveta. Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli 10 (2004), no. 6, 989--1010. doi:10.3150/bj/1106314847. https://projecteuclid.org/euclid.bj/1106314847


