Electronic Journal of Statistics

Modeling temporal text streams using the local multinomial model

Guy Lebanon, Yang Zhao, and Yanjun Zhao

Full-text: Open access

Abstract

Temporal text data such as news feeds cannot be adequately modeled by standard n-grams, which correspond to multinomial or Markov chain models. Instead, we examine the application of local n-grams to modeling time-stamped documents. We derive the asymptotic bias and variance and consider the bandwidth selection problem. Experimental results are presented on news feeds and web search query logs.

Article information

Source
Electron. J. Statist. Volume 4 (2010), 566-584.

Dates
First available in Project Euclid: 16 June 2010

Permanent link to this document
http://projecteuclid.org/euclid.ejs/1276694115

Digital Object Identifier
doi:10.1214/09-EJS522

Mathematical Reviews number (MathSciNet)
MR2660533

Subjects
Primary: 62G99: None of the above, but in this section
Secondary: 62P99: None of the above, but in this section

Keywords
Kernel smoothing; text modeling

Citation

Lebanon, Guy; Zhao, Yang; Zhao, Yanjun. Modeling temporal text streams using the local multinomial model. Electron. J. Statist. 4 (2010), 566–584. doi:10.1214/09-EJS522. http://projecteuclid.org/euclid.ejs/1276694115.


