Resources for today’s lecture

  • IIR 13

  • Fabrizio Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1):1-47, 2002.

  • Yiming Yang and Xin Liu. A Re-examination of Text Categorization Methods. Proceedings of SIGIR, 1999.

  • Andrew McCallum and Kamal Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.

  • Tom Mitchell. Machine Learning. McGraw-Hill, 1997.

    • Clear simple explanation of Naïve Bayes

  • Open Calais: Automatic Semantic Tagging

    • Free (but they can keep your data), provided by Thomson Reuters (ex-ClearForest)

  • Weka: A data mining software package that includes an implementation of Naive Bayes

  • Reuters-21578 – the most famous text classification evaluation set

    • Still widely used by lazy people (but now it’s too small for realistic experiments – you should use Reuters RCV1)

