Classification Methods (3)

  • Supervised learning of a document-label assignment function

    • Many systems partly or wholly rely on machine learning (Autonomy, Microsoft, Enkata, Yahoo!, …)

      • k-Nearest Neighbors (simple, powerful)

      • Naive Bayes (simple, common method)

      • Support-vector machines (new, generally more powerful)

      • … plus many other methods

    • No free lunch: supervised methods require hand-classified training data

    • But training data can be built up (and refined) by amateurs, not only by domain experts

  • Many commercial systems use a mixture of methods
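
To make "supervised learning of a document-label assignment function" concrete, here is a minimal sketch of one of the methods listed above, a multinomial Naive Bayes classifier with add-one smoothing. The function names (train_naive_bayes, classify), the whitespace tokenizer, and the four hand-labeled training documents are assumptions made up for illustration; they do not come from any of the commercial systems named above.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document counts and word counts from (text, label) pairs."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior under the trained model."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_doc_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing avoids zero probabilities.
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-classified training data: the "no free lunch" part.
TRAINING_DATA = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report due monday", "ham"),
]

model = train_naive_bayes(TRAINING_DATA)
print(classify("buy cheap money", model))
print(classify("monday project meeting", model))
```

The learned function is only as good as the labeled examples it is given, which is why the cost of building and refining training data matters as much as the choice of method.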

