Classification Methods (2)

  • Hand-coded rule-based classifiers

    • One technique used by CS dept’s spam filter, Reuters, CIA, etc.

    • It’s what Google Alerts is doing

    • Widely deployed in government and enterprise

    • Companies provide “IDE” for writing such rules

    • E.g., assign category if document contains a given boolean combination of words

    • Commercial systems have complex query languages (everything in IR query languages, plus score accumulators)

    • Accuracy is often very high if a rule has been carefully refined over time by a subject expert

    • Building and maintaining these rules is expensive
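
A minimal sketch of the "boolean combination of words" idea from the bullets above. The category names, the words in each rule, and the helper names (tokenize, classify, RULES) are all hypothetical, chosen only for illustration; a commercial rule language would be far richer than this.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical hand-written rules: each category maps to a boolean
# combination of word-presence tests over the document's word set.
RULES = {
    "grain": lambda w: ("wheat" in w or "corn" in w) and "software" not in w,
    "spam": lambda w: "viagra" in w or ("free" in w and "offer" in w),
}

def classify(text, rules=RULES):
    """Return the sorted list of categories whose rule fires on the text."""
    words = tokenize(text)
    return sorted(cat for cat, rule in rules.items() if rule(words))

print(classify("Wheat and corn futures rose sharply."))  # ['grain']
print(classify("Free offer: click now!"))                # ['spam']
```

Each rule is easy to read and to tighten by hand, which is where the high accuracy can come from. Every new category or vocabulary shift means another manual edit to RULES, which illustrates the maintenance cost noted in the last bullet.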

