SpamAssassin

  • Naïve Bayes has found a home in spam filtering

    • Paul Graham’s A Plan for Spam

      • A Naïve Bayes-like classifier with weird parameter estimation

    • Widely used in spam filters

    • But many features beyond words:

      • black hole lists, etc.

      • particular hand-crafted text patterns
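For reference, the word-based core can be sketched as a textbook multinomial Naïve Bayes classifier with add-one smoothing. This is not Graham's estimator (his parameter estimation differs), and the function names and the four-message toy corpus below are made up purely for illustration.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, text) pairs.
    Returns (documents per class, word counts per class, vocabulary)."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for label, text in docs:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab

def log_posteriors(text, class_docs, word_counts, vocab):
    """Unnormalized log P(class | text) for every class."""
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        counts = word_counts[label]
        total_words = sum(counts.values())
        # log prior: fraction of training documents in this class
        score = math.log(class_docs[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # add-one (Laplace) smoothed likelihood
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores

def classify(text, model):
    scores = log_posteriors(text, *model)
    return max(scores, key=scores.get)

# Toy training data (invented for this sketch).
TOY = [
    ("spam", "cheap viagra buy now"),
    ("spam", "win millions now"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]
MODEL = train(TOY)

print(classify("buy viagra now", MODEL))
print(classify("monday meeting", MODEL))
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps a single unseen word from zeroing out a class.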

Naïve Bayes in Spam Filtering

  • SpamAssassin Features:

    • Basic (Naïve) Bayes spam probability

    • Mentions: Generic Viagra

    • Mentions millions of $ ($NN,NNN,NNN.NN)

    • Phrase: impress ... girl

    • Phrase: ‘Prestigious Non-Accredited Universities’

    • From: starts with many numbers

    • Subject is all capitals

    • HTML has a low ratio of text to image area

    • Relay in RBL, http://www.mail-abuse.com/enduserinfo_rbl.html

    • RCVD line looks faked

    • http://spamassassin.apache.org/tests_3_3_x.html
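A few of the features above can be sketched as weighted rules whose scores are summed and compared against a threshold, which is roughly how a SpamAssassin-style filter combines the Bayes probability with hand-crafted patterns. The rule names, weights, regular expressions, and message fields below are all hypothetical, not SpamAssassin's actual rules or scores; the 5.0 threshold is chosen to mirror its usual default required score.

```python
import re

# Hypothetical rules: (name, weight, test). Each test takes a message dict
# with keys "from", "subject", "body", and "bayes_prob" (a Naive Bayes spam
# probability computed elsewhere). Weights are invented for illustration.
RULES = [
    ("BAYES_HIGH", 3.0,
     lambda m: m["bayes_prob"] >= 0.9),
    ("MENTIONS_GENERIC_VIAGRA", 2.5,
     lambda m: re.search(r"generic\s+viagra", m["body"], re.I) is not None),
    ("MILLIONS_OF_DOLLARS", 1.5,  # e.g. $10,000,000.00
     lambda m: re.search(r"\$\d{1,3}(,\d{3}){2,}(\.\d{2})?", m["body"]) is not None),
    ("FROM_STARTS_WITH_NUMS", 1.0,
     lambda m: re.match(r"\d{3,}", m["from"]) is not None),
    ("SUBJ_ALL_CAPS", 1.0,
     lambda m: m["subject"].isupper()),
]

THRESHOLD = 5.0  # assumed cutoff for calling a message spam

def score(message):
    """Return (total score, names of the rules that fired)."""
    fired = [(name, weight) for name, weight, test in RULES if test(message)]
    total = sum(weight for _, weight in fired)
    return total, [name for name, _ in fired]

def is_spam(message):
    total, _ = score(message)
    return total >= THRESHOLD

spammy = {
    "from": "12345bob@example.com",
    "subject": "MAKE MONEY FAST",
    "body": "Get Generic Viagra and $10,000,000.00 today",
    "bayes_prob": 0.97,
}
clean = {
    "from": "alice@example.com",
    "subject": "Lunch on Monday",
    "body": "See you at noon",
    "bayes_prob": 0.02,
}

print(score(spammy), is_spam(spammy))
print(score(clean), is_spam(clean))
```

Treating the Bayes output as one weighted rule among many lets non-word evidence (headers, blocklists, markup ratios) contribute alongside the word model.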

Naïve Bayes on spam email