
Issues with Jaccard for scoring

  • It doesn’t consider term frequency (how many times a term occurs in a document)

  • Rare terms in a collection are more informative than frequent terms, but Jaccard doesn’t take this into account

  • We need a more sophisticated way of normalizing for length

  • Later in this lecture, we’ll use . . . instead of |A ∩ B|/|A ∪ B| (Jaccard) for length normalization.
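
The points above can be checked with a small sketch. The function name `jaccard`, the query, and the toy documents below are made up for illustration and are not from the lecture. The sketch assumes whitespace tokenization and treats each text as a set of terms.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over two token collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical query and documents, tokenized on whitespace.
query = "ides of march".split()
doc_once = "caesar died in march".split()
doc_many = "march march march caesar died in march".split()

# Term frequency is ignored: repeating "march" does not change the score.
score_once = jaccard(query, doc_once)
score_many = jaccard(query, doc_many)

# Rarity is ignored: matching the frequent term "the" scores the same
# as matching the rarer term "ides".
query2 = "the ides".split()
score_common = jaccard(query2, "the long road".split())
score_rare = jaccard(query2, "ides long road".split())

# Length normalization is crude: adding unrelated terms to a document
# grows the union and lowers the score, even though the match is unchanged.
doc_long = doc_once + "on a cold morning".split()
score_long = jaccard(query, doc_long)

print(score_once, score_many)    # equal scores
print(score_common, score_rare)  # equal scores
print(score_long < score_once)   # True
```

Because the score depends only on which terms occur, not on how often they occur or how rare they are in the collection, all the information the first two bullets mention is lost before the division happens.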

