Jaccard doesn’t consider term frequency (how many times a term occurs in a document)
Rare terms in a collection are more informative than frequent terms. Jaccard doesn’t consider this information
We need a more sophisticated way of normalizing for length
Later in this lecture, we’ll use
. . . instead of |A ∩ B|/|A ∪ B| (Jaccard) for length normalization.
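The first two issues above can be seen in a small sketch. It assumes whitespace tokenization and made-up toy documents; the function name `jaccard` and the convention for two empty sets are illustrative choices, not taken from the lecture.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two token collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


# Toy query and documents (hypothetical examples).
query = "ides of march".split()
doc_once = "caesar died in march".split()
doc_many = "march march march caesar died in march".split()

# Term frequency is ignored: repeating "march" does not change the score,
# because both documents collapse to the same set of terms.
score_once = jaccard(query, doc_once)
score_many = jaccard(query, doc_many)

# Rarity is ignored: matching the common term "the" counts exactly as much
# as matching the rarer term "ides".
query2 = "the ides".split()
score_common = jaccard(query2, "the cat sat".split())
score_rare = jaccard(query2, "ides cat sat".split())

print(score_once, score_many)    # the two values are equal
print(score_common, score_rare)  # the two values are equal
```

Because both inputs are reduced to sets before comparison, any information about how often a term occurs, or how rare it is in the collection, is lost before the score is computed.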
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License