Small screen detected. You are viewing the mobile version of SlideWiki. If you wish to edit slides you will need to use a larger device.
With a stop list, you exclude from the dictionary entirely the commonest words. Intuition:
They have little semantic content: the, a, and, to, be
There are a lot of them: ~30% of postings for top 30 words
But the trend is away from doing this:
Good compression techniques (lecture 5) means the space for including stopwords in a system is very small
Good query optimization techniques (lecture 7) mean you pay little at query time for including stop words.
You need them for:
Phrase queries: “King of Denmark”
Various song titles, etc.: “Let it be”, “To be or not to be”
“Relational” queries: “flights to London”
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License