Stop words
With a stop list, you exclude the commonest words from the dictionary entirely. Intuition:
They have little semantic content: the, a, and, to, be
There are a lot of them: the 30 most common words account for ~30% of postings
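To check the postings claim on your own collection, a minimal sketch follows. It assumes whitespace tokenization and counts one posting per (term, document) pair; the function name `top_k_postings_share` is hypothetical, not taken from the lecture.

```python
from collections import Counter


def top_k_postings_share(docs, k):
    """Return the fraction of all postings that belong to the k terms
    with the highest document frequency.

    Assumption: one posting per distinct (term, document) pair, and
    naive lowercase whitespace tokenization.
    """
    doc_freq = Counter()
    for doc in docs:
        # set() so each term contributes at most one posting per document
        doc_freq.update(set(doc.lower().split()))
    total = sum(doc_freq.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in doc_freq.most_common(k))
    return top / total
```

On a realistic English collection, calling this with `k=30` gives a rough estimate of how many postings a 30-word stop list would remove.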
But the trend is away from doing this:
Good compression techniques (lecture 5) mean the space for including stop words in a system is very small
Good query optimization techniques (lecture 7) mean you pay little at query time for including stop words.
You need them for:
Phrase queries: “King of Denmark”
Various song titles, etc.: “Let it be”, “To be or not to be”
“Relational” queries: “flights to London”
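The three cases above can be illustrated with a small sketch. The stop list here is an illustrative assumption (it extends the slide's examples with "of", "or", "not", "it", and "from"), and `tokenize` and `remove_stop_words` are hypothetical helper names.

```python
# Illustrative stop list: an assumption for this sketch, not a standard list.
STOP_WORDS = frozenset(
    {"the", "a", "and", "to", "be", "of", "or", "not", "it", "from"}
)


def tokenize(text):
    """Naive lowercase whitespace tokenizer (assumption: no punctuation)."""
    return text.lower().split()


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    for query in [
        "King of Denmark",
        "Let it be",
        "To be or not to be",
        "flights to London",
        "flights from London",
    ]:
        print(repr(query), "->", remove_stop_words(tokenize(query)))
```

With this stop list, "To be or not to be" loses every term, and "flights to London" becomes indistinguishable from "flights from London", which is why an index that keeps stop words can answer these queries and one that drops them cannot.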