Stop words

  • With a stop list, you exclude the commonest words from the dictionary entirely. Intuition:

    • They have little semantic content: the, a, and, to, be

    • There are a lot of them: the 30 most common words account for ~30% of postings

  • But the trend is away from doing this:

    • Good compression techniques (lecture 5) mean the space needed to include stop words in a system is very small

    • Good query optimization techniques (lecture 7) mean you pay little at query time for including stop words.

    • You need them for:

      • Phrase queries: “King of Denmark”

      • Various song titles, etc.: “Let it be”, “To be or not to be”

      • “Relational” queries: “flights to London”
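
To see why these queries need stop words, here is a minimal sketch that applies a stop list to the example queries above. The stop list, the regex tokenizer, and the function names are illustrative assumptions, not part of the lecture or of any particular system.

```python
import re

# Hypothetical stop list for illustration only; real stop lists vary by system.
STOP_WORDS = frozenset({"the", "a", "and", "to", "be", "of", "or", "not", "it"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    queries = [
        "King of Denmark",
        "Let it be",
        "To be or not to be",
        "flights to London",
    ]
    for query in queries:
        print(f"{query!r} -> {remove_stop_words(tokenize(query))}")
```

Under this assumed stop list, "To be or not to be" loses every term, and "flights to London" can no longer be told apart from "flights from London".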

