Recap of the previous lecture

  • The type/token distinction

    • A term is a normalized type that is entered in the dictionary

  • Tokenization problems:

    • Hyphens, apostrophes, compounds, CJK

  • Term equivalence classing:

    • Numbers, case folding, stemming, lemmatization

  • Skip pointers

    • Encoding a tree-like structure in a postings list

  • Biword indexes for phrases

  • Positional indexes for phrases/proximity queries
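
As a reminder of how the last item works, here is a minimal sketch of a positional index and a phrase lookup over it. It is an illustration only, not the lecture's reference implementation: the function names `build_positional_index` and `phrase_query`, the toy documents, and the naive lowercase whitespace tokenizer are all assumptions made for this example.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}.

    Tokenization here is a naive lowercase whitespace split (an assumption);
    a real system would apply the normalization steps recapped above.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def phrase_query(index, phrase):
    """Return sorted doc IDs in which the terms of `phrase` occur consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []

    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])

    hits = []
    for doc_id in sorted(candidates):
        # Position sets for the second, third, ... term in this document.
        later = [set(index[t][doc_id]) for t in terms[1:]]
        # The phrase matches if, for some start position p of the first term,
        # the (i + 1)-th following term occurs at position p + i + 1.
        if any(
            all(p + i + 1 in positions for i, positions in enumerate(later))
            for p in index[terms[0]][doc_id]
        ):
            hits.append(doc_id)
    return hits


# Hypothetical toy collection for illustration.
docs = {
    1: "to be or not to be",
    2: "be quick to act",
    3: "not to be missed",
}
index = build_positional_index(docs)
print(phrase_query(index, "to be"))
```

Document 2 contains both "to" and "be" but not adjacent and in that order, so a plain Boolean AND would return it while the positional check rejects it. A proximity query ("within k words") would relax the `p + i + 1` test to a window check over the same position lists.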

