Normalization to terms

  • We need to “normalize” words in indexed text as well as query words into the same form

    • We want to match U.S.A. and USA

  • Result is terms: a term is a (normalized) word type, which is an entry in our IR system dictionary

  • Most commonly, we define equivalence classes of terms implicitly by, e.g.,

    • deleting periods to form a term

      • U.S.A., USA → USA

    • deleting hyphens to form a term

      • anti-discriminatory, antidiscriminatory → antidiscriminatory
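
The two deletion rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full normalizer: the function name `normalize` and the variable names are hypothetical, and it assumes tokens have already been split out of the text.

```python
def normalize(token: str) -> str:
    """Map a word to its term by deleting periods and hyphens.

    Hypothetical helper for illustration; real systems usually
    apply further rules as well.
    """
    return token.replace(".", "").replace("-", "")


# Indexed words and query words go through the same function, so
# every member of an equivalence class maps to the same term.
variants = ["U.S.A.", "USA", "anti-discriminatory", "antidiscriminatory"]
terms = {word: normalize(word) for word in variants}

for word, term in terms.items():
    print(f"{word} -> {term}")
```

Because the same function is applied at indexing time and at query time, a query for USA would match a document containing U.S.A. without the dictionary storing both forms.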

