Thesauri and soundex

  • How do we handle synonyms and homonyms?

    • E.g., by hand-constructed equivalence classes

      • car = automobile, color = colour

    • We can rewrite to form equivalence-class terms

      • When the document contains automobile, index it under car-automobile (and vice versa)

    • Or we can expand a query

      • When the query contains automobile, look under car as well
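
The two options above (rewriting to equivalence-class terms at index time, or expanding the query at search time) can be sketched as follows. This is a minimal illustration, not a reference implementation: the names EQUIV_CLASSES, normalize, build_index, search_rewritten, search_expanded and the three sample documents are all hypothetical, and tokenization is assumed to be plain whitespace splitting.

```python
from collections import defaultdict

# Hand-constructed equivalence classes (illustrative entries only).
EQUIV_CLASSES = [("car", "automobile"), ("color", "colour")]

# Map each term to its class label, e.g. "automobile" -> "car-automobile".
CANON = {term: "-".join(cls) for cls in EQUIV_CLASSES for term in cls}


def normalize(term):
    """Return the equivalence-class term, or the term itself if it has no class."""
    return CANON.get(term, term)


def build_index(docs, rewrite):
    """Build a term -> set of doc ids index; rewrite=True indexes class terms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[normalize(term) if rewrite else term].add(doc_id)
    return index


def search_rewritten(index, term):
    """Option 1: the index holds class terms, so rewrite the query term too."""
    return index.get(normalize(term), set())


def search_expanded(index, term):
    """Option 2: the index holds raw terms, so look under every class member."""
    variants = next((cls for cls in EQUIV_CLASSES if term in cls), (term,))
    result = set()
    for variant in variants:
        result |= index.get(variant, set())
    return result


# Hypothetical toy collection.
DOCS = {1: "red car", 2: "vintage automobile", 3: "colour chart"}
```

Both routes return the same documents here. Rewriting keeps postings lists merged and queries cheap, while expansion leaves the index untouched and pays the cost at query time.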

  • What about spelling mistakes?

    • One approach is soundex, which forms equivalence classes of words based on phonetic heuristics

  • More in lectures 3 and 9
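
To make the soundex bullet above concrete, here is a sketch of one common variant of the algorithm (the four-character code in which h and w do not separate letters with the same digit). The slide does not say which variant is meant, so treat the details as an assumption; the function name soundex and the CODES table are illustrative.

```python
# Letter groups that share a digit in this Soundex variant.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character phonetic code: first letter plus three digits."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code can suppress the next one
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    # Pad with zeros and truncate to four characters.
    return (first.upper() + "".join(digits) + "000")[:4]
```

Words that land on the same code fall into the same equivalence class, so soundex("Herman") and soundex("Hermann") match, as do soundex("Robert") and soundex("Rupert"). The second pair shows that the heuristic also merges words that are not misspellings of each other.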

