Other sorts of indexes

  • Positional indexes

    • Same sort of sorting problem as for a nonpositional index … just larger, since each posting also records the positions of the term in the document

  • Building character n-gram indexes:

    • As text is parsed, enumerate n-grams.

    • For each n-gram, need pointers to all dictionary terms containing it – the “postings”.

    • Note that the same “postings entry” will arise repeatedly in parsing the docs – need efficient hashing to keep track of this.

      • E.g., that the trigram uou occurs in the term deciduous will be discovered on each text occurrence of deciduous

      • Only need to process each distinct term once, the first time it is seen; later occurrences can be skipped
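
The n-gram steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed implementation: the names `char_ngrams` and `build_ngram_index` are hypothetical, the token list is made up, no word-boundary markers are added to terms, and a plain in-memory set stands in for the "efficient hashing" of terms already seen.

```python
from collections import defaultdict


def char_ngrams(term, n=3):
    """Return the set of character n-grams in a term (no boundary markers)."""
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def build_ngram_index(tokens, n=3):
    """Map each n-gram to the set of dictionary terms containing it."""
    seen_terms = set()          # hash set of terms already processed
    index = defaultdict(set)    # n-gram -> "postings" (terms containing it)
    for token in tokens:
        if token in seen_terms:
            continue            # repeated text occurrence: nothing new to add
        seen_terms.add(token)
        for gram in char_ngrams(token, n):
            index[gram].add(token)
    return index


# Hypothetical token stream in which "deciduous" occurs twice.
tokens = ["deciduous", "trees", "are", "deciduous", "arduous"]
index = build_ngram_index(tokens)
print(sorted(index["uou"]))  # ['arduous', 'deciduous']
```

The second occurrence of deciduous is skipped by the `seen_terms` check, so its trigrams are enumerated only once.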

