SPIMI: Single-pass in-memory indexing

  • Key idea 1: Generate separate dictionaries for each block – no need to maintain term-termID mapping across blocks.

  • Key idea 2: Don’t sort. Accumulate postings in postings lists as they occur.

  • With these two ideas we can generate a complete inverted index for each block.

  • These separate indexes can then be merged into one big index.

SPIMI-Invert

  • Merging of blocks is analogous to BSBI. 

SPIMI: Compression

  • Compression makes SPIMI even more efficient.

    • Compression of terms

    • Compression of postings

  • See next lecture