Current Slide

Small screen detected. You are viewing the mobile version of SlideWiki. If you wish to edit slides you will need to use a larger device.

BSBI: Blocked sort-based Indexing (Sorting with fewer disk seeks)

  • 12-byte (4+4+4) records (term, doc, freq).

  • These are generated as we parse docs.

  • Must now sort 100M such 12-byte records by term.

  • Define a Block ~ 10M such records

    • Can easily fit a couple into memory.

    • Will have 10 such blocks to start with.

  • Basic idea of algorithm:

    • Accumulate postings for each block, sort, write to disk.

    • Then merge the blocks into one long sorted order.


Speaker notes:

Content Tools

Sources

There are currently no sources for this slide.