12-byte (4+4+4) records (term, doc, freq).
These are generated as we parse docs.
Must now sort 100M such 12-byte records by term.
Define a block as ~10M such records.
A couple of blocks fit easily into memory.
100M records / 10M per block gives 10 such blocks to start with.
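The sizes above can be checked with a few lines of arithmetic. This is only a sketch: the record layout (three unsigned 32-bit integers) and all names are assumptions made for illustration; the slides state only the 4+4+4 byte split.

```python
import struct

# Assumed layout: three unsigned 32-bit ints (term, doc, freq), no padding.
RECORD = struct.Struct("<III")

TOTAL_RECORDS = 100_000_000   # 100M postings, from the slide
BLOCK_RECORDS = 10_000_000    # ~10M postings per block, from the slide

record_bytes = RECORD.size                    # bytes per record
total_bytes = TOTAL_RECORDS * record_bytes    # size of the full posting list
block_bytes = BLOCK_RECORDS * record_bytes    # size of one block
num_blocks = TOTAL_RECORDS // BLOCK_RECORDS   # number of blocks to sort

print(record_bytes, total_bytes, block_bytes, num_blocks)
```

Under these assumptions one block is 120 MB and the full set of records is 1.2 GB, which is why a couple of blocks fit in memory while the whole collection may not.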
Basic idea of algorithm:
Accumulate postings for each block, sort, write to disk.
Then merge the sorted blocks into one long sorted sequence.
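The basic idea can be sketched as follows. This is a minimal illustration, not the slides' own implementation: the function names, the run-file format, and the use of `heapq.merge` for a single multi-way merge pass are all assumptions, and records are sorted by the full (term, doc, freq) tuple so that postings for the same term also come out ordered by doc.

```python
import heapq
import os
import struct
import tempfile

# Assumed record layout: (term_id, doc_id, freq) as three unsigned 32-bit ints.
RECORD = struct.Struct("<III")

def write_sorted_block(records, directory):
    """Sort one in-memory block and write it to a run file on disk."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for rec in sorted(records):
            f.write(RECORD.pack(*rec))
    return path

def read_run(path):
    """Stream records back from a run file, one 12-byte record at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if not chunk:
                break
            yield RECORD.unpack(chunk)

def build_sorted_postings(postings, block_size, directory):
    """Accumulate postings per block, sort and spill each block, then merge."""
    runs, block = [], []
    for rec in postings:
        block.append(rec)
        if len(block) == block_size:
            runs.append(write_sorted_block(block, directory))
            block = []
    if block:  # last, possibly partial, block
        runs.append(write_sorted_block(block, directory))
    # Lazy multi-way merge: holds only one record per run in memory.
    return heapq.merge(*(read_run(p) for p in runs))
```

A caller would pass an iterator of postings produced while parsing documents, a `block_size` such as 10_000_000, and a scratch directory; the result is a lazy iterator, so the run files must remain on disk until it has been consumed.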
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License