Postings compression

  • The postings file is much larger than the dictionary, by a factor of at least 10.

  • Key desideratum: store each posting compactly.

  • A posting for our purposes is a docID.

  • For Reuters (800,000 documents), we would use 32 bits per docID when using 4-byte integers.

  • Alternatively, we can use ⌈log2 800,000⌉ = 20 bits per docID (log2 800,000 ≈ 19.6).

  • Our goal: use far fewer than 20 bits per docID.
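The arithmetic in the bullets above can be checked with a short sketch. It is only an illustration of the fixed-width baseline, not the compression scheme the slides are aiming for: the helper names `bits_per_docid`, `pack`, and `unpack` and the sample postings list are assumptions made up for this example.

```python
NUM_DOCS = 800_000  # Reuters collection size quoted on the slide


def bits_per_docid(num_docs: int) -> int:
    """Smallest fixed width (in bits) that can hold docIDs 0 .. num_docs - 1."""
    return max(1, (num_docs - 1).bit_length())


def pack(doc_ids: list[int], width: int) -> bytes:
    """Pack docIDs back to back, `width` bits each (hypothetical helper)."""
    packed = 0
    for i, doc_id in enumerate(doc_ids):
        if not 0 <= doc_id < (1 << width):
            raise ValueError(f"docID {doc_id} does not fit in {width} bits")
        packed |= doc_id << (i * width)
    num_bytes = (len(doc_ids) * width + 7) // 8  # round up to whole bytes
    return packed.to_bytes(num_bytes, "little")


def unpack(data: bytes, count: int, width: int) -> list[int]:
    """Inverse of pack: read `count` docIDs of `width` bits each."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


width = bits_per_docid(NUM_DOCS)   # 20 bits
postings = [3, 47, 154, 799_999]   # made-up postings list for illustration
data = pack(postings, width)

print(width, "bits per docID instead of 32")
print(len(data), "bytes packed vs", 4 * len(postings), "bytes as 4-byte integers")
assert unpack(data, len(postings), width) == postings
```

Fixed-width packing only gets from 32 down to 20 bits per docID; reaching the stated goal of far fewer than 20 bits requires a different encoding.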

