Postings compression
The postings file is much larger than the dictionary, by a factor of at least 10.
Key desideratum: store each posting compactly.
A posting for our purposes is a docID.
For Reuters (800,000 documents), we would use 32 bits per docID when using 4-byte integers.
Alternatively, we can use log2(800,000) ≈ 19.6, rounded up to 20 bits per docID.
Our goal: use far fewer than 20 bits per docID.
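The arithmetic behind the 32-bit and 20-bit figures can be checked with a short Python sketch. The function names and the postings count in the usage example are illustrative assumptions, not part of the slide; only the 800,000-document figure comes from the text.

```python
NUM_DOCS = 800_000        # Reuters collection size quoted on the slide
FIXED_WIDTH_BITS = 32     # one 4-byte integer per docID


def bits_per_docid(num_docs: int) -> int:
    """Smallest fixed width in bits that can distinguish num_docs docIDs.

    Equivalent to ceil(log2(num_docs)), computed with integer arithmetic
    to avoid floating-point rounding.
    """
    if num_docs < 1:
        raise ValueError("num_docs must be positive")
    # docIDs 0 .. num_docs - 1 must all fit; always use at least one bit.
    return max(1, (num_docs - 1).bit_length())


def postings_size_bytes(num_postings: int, bits_each: int) -> float:
    """Size of a postings file storing num_postings docIDs at a fixed width."""
    return num_postings * bits_each / 8


if __name__ == "__main__":
    minimal = bits_per_docid(NUM_DOCS)
    print("minimal fixed width:", minimal, "bits")
    print("fraction of 32-bit size:", minimal / FIXED_WIDTH_BITS)

    # Hypothetical postings count, chosen only to make the sizes concrete.
    example_postings = 1_000_000
    print(postings_size_bytes(example_postings, FIXED_WIDTH_BITS), "bytes at 32 bits")
    print(postings_size_bytes(example_postings, minimal), "bytes at minimal width")
```

A fixed 20-bit width is only the baseline here; the stated goal of far fewer than 20 bits per docID requires a variable-length encoding, which this sketch does not attempt.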