Recap of lecture 5

  • Collection and vocabulary statistics: Heaps’ and Zipf’s laws
  • Dictionary compression for Boolean indexes
    • Dictionary string, blocks, front coding
  • Postings compression: Gap encoding, prefix-unique codes
    • Variable-Byte and Gamma codes

Collection and index sizes:

  • collection (text, XML markup, etc.): 3,600.0 MB
  • collection (text): 960.0 MB
  • Term-doc incidence matrix: 40,000.0 MB
  • postings, uncompressed (32-bit words): 400.0 MB
  • postings, uncompressed (20 bits): 250.0 MB
  • postings, variable byte encoded: 116.0 MB
  • postings, γ-encoded: 101.0 MB
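
As a reminder of how the postings codes in the recap work, here is a minimal sketch of gap encoding followed by Variable-Byte and Gamma coding. The function names and the sample docIDs are illustrative assumptions, not taken from the slides, and the Gamma code is kept as a string of '0'/'1' characters for readability rather than packed into real bits.

```python
from itertools import accumulate
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Gap encoding: store each sorted docID as its difference from the previous one."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert gap encoding with a running sum."""
    return list(accumulate(gaps))


def vb_encode_number(n: int) -> bytes:
    """Variable-Byte code: 7 payload bits per byte, high bit set only on the last byte."""
    out = [n % 128]
    n //= 128
    while n > 0:
        out.insert(0, n % 128)
        n //= 128
    out[-1] += 128  # mark the final byte of this number
    return bytes(out)


def vb_encode(numbers: List[int]) -> bytes:
    return b"".join(vb_encode_number(n) for n in numbers)


def vb_decode(data: bytes) -> List[int]:
    numbers, n = [], 0
    for b in data:
        if b < 128:  # continuation byte
            n = 128 * n + b
        else:        # last byte of the current number
            numbers.append(128 * n + (b - 128))
            n = 0
    return numbers


def gamma_encode_number(n: int) -> str:
    """Gamma code for n >= 1: unary length of the offset, then the offset itself.

    The offset is n in binary with its leading 1 removed.
    """
    if n < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    offset = bin(n)[3:]  # bin(n) looks like '0b1...'; drop '0b' and the leading 1
    return "1" * len(offset) + "0" + offset


def gamma_decode(bits: str) -> List[int]:
    numbers, i = [], 0
    while i < len(bits):
        length = 0
        while bits[i] == "1":  # read the unary length
            length += 1
            i += 1
        i += 1                 # skip the terminating 0
        numbers.append(int("1" + bits[i:i + length], 2))
        i += length
    return numbers


doc_ids = [824, 829, 215406]  # illustrative postings list
gaps = to_gaps(doc_ids)
vb_bytes = vb_encode(gaps)
gamma_bits = "".join(gamma_encode_number(g) for g in gaps)
```

Small gaps cost one byte under Variable-Byte coding and only a few bits under Gamma coding, which is why the compressed postings are so much smaller than the fixed-width ones listed above.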

