Vocabulary vs. collection size

  • Heaps’ law: M = kT^b

  • M is the size of the vocabulary, T is the number of tokens in the collection

  • Typical values: 30 ≤ k ≤ 100 and b ≈ 0.5

  • In a log-log plot of vocabulary size M vs. number of tokens T, Heaps’ law predicts a straight line with slope b ≈ ½, since log M = log k + b log T

    • A straight line is the simplest possible relationship between the two in log-log space

    • An empirical finding (“empirical law”)
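
The relationship above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names heaps_vocabulary and fit_heaps are hypothetical, the data points are synthetic (generated from the law itself with k = 50 and b = 0.5, not measured on any real collection), and the fit is an ordinary least-squares line through the points in log-log space.

```python
import math


def heaps_vocabulary(T, k, b):
    """Predicted vocabulary size M = k * T**b for a collection of T tokens."""
    return k * T ** b


def fit_heaps(token_counts, vocab_sizes):
    """Estimate (k, b) by a least-squares line fit in log-log space.

    Uses log M = log k + b * log T, so b is the slope and log k the intercept.
    """
    xs = [math.log10(t) for t in token_counts]
    ys = [math.log10(m) for m in vocab_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the fitted line
    k = 10 ** (mean_y - b * mean_x)    # intercept, converted back from log10
    return k, b


# Synthetic (T, M) pairs generated from the law itself; assumed values only.
tokens = [10**3, 10**4, 10**5, 10**6]
vocab = [heaps_vocabulary(t, k=50.0, b=0.5) for t in tokens]

k_hat, b_hat = fit_heaps(tokens, vocab)
print(f"estimated k = {k_hat:.2f}, estimated b = {b_hat:.3f}")
```

On real data, the (T, M) pairs would come from counting distinct terms while scanning the collection, and the fitted slope would only approximate ½.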

