Heaps’ Law

For RCV1, the dashed line

$\log_{10} M = 0.49 \log_{10} T + 1.64$ is the best least squares fit.

Thus, $M = 10^{1.64} T^{0.49}$, so $k = 10^{1.64} \approx 44$ and $b = 0.49$.

Good empirical fit for Reuters RCV1!

For the first 1,000,020 tokens,

the law predicts 38,323 terms;

the actual count is 38,365 terms.
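
The prediction above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slide: the function name heaps_vocabulary_size is hypothetical, and it assumes the rounded constant k = 44 together with b = 0.49 (using the unrounded 10^1.64 ≈ 43.65 would give a noticeably lower estimate).

```python
def heaps_vocabulary_size(num_tokens, k=44.0, b=0.49):
    """Heaps' law estimate M = k * T**b of the number of distinct terms.

    num_tokens is T, the number of tokens seen so far.
    k and b default to the RCV1 fit quoted on the slide (k rounded to 44).
    """
    return k * num_tokens ** b


# Compare the estimate with the term count reported for RCV1.
predicted = heaps_vocabulary_size(1_000_020)
actual = 38_365  # observed distinct terms in the first 1,000,020 tokens
relative_error = abs(predicted - actual) / actual

print(f"predicted terms: {round(predicted):,}")
print(f"actual terms:    {actual:,}")
print(f"relative error:  {relative_error:.2%}")
```

With these constants the estimate rounds to 38,323 terms, within a fraction of a percent of the observed count.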

