Exercises
- What is the effect on Heaps’ law of including spelling errors, compared with automatically correcting them?
Compute the vocabulary size M for this scenario:
Looking at a collection of web pages, you find that there are 3000 different terms in the first 10,000 tokens and 30,000 different terms in the first 1,000,000 tokens.
Assume a search engine indexes a total of 20,000,000,000 (2 × 10^10) pages, containing 200 tokens on average.
What is the size of the vocabulary of the indexed collection as predicted by Heaps’ law?
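One possible way to work through this scenario is sketched below. It assumes Heaps’ law in its usual form, M = k · T^b, where T is the number of tokens and k and b are collection-specific constants. The two observations from the scenario give two equations, which are enough to solve for k and b. The function names fit_heaps and predict_vocabulary are hypothetical helpers introduced only for this illustration.

```python
import math


def fit_heaps(t1, m1, t2, m2):
    """Solve M = k * T**b for (k, b) from two (tokens, vocabulary) observations.

    Dividing the two equations eliminates k:
        m2 / m1 = (t2 / t1) ** b   =>   b = log(m2 / m1) / log(t2 / t1)
    """
    b = math.log(m2 / m1) / math.log(t2 / t1)
    k = m1 / t1 ** b
    return k, b


def predict_vocabulary(k, b, tokens):
    """Vocabulary size predicted by Heaps' law for a given token count."""
    return k * tokens ** b


# Observations from the scenario: (tokens, distinct terms).
k, b = fit_heaps(10_000, 3_000, 1_000_000, 30_000)

# Collection size: 2 * 10^10 pages with 200 tokens each on average.
total_tokens = 20_000_000_000 * 200

vocabulary = predict_vocabulary(k, b, total_tokens)
print(f"k = {k:.2f}, b = {b:.2f}, predicted M = {vocabulary:,.0f}")
```

By hand, the same steps give 30,000 / 3,000 = (1,000,000 / 10,000)^b, so 10 = 100^b and b = 0.5. Then k = 3,000 / 10,000^0.5 = 30. With T = 2 × 10^10 × 200 = 4 × 10^12 tokens, the predicted vocabulary is M = 30 × (4 × 10^12)^0.5 = 30 × 2 × 10^6 = 6 × 10^7 terms.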