Same sort of sorting problem … just larger ← WHY?
Building character n-gram indexes:
As text is parsed, enumerate n-grams.
For each n-gram, need pointers to all dictionary terms containing it – the “postings”.
Note that the same “postings entry” will arise repeatedly in parsing the docs – need efficient hashing to keep track of this.
E.g., the fact that the trigram "uou" occurs in the term "deciduous" will be rediscovered on every text occurrence of "deciduous".
Only need to process each term once
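The points above can be sketched in a few lines of Python. This is only an illustrative sketch, not the slide author's implementation: the function names `char_ngrams` and `build_ngram_index`, the sample token list, and the choice to omit term-boundary markers are all assumptions. A hash set of already-seen terms ensures each term's n-grams are enumerated once, however often the term recurs in the text.

```python
from collections import defaultdict


def char_ngrams(term, n=3):
    """Return the set of character n-grams of a term (no boundary markers)."""
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def build_ngram_index(tokens, n=3):
    """Map each character n-gram to the set of dictionary terms containing it."""
    seen_terms = set()            # hash set: terms whose n-grams are already indexed
    postings = defaultdict(set)   # n-gram -> terms containing it (the "postings")
    for token in tokens:
        if token in seen_terms:
            continue              # repeated occurrence: nothing new to record
        seen_terms.add(token)
        for gram in char_ngrams(token, n):
            postings[gram].add(token)
    return postings, seen_terms


# Hypothetical token stream; "deciduous" occurs twice but is processed once.
tokens = "the deciduous tree and the deciduous shrub".split()
index, seen_terms = build_ngram_index(tokens)
print(index["uou"])
```

Using sets for the postings keeps the sketch short; a real index would more likely store sorted lists of term IDs so that postings can be intersected efficiently.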
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License