Current Slide
Small screen detected. You are viewing the mobile version of SlideWiki. If you wish to edit slides you will need to use a larger device.
Tokenization
Issues in tokenization:
Finland’s capital →
Finland? Finlands? Finland’s?
Hewlett-Packard → Hewlett and Packard as two tokens?
state-of-the-art: break up hyphenated sequence.
co-education
lowercase, lower-case, lower case ?
It can be effective to get the user to put in possible hyphens
San Francisco: one token or two?
How do you decide it is one token?
Speaker notes:
Content Tools
Tools
Sources (0)
Tags (0)
Comments (0)
History
Usage
Questions (0)
Playlists (0)
Quality
Sources
There are currently no sources for this slide.