Tokenization

  • Issues in tokenization:

    • Finland’s capital → Finland? Finlands? Finland’s?

    • Hewlett-Packard → Hewlett and Packard as two tokens?

      • state-of-the-art: break up the hyphenated sequence?

      • co-education

      • lowercase, lower-case, lower case?

      • It can be effective to have the user include possible hyphens in the query

    • San Francisco: one token or two?

      • How do you decide it is one token?
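
The choices above can be made concrete with a minimal Python sketch. It only enumerates the candidate tokenizations for each case; it does not decide which one is right. The function names (apostrophe_variants, hyphen_variants, merge_phrases) and the small phrase lexicon are hypothetical, introduced purely for illustration, and using a known-phrase lexicon is just one assumed way to decide that "San Francisco" is a single token.

```python
def apostrophe_variants(token):
    """Candidate normalizations for a possessive such as Finland's."""
    token = token.replace("\u2019", "'")  # treat the curly apostrophe like a straight one
    if token.endswith("'s"):
        stem = token[:-2]
        return [stem, stem + "s", token]  # Finland, Finlands, Finland's
    return [token]


def hyphen_variants(token):
    """Candidate tokenizations for a hyphenated sequence."""
    parts = token.split("-")
    if len(parts) == 1:
        return [[token]]
    # keep as one token, split at the hyphens, or join without hyphens
    return [[token], parts, ["".join(parts)]]


def merge_phrases(tokens, phrases):
    """Greedily merge runs of tokens that appear in a known-phrase lexicon.

    `phrases` is a set of token tuples, e.g. {("San", "Francisco")}.
    This lexicon is an assumption; the slide leaves the decision open.
    """
    max_len = max((len(p) for p in phrases), default=1)
    merged = []
    i = 0
    while i < len(tokens):
        # try the longest possible phrase first, down to length 2
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(apostrophe_variants("Finland\u2019s"))
print(hyphen_variants("lower-case"))
print(merge_phrases(["San", "Francisco", "airport"], {("San", "Francisco")}))
```

Whichever variant is chosen, the same choice has to be applied to both documents and queries so that the resulting tokens match.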

