Tokenization

  • Input: “Friends, Romans, Countrymen”

  • Output: Tokens

    • Friends

    • Romans

    • Countrymen

  • A token is an instance of a sequence of characters in a document, grouped together as a useful unit for processing

  • Each such token is now a candidate for an index entry, after further processing

    • Described below

  • But what are valid tokens to emit?
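
The split shown above can be sketched in a few lines of Python. This is only a minimal illustration, not the tokenizer of any particular search engine: the function name `tokenize` is hypothetical, and the rule it uses (a token is a maximal run of letters, digits, or underscores, with punctuation and whitespace discarded) is an assumption chosen for simplicity.

```python
import re


def tokenize(text):
    """Return the tokens in `text`.

    Assumed rule (illustrative only): a token is a maximal run of
    word characters (letters, digits, underscore). Punctuation and
    whitespace act as separators and are dropped.
    """
    return re.findall(r"\w+", text)


# The slide's example input.
print(tokenize("Friends, Romans, Countrymen"))
# → ['Friends', 'Romans', 'Countrymen']

# The same rule applied to text with apostrophes.
print(tokenize("O'Neill aren't"))
# → ['O', 'Neill', 'aren', 't']
```

The second call shows why the question of valid tokens is not trivial: the naive rule breaks "O'Neill" and "aren't" at the apostrophe, which may or may not be what an index should contain.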

