Tokenization: language issues

  • Chinese and Japanese have no spaces between words:

    • 莎拉波娃现在居住在美国东南部的佛罗里达。
      ("Sharapova now lives in Florida, in the southeastern United States.")
    • A unique tokenization is not always guaranteed
  • Further complicated in Japanese, with multiple scripts (kanji, hiragana, katakana, Latin) intermingled:

    • Dates/amounts appear in multiple formats
    • End-user can express query entirely in hiragana!
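
To illustrate why a unique tokenization is not guaranteed, here is a minimal sketch that enumerates every dictionary-based segmentation of a fragment of the Chinese sentence above. The lexicon is a toy one invented for this example, and `segmentations` is a hypothetical helper name; real segmenters use large dictionaries or statistical models.

```python
from functools import lru_cache

# Toy lexicon, invented for illustration only (an assumption, not from the slide).
LEXICON = {"现在", "居住", "在", "居", "住在", "美国"}


def segmentations(text, lexicon=LEXICON):
    """Return every way to split `text` into words found in `lexicon`."""
    max_len = max(map(len, lexicon))

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the text yields one (empty) segmentation.
        if start == len(text):
            return [[]]
        results = []
        # Try every candidate word that begins at `start`.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


for seg in segmentations("现在居住在美国"):
    print(" | ".join(seg))
```

With this toy lexicon the fragment has two segmentations, 现在 | 居 | 住在 | 美国 and 现在 | 居住 | 在 | 美国, so a dictionary lookup alone cannot decide between them.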

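The mixing of scripts in Japanese can be made visible with a rough sketch that groups a string into runs of the same script, using Unicode character names from Python's standard `unicodedata` module. The sample strings are invented for illustration, and the name-prefix test is a simplifying assumption rather than a full script classifier.

```python
import unicodedata
from itertools import groupby


def script_of(ch):
    """Classify a character by the prefix of its Unicode name (rough heuristic)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def script_runs(text):
    """Split `text` into consecutive runs of characters sharing a script."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


# Invented sample: "go to Tokyo Tower", written in three scripts.
print(script_runs("東京タワーにいく"))
# The same place name, as a user might type it entirely in hiragana.
print(script_runs("とうきょう"))
```

The first string breaks into a kanji run, a katakana run, and a hiragana run, while the second is a single hiragana run. A retrieval system therefore has to match a hiragana-only query against text written in other scripts.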