
Document correction

  • Especially needed for OCR’ed documents

    • Correction algorithms are tuned for OCR-specific errors, e.g., “rn” misread as “m”

    • Can use domain-specific knowledge

      • E.g., OCR confuses O and D (similar shapes) more often than it confuses O and I; O and I are adjacent on the QWERTY keyboard, so they are more likely to be interchanged in typing than in OCR.

  • But also: web pages and even printed material have typos

  • Goal: the dictionary contains fewer misspellings

  • But often we don’t change the documents and instead fix the query-document mapping
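
One way to picture the domain-specific knowledge above is a weighted edit distance in which likely confusions are cheaper than other substitutions. The following is a minimal sketch, not a method given on the slide: the names OCR_SUB_COST, TYPING_SUB_COST, weighted_edit_distance and best_correction are hypothetical, and the cost values are made-up illustrations rather than measured confusion rates.

```python
# Hypothetical confusion costs (illustrative values, not measured data).
# Keys are (observed character, intended character).
OCR_SUB_COST = {("O", "D"): 0.3, ("D", "O"): 0.3}      # similar shapes
TYPING_SUB_COST = {("O", "I"): 0.3, ("I", "O"): 0.3}   # adjacent QWERTY keys


def weighted_edit_distance(source, target, sub_cost, default_sub=1.0, indel=1.0):
    """Levenshtein distance with per-character-pair substitution costs."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel          # delete all of source[:i]
    for j in range(1, cols):
        dist[0][j] = j * indel          # insert all of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            # Matching characters are free; known confusions are cheap.
            sub = 0.0 if a == b else sub_cost.get((a, b), default_sub)
            dist[i][j] = min(
                dist[i - 1][j] + indel,      # deletion
                dist[i][j - 1] + indel,      # insertion
                dist[i - 1][j - 1] + sub,    # substitution or match
            )
    return dist[-1][-1]


def best_correction(token, vocabulary, sub_cost):
    """Return the vocabulary word closest to token (ties broken alphabetically)."""
    return min(vocabulary, key=lambda w: (weighted_edit_distance(token, w, sub_cost), w))


vocabulary = ["ADD", "AID"]
print(best_correction("AOD", vocabulary, OCR_SUB_COST))     # OCR model prefers ADD
print(best_correction("AOD", vocabulary, TYPING_SUB_COST))  # typing model prefers AID
```

The same garbled token is resolved differently depending on whether the error source is assumed to be OCR or typing. This sketch only handles single-character substitutions; a confusion such as “rn” for “m” would need multi-character operations, which are left out here.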
