Bigram (k-gram) indexes

  • Enumerate all k-grams (sequences of k characters) occurring in any term

  • e.g., from the text “April is the cruelest month” we get the 2-grams (bigrams)

    $a,ap,pr,ri,il,l$,$i,is,s$,$t,th,he,e$,$c,cr,ru,
    ue,el,le,es,st,t$,$m,mo,on,nt,th,h$

    • $ is a special word boundary symbol

  • Maintain a second inverted index from bigrams to the dictionary terms that contain each bigram.
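
A minimal sketch of how such a bigram index could be built, assuming Python. The helper names `kgrams` and `build_kgram_index` are hypothetical and chosen only for illustration, and lowercasing the terms is an assumption rather than something the slide prescribes. Each term is padded with the boundary symbol $, split into overlapping k-grams, and every k-gram is mapped to the sorted list of terms that contain it.

```python
from collections import defaultdict


def kgrams(term, k=2):
    """Return the k-grams of a term, using $ as the word boundary symbol."""
    padded = f"${term.lower()}$"  # lowercasing is an assumption of this sketch
    return [padded[i:i + k] for i in range(len(padded) - k + 1)]


def build_kgram_index(terms, k=2):
    """Map each k-gram to the sorted list of dictionary terms containing it."""
    index = defaultdict(set)
    for term in terms:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return {gram: sorted(postings) for gram, postings in index.items()}


# Dictionary terms taken from the example text on the slide.
terms = "April is the cruelest month".lower().split()
index = build_kgram_index(terms)

print(kgrams("month"))  # ['$m', 'mo', 'on', 'nt', 'th', 'h$']
print(index["th"])      # ['month', 'the']
```

Keeping each postings list sorted is a design choice of this sketch: it lets the lists for several bigrams be intersected with a linear merge, the same way postings are merged in the main inverted index.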

