Comparison With Vector Space

  • The model is related to traditional tf.idf models in several ways:

    • (unscaled) term frequency appears directly in the model

    • using probabilities implicitly normalizes term frequencies by document length

    • mixing in overall collection frequencies has an effect somewhat like idf: terms that are rare in the collection as a whole but common in some documents have a greater influence on the ranking
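
The three points above can be illustrated with a small sketch. It assumes a simple linear mixture of a document model and a collection model with mixing weight lam; the toy documents, the function names, and the value lam = 0.5 are illustrative assumptions, not taken from the slides.

```python
from collections import Counter

# Toy collection (hypothetical data, chosen only for illustration).
DOCS = [
    ["the", "zebra", "eats", "the", "grass"],
    ["the", "cat", "eats", "the", "fish"],
    ["a", "dog", "sees", "a", "cat"],
]

def build_collection_model(docs):
    """Relative frequency of each term over the whole collection."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

def term_prob(term, doc, coll_model, lam=0.5):
    """Mixture probability of one query term for one document."""
    # Raw term frequency enters directly; dividing by the document
    # length is the length normalization.
    p_doc = Counter(doc)[term] / len(doc)
    p_coll = coll_model.get(term, 0.0)
    return lam * p_doc + (1 - lam) * p_coll

def query_likelihood(query, doc, coll_model, lam=0.5):
    """Product of the mixture probabilities of the query terms."""
    score = 1.0
    for term in query:
        score *= term_prob(term, doc, coll_model, lam)
    return score

def match_boost(term, doc_with, doc_without, coll_model, lam=0.5):
    """How much a document gains on this term by containing it,
    relative to a document that does not contain it."""
    return (term_prob(term, doc_with, coll_model, lam)
            / term_prob(term, doc_without, coll_model, lam))

coll_model = build_collection_model(DOCS)
rare_boost = match_boost("zebra", DOCS[0], DOCS[1], coll_model)
common_boost = match_boost("the", DOCS[0], DOCS[2], coll_model)
scores = [query_likelihood(["the", "zebra"], doc, coll_model) for doc in DOCS]
```

In this sketch "the" occurs twice in the first document and "zebra" only once, yet rare_boost comes out larger than common_boost, because the collection term in the denominator is much smaller for the rare word. That is the idf-like behaviour described in the last bullet.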

