Insufficient data

  • Zero probabilities spell disaster

    • We need to smooth probabilities

      • Discount nonzero probabilities

      • Give some probability mass to unseen things

  • There’s a wide space of approaches to smoothing probability distributions to deal with this problem, such as adding 1, ½ or ε to counts, Dirichlet priors, discounting, and interpolation

    • [See FSNLP ch. 6 or CS224N if you want more]

  • A simple idea that works well in practice is to use a mixture of the document multinomial and the collection multinomial distributions
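
The mixture idea above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `mixture_prob`, the mixing weight `lam`, its default of 0.5, and the toy document and collection are all assumptions made for the example. Both component distributions are plain maximum-likelihood estimates (relative frequencies).

```python
from collections import Counter


def mixture_prob(term, doc_tokens, collection_tokens, lam=0.5):
    """Smoothed P(term | doc) as a mixture of two multinomials.

    P(t | d) = lam * P_mle(t | doc) + (1 - lam) * P_mle(t | collection)

    `lam` is a hypothetical mixing weight in [0, 1]; 0.5 is an arbitrary default.
    """
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    # Relative frequency in the document (zero for unseen terms).
    p_doc = doc_counts[term] / len(doc_tokens) if doc_tokens else 0.0
    # Relative frequency in the whole collection.
    p_coll = coll_counts[term] / len(collection_tokens) if collection_tokens else 0.0
    return lam * p_doc + (1 - lam) * p_coll


# Toy data, invented for illustration only.
doc = "click go the shears boys click click click".split()
collection = doc + "metal shears click here".split()

# "metal" never occurs in the document, so its unsmoothed probability is zero,
# but the collection component gives it some probability mass.
p_unsmoothed = mixture_prob("metal", doc, collection, lam=1.0)
p_smoothed = mixture_prob("metal", doc, collection, lam=0.5)
```

With `lam=1.0` the estimate falls back to the unsmoothed document model and the unseen term gets probability zero. Any `lam` below 1 discounts the document's nonzero probabilities and moves that mass to terms seen elsewhere in the collection, and the result still sums to 1 over the collection vocabulary.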

