Zipf consequences

  • If the most frequent term (the) occurs cf_1 times

    • then the second most frequent term (of) occurs cf_1/2 times

    • the third most frequent term (and) occurs cf_1/3 times …

  • Equivalently: cf_i = K/i, where K is a normalizing factor (K = cf_1), so

    • log cf_i = log K - log i

    • Linear relationship between log cf_i and log i, with slope -1

  • Another power law relationship: collection frequency is proportional to rank^(-1)
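
The log-log linearity above can be checked numerically. The following is a minimal sketch, not a reference implementation: it builds ideal Zipf counts and fits a least-squares line through the points (log i, log cf_i). The function names `zipf_counts` and `loglog_fit`, the top count of 1,000,000, and the vocabulary size of 1,000 are illustrative assumptions, not values from the slide.

```python
import math

def zipf_counts(cf1, n_terms):
    """Ideal Zipf collection frequencies: cf_i = cf1 / i for ranks i = 1..n_terms."""
    return [cf1 / i for i in range(1, n_terms + 1)]

def loglog_fit(counts):
    """Least-squares line through (log i, log cf_i); returns (slope, intercept)."""
    xs = [math.log(i) for i in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Assumed illustrative values: top term occurs 1,000,000 times, 1,000 ranked terms.
counts = zipf_counts(cf1=1_000_000, n_terms=1000)
slope, intercept = loglog_fit(counts)
print(f"slope = {slope:.3f}, K = {math.exp(intercept):.0f}")
```

For ideal Zipf data the fitted slope is -1 and exp(intercept) recovers K = cf_1. On real term counts the fit would only be approximate, typically deviating most at the very highest and lowest ranks.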

