Interestingness Measure: Correlations (Lift)

  • play basketball ⇒ eat cereal [40%, 66.7%] is misleading
    • Overall, 75% of students eat cereal, which is higher than the rule's 66.7% confidence.
  • play basketball ⇒ not eat cereal [20%, 33.3%] is more accurate, although it has lower support and confidence
  • Measure of dependent/correlated events: lift
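
The comparison above can be made concrete with a small calculation. The sketch below assumes the standard definition lift(A, B) = P(A and B) / (P(A) · P(B)) and uses only the figures implied by the two rules: P(basketball) = 40% / 66.7% ≈ 60%, P(cereal) = 75%, and P(not cereal) = 25%. The function and variable names are illustrative, not taken from the slides.

```python
def lift(p_ab: float, p_a: float, p_b: float) -> float:
    """lift(A, B) = P(A and B) / (P(A) * P(B)).

    A value of 1 means A and B are independent, > 1 means positively
    correlated, and < 1 means negatively correlated.
    """
    return p_ab / (p_a * p_b)


# Probabilities implied by the rules above (support / confidence = P(antecedent)).
p_basketball = 0.60          # 0.40 / 0.667
p_cereal = 0.75              # overall share of students eating cereal
p_not_cereal = 1 - p_cereal  # 0.25

lift_cereal = lift(0.40, p_basketball, p_cereal)          # basketball and cereal
lift_not_cereal = lift(0.20, p_basketball, p_not_cereal)  # basketball and no cereal

print(f"lift(basketball, cereal)     = {lift_cereal:.2f}")
print(f"lift(basketball, not cereal) = {lift_not_cereal:.2f}")
```

With these inputs the first lift is about 0.89 (below 1, so playing basketball and eating cereal are negatively correlated) and the second is about 1.33 (above 1), which is why the second rule is the more accurate one despite its lower support and confidence.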

Are Lift and χ² Good Measures of Correlation?

  • “Buy walnuts ⇒ buy milk [1%, 80%]” is misleading if 85% of customers buy milk
  • Support and confidence are not good indicators of correlation
  • Over 20 interestingness measures have been proposed (see Tan, Kumar, Srivastava @KDD’02)
  • Which are good ones?
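
As a quick check of the walnuts rule, lift can be computed directly from the two figures quoted above. This assumes the standard identity that lift equals the rule's confidence divided by the overall probability of the consequent:

\[ lift(walnuts, milk) = \frac{conf(walnuts \Rightarrow milk)}{P(milk)} = \frac{0.80}{0.85} \approx 0.94 < 1 \]

A value below 1 suggests that buying walnuts is slightly negatively correlated with buying milk, even though the rule's confidence is 80%.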

Null-Invariant Measures

Comparison of Interestingness Measures

Analysis of DBLP Coauthor Relationships

Which Null-Invariant Measure Is Better?

  • IR (Imbalance Ratio): measures the imbalance between two itemsets A and B in rule implications

\[ IR(A,B) = \frac{|sup(A) - sup(B)|}{sup(A) + sup(B) - sup(A \cup B)} \]

  • Kulczynski and Imbalance Ratio (IR) together present a clear picture for all three datasets, D4 through D6
    • D4 is balanced & neutral
    • D5 is imbalanced & neutral
    • D6 is very imbalanced & neutral
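
The pattern in the bullets above can be sketched in a few lines of Python. The IR function follows the formula given earlier; the Kulczynski function assumes the usual definition Kulc(A, B) = ½ (P(A|B) + P(B|A)), which is not spelled out in this section. The support counts are hypothetical values chosen only to reproduce the balanced, imbalanced, and very imbalanced cases; they are not taken from the source.

```python
def kulczynski(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Kulc(A, B) = 0.5 * (P(A|B) + P(B|A)); 0.5 is the neutral value."""
    return 0.5 * (sup_ab / sup_b + sup_ab / sup_a)


def imbalance_ratio(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """IR(A, B) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A u B))."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Hypothetical support counts (sup_a, sup_b, sup_ab) for illustration only.
datasets = {
    "D4": (2_000, 2_000, 1_000),    # equal supports
    "D5": (11_000, 1_100, 1_000),   # A is 10x more frequent than B
    "D6": (101_000, 1_010, 1_000),  # A is 100x more frequent than B
}

results = {
    name: (kulczynski(*counts), imbalance_ratio(*counts))
    for name, counts in datasets.items()
}

for name, (kulc, ir) in results.items():
    print(f"{name}: Kulc = {kulc:.2f}, IR = {ir:.2f}")
```

Under these assumed counts Kulczynski stays at 0.5 (neutral) for all three datasets, while IR rises from 0 to roughly 0.89 and 0.99. Kulczynski alone cannot tell the three cases apart, which is why the two measures are read together.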