Similarity Measure in ROCK

  • Traditional similarity measures for categorical data, e.g., the Jaccard coefficient, may not work well
  • Example: Two groups (clusters) of transactions
    • C1: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}
    • C2: {a, b, f}, {a, b, g}, {a, f, g}, {b, f, g}
  • The Jaccard coefficient may lead to a wrong clustering result
    • Within C1: ranges from 0.2 ({a, b, c}, {b, d, e}) to 0.5 ({a, b, c}, {a, b, d})
    • Across C1 & C2: could be as high as 0.5 ({a, b, c}, {a, b, f}), i.e., no lower than the most similar pair within C1
  • Jaccard coefficient-based similarity function:

\[Sim(T_{1},T_{2})=\frac{|T_{1} \cap T_{2}|}{|T_{1} \cup T_{2}|}\]

    • Ex. Let T1 = {a, b, c}, T2 = {c, d, e}

\[Sim(T_{1},T_{2})=\frac{|\{c\}|}{|\{a,b,c,d,e\}|}=\frac{1}{5}=0.2\]
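
The figures quoted above can be checked with a short Python sketch. It is only an illustration, not part of ROCK itself: the helper name `jaccard` and the variables `C1`, `C2`, `within_c1`, and `across` are chosen here for the example, and the two clusters are built on the assumption (which matches the lists above) that C1 is every 3-item subset of {a, b, c, d, e} and C2 is every 3-item subset of {a, b, f, g}.

```python
from itertools import combinations


def jaccard(t1, t2):
    """Jaccard coefficient |T1 ∩ T2| / |T1 ∪ T2| of two transactions."""
    t1, t2 = set(t1), set(t2)
    return len(t1 & t2) / len(t1 | t2)


# Illustrative reconstruction of the two clusters from the slide:
# C1 = all 3-item subsets of {a, b, c, d, e} (10 transactions),
# C2 = all 3-item subsets of {a, b, f, g} (4 transactions).
C1 = [set(c) for c in combinations("abcde", 3)]
C2 = [set(c) for c in combinations("abfg", 3)]

# Similarity of every pair inside C1, and of every cross-cluster pair.
within_c1 = [jaccard(x, y) for x, y in combinations(C1, 2)]
across = [jaccard(x, y) for x in C1 for y in C2]

print(min(within_c1), max(within_c1))  # range of similarities inside C1
print(max(across))                     # best similarity between C1 and C2
```

Under these assumptions the similarity inside C1 spans 0.2 to 0.5, while the best cross-cluster pair also reaches 0.5, so the Jaccard coefficient alone cannot tell the two clusters apart.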
