Similarity Measure in ROCK
- Traditional similarity measures for categorical data, e.g., the Jaccard coefficient, may not work well
- Example: Two groups (clusters) of transactions
- C1: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}
- C2: {a, b, f}, {a, b, g}, {a, f, g}, {b, f, g}
- The Jaccard coefficient may lead to a wrong clustering result
- Within C1: ranges from 0.2 ({a, b, c}, {b, d, e}) to 0.5 ({a, b, c}, {a, b, d})
- Across C1 and C2: can be as high as 0.5 ({a, b, c}, {a, b, f}), the same as the highest similarity within C1
- Jaccard coefficient-based similarity function:
\[Sim(T_{1},T_{2})=\frac{|T_{1} \bigcap T_{2}|}{|T_{1} \bigcup T_{2}|}\]
- Ex. Let T1 = {a, b, c}, T2 = {c, d, e}
\[Sim(T_{1},T_{2})=\frac{|\left \{ c \right \}|}{|\left \{ a,b,c,d,e \right \}|}=\frac{1}{5}=0.2\]
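A minimal Python sketch that implements the Jaccard similarity defined above and reproduces the numbers on this slide. The helper names `jaccard` and `sim_range` are hypothetical and used only for illustration. Returning 0.0 for two empty transactions is an assumed convention, because the formula is undefined in that case. The sketch shows that pairs within C1 score between 0.2 and 0.5, and a pair spanning C1 and C2 can also reach 0.5, so Jaccard alone cannot separate the two clusters.

```python
from itertools import combinations, product


def jaccard(t1: set, t2: set) -> float:
    """Jaccard coefficient |T1 ∩ T2| / |T1 ∪ T2| for two transactions."""
    union = t1 | t2
    if not union:
        return 0.0  # assumed convention: two empty transactions score 0
    return len(t1 & t2) / len(union)


# The two clusters from the slide, each transaction written as a set of items.
C1 = [set(s) for s in ["abc", "abd", "abe", "acd", "ace",
                       "ade", "bcd", "bce", "bde", "cde"]]
C2 = [set(s) for s in ["abf", "abg", "afg", "bfg"]]


def sim_range(pairs):
    """Return (min, max) Jaccard similarity over an iterable of transaction pairs."""
    sims = [jaccard(a, b) for a, b in pairs]
    return min(sims), max(sims)


if __name__ == "__main__":
    print(jaccard({"a", "b", "c"}, {"c", "d", "e"}))  # 0.2
    print("within C1:", sim_range(combinations(C1, 2)))  # (0.2, 0.5)
    print("within C2:", sim_range(combinations(C2, 2)))  # (0.5, 0.5)
    print("C1 vs C2:", sim_range(product(C1, C2)))  # (0.0, 0.5)
```

Any two 3-item transactions drawn from {a, b, c, d, e} share one or two items, so within C1 the score is always 1/5 or 2/4. The cross-cluster maximum of 0.5 equals the within-C1 maximum, which is why a threshold on Jaccard similarity alone can merge transactions from different clusters.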