A Probabilistic Hierarchical Clustering Algorithm

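  • For a set of objects partitioned into m clusters C1, …, Cm, the quality of the clustering can be measured by

\[Q(\left \{ C_{1},\ldots,C_{m} \right \})=\prod_{i=1}^{m} P(C_{i})\]

    • where P(Ci) is the maximum likelihood of cluster Ci, i.e. the likelihood of its objects under the best-fitting parameters of the assumed generative model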
  • If we merge two clusters Cj1 and Cj2 into a single cluster Cj1 ∪ Cj2, the change in quality of the overall clustering is

\[Q\left(\left(\left \{ C_{1},\ldots,C_{m} \right \}-\left \{ C_{j1},C_{j2} \right \}\right)\cup\left \{ C_{j1} \cup C_{j2} \right \}\right)-Q(\left \{ C_{1},\ldots,C_{m} \right \})\]

\[=\frac{\prod_{i=1}^{m}P(C_{i})\cdot P(C_{j1}\cup C_{j2})}{P(C_{j1})\cdot P(C_{j2})}-\prod_{i=1}^{m}P(C_{i})\]

\[=\prod_{i=1}^{m}P(C_{i})\left(\frac{P(C_{j1}\cup C_{j2})}{P(C_{j1})P(C_{j2})}-1\right)\]

  • Distance between clusters C1 and C2:

\[\operatorname{dist}(C_{1},C_{2})=-\log\frac{P(C_{1}\cup C_{2})}{P(C_{1})P(C_{2})}\]
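
The distance above can be sketched in code. The slide does not fix a generative model, so this minimal example assumes each cluster is a list of 1-D points drawn from a Gaussian whose mean and variance are set to their maximum likelihood estimates. The names `log_likelihood`, `dist` and the variance floor `min_var` are illustrative assumptions, not part of the slide. Working with log P(C) instead of P(C) avoids numerical underflow.

```python
import math


def log_likelihood(cluster, min_var=1e-6):
    """Return log P(C) for a 1-D cluster under a Gaussian fitted by maximum likelihood.

    Assumption: a 1-D Gaussian generative model. min_var is a hypothetical floor
    that keeps the variance positive for single-point or constant clusters.
    """
    n = len(cluster)
    mean = sum(cluster) / n
    var = max(sum((x - mean) ** 2 for x in cluster) / n, min_var)
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in cluster
    )


def dist(c1, c2):
    """dist(C1, C2) = -log( P(C1 u C2) / (P(C1) * P(C2)) ), evaluated in log space."""
    merged = c1 + c2  # list concatenation plays the role of the union of disjoint clusters
    return -(log_likelihood(merged) - log_likelihood(c1) - log_likelihood(c2))


if __name__ == "__main__":
    a = [1.0, 1.2, 0.8]
    b = [1.1, 0.9, 1.3]
    c = [9.0, 9.2, 8.8]
    print("dist(a, b) =", dist(a, b))  # two overlapping clusters
    print("dist(a, c) =", dist(a, c))  # two well-separated clusters
```

Under these assumptions, the overlapping pair gets a smaller distance than the well-separated pair. Because the change in quality is the product of all P(Ci) times the ratio minus 1, merging increases Q exactly when the ratio exceeds 1, that is, when the distance is negative.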

