Bi-Clustering (II): δ-pCluster

  • Enumerating all bi-clusters (δ-pClusters) [H. Wang, et al., Clustering by pattern similarity in large data sets. SIGMOD’02]
  • A submatrix I x J is a bi-cluster with (perfect) coherent values iff \(e_{i1j1} - e_{i2j1} = e_{i1j2} - e_{i2j2}\) for all i1, i2 ∈ I and j1, j2 ∈ J. To measure how far a 2 x 2 submatrix of I x J deviates from this condition, define its p-score

\[\text{p-score} \begin{pmatrix} e_{i1j1} & e_{i1j2} \\ e_{i2j1} & e_{i2j2} \end{pmatrix} = \left|(e_{i1j1}-e_{i2j1})-(e_{i1j2}-e_{i2j2})\right|\]
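As a quick illustration, the p-score of a single 2 x 2 submatrix can be computed directly from the formula. This is a minimal Python sketch; the helper name `p_score` and the nested-list input layout are assumptions made for illustration.

```python
def p_score(m):
    """p-score of a 2 x 2 submatrix m = [[e_i1j1, e_i1j2], [e_i2j1, e_i2j2]].

    Hypothetical helper: the nested-list layout is an assumption of this sketch.
    """
    (a, b), (c, d) = m
    # |(e_i1j1 - e_i2j1) - (e_i1j2 - e_i2j2)|
    return abs((a - c) - (b - d))


# Row 2 is row 1 shifted by +3 (perfect coherent values), so the p-score is 0.
print(p_score([[1, 4], [4, 7]]))  # 0
# Perturbing one entry by 1 raises the p-score to 1.
print(p_score([[1, 4], [4, 8]]))  # 1
```

Because (a − c) − (b − d) = (a − b) − (c − d), the same value results whether one compares row differences or column differences, so the p-score treats rows and columns symmetrically.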

  • A submatrix I x J is a δ-pCluster (pattern-based cluster) if the p-score of every 2 x 2 submatrix of I x J is at most δ, where δ ≥ 0 is a threshold specifying the user's tolerance for noise relative to a perfect bi-cluster
  • The p-score controls the noise on every element in a bi-cluster, while the mean squared residue captures the average noise
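  • Monotonicity: If I x J is a δ-pCluster, every submatrix of I x J with x rows and y columns (x, y ≥ 2) is also a δ-pCluster, since each of its 2 x 2 submatrices is also a 2 x 2 submatrix of I x J.
  • A δ-pCluster is maximal if no row or column can be added to it without violating the δ-pCluster condition. Every δ-pCluster is contained in a maximal one, and by monotonicity every submatrix of a maximal δ-pCluster is again a δ-pCluster, so we only need to compute all maximal δ-pClusters.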
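The definitions above can be checked by brute force on a small matrix. The sketch below is a minimal Python illustration, not the enumeration algorithm of Wang et al.; the names `is_delta_pcluster` and `is_maximal`, the list-of-lists matrix layout, and the toy data are assumptions. It relies on one observation that follows directly from the p-score definition: for a fixed pair of rows, the largest p-score over all column pairs equals the range (max minus min) of the row differences, so each row pair needs only one pass over J.

```python
from itertools import combinations


def is_delta_pcluster(e, rows, cols, delta):
    """Return True if the submatrix rows x cols of e is a delta-pCluster.

    For a fixed row pair (i1, i2), let d[j] = e[i1][j] - e[i2][j]. The p-score of
    the 2 x 2 submatrix on columns (j1, j2) is |d[j1] - d[j2]|, so the largest
    p-score over all column pairs is max(d) - min(d). Submatrices with fewer
    than 2 rows or 2 columns have no 2 x 2 submatrix and pass vacuously.
    """
    for i1, i2 in combinations(rows, 2):
        d = [e[i1][j] - e[i2][j] for j in cols]
        if d and max(d) - min(d) > delta:
            return False
    return True


def is_maximal(e, rows, cols, delta):
    """Return True if rows x cols is a delta-pCluster that no row or column can extend.

    By monotonicity, if adding several rows/columns kept the delta-pCluster
    property, adding any one of them alone would too, so testing single
    additions is enough.
    """
    if not is_delta_pcluster(e, rows, cols, delta):
        return False
    for r in range(len(e)):
        if r not in rows and is_delta_pcluster(e, list(rows) + [r], cols, delta):
            return False
    for c in range(len(e[0])):
        if c not in cols and is_delta_pcluster(e, rows, list(cols) + [c], delta):
            return False
    return True


# Toy data (made up for illustration): rows 0-2 follow a shifting pattern on
# columns 0-2, with one entry (e[2][2]) off by 1.
e = [
    [1, 4, 2, 9],
    [3, 6, 4, 0],
    [6, 9, 8, 5],
    [2, 8, 3, 1],
]

print(is_delta_pcluster(e, [0, 1, 2], [0, 1, 2], delta=1))  # True
print(is_delta_pcluster(e, [0, 1, 2], [0, 1, 2], delta=0))  # False
print(is_maximal(e, [0, 1, 2], [0, 1, 2], delta=1))  # True
print(is_maximal(e, [0, 1], [0, 1], delta=1))  # False: row 2 can still be added

# Monotonicity check for x = y = 2: every 2 x 2 sub-selection of the cluster
# is also a delta-pCluster.
print(all(
    is_delta_pcluster(e, r, c, delta=1)
    for r in combinations([0, 1, 2], 2)
    for c in combinations([0, 1, 2], 2)
))  # True
```

Enumerating all maximal δ-pClusters by trying every row and column subset would be exponential in the matrix size, so a checker like this is only practical for verifying candidates on small examples.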
