CF-Tree in BIRCH

  • Clustering feature:
    • Summary statistics for a given subcluster: its 0th, 1st, and 2nd moments from the statistical point of view, i.e., CF = (N, LS, SS): the number of points, their linear sum, and their squared sum
    • Records the measurements needed to compute clusters while using storage efficiently
  • A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering
    • A nonleaf node in a tree has descendants or “children”
    • The nonleaf nodes store sums of the CFs of their children
  • A CF tree has two parameters
    • Branching factor: max # of children per nonleaf node
    • Threshold: max diameter of subclusters stored at the leaf nodes
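
The points above can be illustrated with a minimal Python sketch. It is not a full CF-tree implementation: it only shows a clustering feature as the triple (N, LS, SS), the additivity that lets a nonleaf node store the sum of its children's CFs, and a diameter check against the threshold. The names `CF`, `from_point`, `merge`, `diameter`, and `can_absorb` are hypothetical, and SS is assumed to be kept as a scalar (the sum of squared norms).

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class CF:
    """Clustering feature (hypothetical helper): N, linear sum LS, scalar squared sum SS."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "CF":
        # A single point is a subcluster with N = 1.
        return cls(1, [float(v) for v in x], float(sum(v * v for v in x)))

    def merge(self, other: "CF") -> "CF":
        # Additivity: the CF of a union is the component-wise sum of the CFs.
        # This is why a nonleaf node can store the sum of its children's CFs.
        return CF(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def diameter(self) -> float:
        # Root of the average squared pairwise distance, computed from the
        # moments alone: D^2 = (2*N*SS - 2*||LS||^2) / (N*(N-1)).
        if self.n < 2:
            return 0.0
        ls_sq = sum(v * v for v in self.ls)
        d2 = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return sqrt(max(d2, 0.0))  # clamp tiny negatives from rounding


def can_absorb(leaf_cf: CF, x: Sequence[float], threshold: float) -> bool:
    # A leaf subcluster may absorb a point only if the merged diameter
    # stays within the threshold parameter.
    return leaf_cf.merge(CF.from_point(x)).diameter() <= threshold


if __name__ == "__main__":
    cf = CF.from_point([0, 0]).merge(CF.from_point([2, 0]))
    print(cf.n, cf.centroid(), cf.diameter())
    print(can_absorb(cf, [4, 0], threshold=3.0))
```

Because the diameter is computed from N, LS, and SS alone, the individual points never need to be stored, which is the storage saving the CF summary provides.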

