BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies)

  • Zhang, Ramakrishnan & Livny, SIGMOD’96
  • Incrementally construct a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering
    • Phase 1: scan DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve the inherent clustering structure of the data)
    • Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF tree
  • Scales linearly with the number of records: finds a good clustering with a single scan and improves the quality with a few additional scans
  • Weaknesses: handles only numeric data, and is sensitive to the order of the data records
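
The slide does not spell out what a Clustering Feature holds. In the BIRCH paper it is the triple (N, LS, SS): the number of points, their per-dimension linear sum, and their sum of squared norms. The sketch below is a minimal illustration of why such a summary can be built incrementally in one scan: two CFs merge by simple addition, and the centroid and radius follow from the triple alone. The class name `CF` and the helper `summarize` are hypothetical names chosen for this example, and the code covers only the summary itself, not the CF tree (no branching factor, threshold, or node splitting).

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering Feature: (N, LS, SS) summary of a set of numeric points."""
    n: int        # number of points summarized
    ls: tuple     # linear sum of the points, one entry per dimension
    ss: float     # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        # A single point is summarized as N=1, LS=p, SS=||p||^2.
        return cls(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        # CF additivity: the summary of a union is the component-wise sum.
        return CF(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of the points to the centroid:
        # sqrt(SS/N - ||LS/N||^2); clamped at 0 to absorb rounding error.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def summarize(points):
    """Fold a non-empty sequence of points into one CF, one point at a time."""
    it = iter(points)
    cf = CF.from_point(next(it))
    for p in it:
        cf = cf.merge(CF.from_point(p))
    return cf


if __name__ == "__main__":
    cf = summarize([(0.0, 0.0), (2.0, 0.0)])
    print(cf.n, cf.ls, cf.ss, cf.centroid(), cf.radius())
```

Because `merge` only adds numbers, the raw points never need to be kept in memory, which is what allows Phase 1 to compress the data in a single scan. The same property hints at the numeric-only limitation: sums and squared norms are not defined for categorical attributes.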
