Discretization by Classification & Correlation Analysis

  • Classification (e.g., decision tree analysis)
    • Supervised: Given class labels, e.g., cancerous vs. benign
    • Using entropy to determine split point (discretization point)
    • Top-down, recursive split
    • Details to be covered in Chapter 7
  • Correlation analysis (e.g., Chi-merge: χ²-based discretization)
    • Supervised: use class information
    • Bottom-up merge: find the best neighboring intervals (those having similar distributions of classes, i.e., low χ² values) to merge
    • Merge performed recursively, until a predefined stopping condition is met
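
The entropy-based, top-down split described above can be sketched as follows. This is a minimal illustration, not the decision-tree algorithm from the later chapter: the function names (entropy, best_entropy_split, entropy_discretize) are hypothetical, and the depth limit used as a stopping rule is an assumption chosen for brevity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_split(values, labels):
    """Return (split_point, weighted_entropy) for the best binary split.

    Candidate split points are midpoints between consecutive distinct
    values. split_point is None when no split lowers the entropy.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_point, best_score = None, entropy(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_point, best_score = (lo + hi) / 2, score
    return best_point, best_score


def entropy_discretize(values, labels, depth=2):
    """Top-down recursive splitting; returns a sorted list of cut points.

    Assumption: recursion stops at a fixed depth or when no split helps.
    """
    if depth == 0:
        return []
    point, _ = best_entropy_split(values, labels)
    if point is None:
        return []
    cuts = [point]
    for keep_left in (True, False):
        part = [(v, l) for v, l in zip(values, labels) if (v <= point) == keep_left]
        vs, ls = zip(*part)
        cuts += entropy_discretize(list(vs), list(ls), depth - 1)
    return sorted(cuts)
```

For example, with values [1, 2, 3, 10, 11, 12] and labels of three "benign" followed by three "cancerous", the best split point is 6.5 and both resulting intervals are pure, so the recursion stops there.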

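The bottom-up χ² merge can be sketched in a similar way. This is a simplified illustration under stated assumptions: the names chi2 and chimerge are hypothetical, the stopping condition is a target number of intervals (max_intervals) rather than a χ² significance threshold, and cells with zero expected count are skipped.

```python
from collections import Counter


def chi2(a, b, classes):
    """Pearson chi-square statistic for the class counts of two intervals."""
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            col_total = a[c] + b[c]
            expected = row_total * col_total / total
            if expected > 0:  # simplification: skip empty cells
                stat += (row[c] - expected) ** 2 / expected
    return stat


def chimerge(values, labels, max_intervals):
    """Merge adjacent intervals with the lowest chi-square value.

    Returns the lower bounds of the remaining intervals.
    Assumption: merging stops once max_intervals intervals remain.
    """
    classes = sorted(set(labels))
    # Start with one interval per distinct value.
    counts = {}
    for v, lab in zip(values, labels):
        counts.setdefault(v, Counter())[lab] += 1
    intervals = [(v, counts[v]) for v in sorted(counts)]

    while len(intervals) > max_intervals:
        scores = [
            chi2(intervals[i][1], intervals[i + 1][1], classes)
            for i in range(len(intervals) - 1)
        ]
        i = scores.index(min(scores))  # most similar neighboring pair
        merged = (intervals[i][0], intervals[i][1] + intervals[i + 1][1])
        intervals[i:i + 2] = [merged]
    return [lower for lower, _ in intervals]
```

With the same six values and labels as a test case and max_intervals=2, neighbors sharing a class have χ² = 0 and are merged first, leaving intervals that start at 1 and 10.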
