Computing Information-Gain for Continuous-Valued Attributes

  • Let attribute A be a continuous-valued attribute
  • Must determine the best split point for A
    • Sort the values of A in increasing order
    • Typically, the midpoint between each pair of adjacent values is considered as a possible split point
      • (a_i + a_{i+1})/2 is the midpoint between the adjacent values a_i and a_{i+1}
    • The candidate with the minimum expected information requirement for A (equivalently, the maximum information gain) is selected as the split-point for A
  • Split:
    • D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set of tuples in D satisfying A > split-point
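
The procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `entropy` and `best_split_point` are hypothetical, and it assumes the expected information requirement is the size-weighted entropy of the class labels in D1 and D2.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split_point(values, labels):
    """Return (split_point, expected_info) for a continuous attribute A.

    values: the value of A for each tuple in D
    labels: the class label of each tuple in D
    Returns (None, inf) if all values are equal, so no split exists.
    """
    # Sort the tuples by their value of A.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_split, best_info = None, float("inf")
    for i in range(n - 1):
        a, b = pairs[i][0], pairs[i + 1][0]
        if a == b:
            continue  # equal adjacent values give no usable midpoint
        split = (a + b) / 2  # candidate split point
        # Because the data is sorted, D1 (A <= split) is the first i+1 tuples
        # and D2 (A > split) is the rest.
        d1 = [lab for _, lab in pairs[: i + 1]]
        d2 = [lab for _, lab in pairs[i + 1 :]]
        # Expected information requirement: size-weighted entropy of D1 and D2.
        info = len(d1) / n * entropy(d1) + len(d2) / n * entropy(d2)
        if info < best_info:
            best_split, best_info = split, info
    return best_split, best_info
```

For example, `best_split_point([1.0, 2.0, 3.0, 4.0], ["no", "no", "yes", "yes"])` selects the split point 2.5, which separates the two classes perfectly and therefore has an expected information requirement of 0. This sketch recomputes the entropy for every candidate, which is quadratic in the number of tuples. Keeping running class counts while scanning the sorted values would avoid that.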

