Computing a Threshold

  • Goal: Determine the threshold c that yields the greatest information gain.
  • Approach:

    1. Sort examples according to the continuous attribute A.

    2. Identify adjacent examples that differ in their target classification.

    3. Generate candidate thresholds midway between the corresponding values of A (the gain-maximizing c always lies at such a boundary; see [5]).

    4. Evaluate each candidate threshold by computing its information gain, and choose the candidate with the highest gain.
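The four steps can be sketched in Python as below. This is a minimal, hypothetical sketch, not a reference implementation. The function names are illustrative, entropy is measured in bits, and a split is taken as A <= c versus A > c. To cope with repeated values of A, it treats a value whose examples carry mixed classes as a boundary, which is a small generalization of step 2. The sample data in the `__main__` block assumes the Temperature/PlayTennis examples from Mitchell's Machine Learning textbook, since the slide itself lists only the boundary values.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def candidate_thresholds(values, labels):
    """Steps 1-3: midpoints between adjacent distinct values of A
    where the target class changes."""
    # Collect the set of classes observed at each distinct value of A.
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)  # step 1: sort by A
    candidates = []
    for lo, hi in zip(distinct, distinct[1:]):
        # Step 2: a boundary exists unless both neighbours share one single class.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            candidates.append((lo + hi) / 2)  # step 3: midpoint
    return candidates


def information_gain(values, labels, c):
    """Step 4: gain from splitting the examples into A <= c and A > c."""
    left = [y for v, y in zip(values, labels) if v <= c]
    right = [y for v, y in zip(values, labels) if v > c]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Return the candidate threshold with the highest gain, or None if there is none."""
    candidates = candidate_thresholds(values, labels)
    if not candidates:
        return None
    return max(candidates, key=lambda c: information_gain(values, labels, c))


if __name__ == "__main__":
    # Assumed data: Mitchell's Temperature/PlayTennis example.
    temperature = [40, 48, 60, 72, 80, 90]
    play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No"]
    print(candidate_thresholds(temperature, play_tennis))  # [54.0, 85.0]
    print(best_threshold(temperature, play_tennis))        # 54.0
```

Only boundary midpoints are scored, so the number of gain evaluations grows with the number of class changes rather than with the number of examples.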
  • Example:
    • Two candidate thresholds, at the points where the value of PlayTennis changes: (48+60)/2 = 54 and (80+90)/2 = 85
    • Information gain for the first threshold is higher.
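Assuming the example uses the six Temperature/PlayTennis examples from Mitchell's textbook (A = 40, 48, 60, 72, 80, 90 with PlayTennis = No, No, Yes, Yes, Yes, No), the two gains can be worked out as follows. The full set S has 3 Yes and 3 No, so its entropy is 1 bit.

```latex
\begin{aligned}
\mathrm{Gain}(S, A > 54) &= 1 - \tfrac{2}{6}\,\mathrm{Entropy}([0+,2-]) - \tfrac{4}{6}\,\mathrm{Entropy}([3+,1-]) \\
                         &= 1 - \tfrac{2}{6}\cdot 0 - \tfrac{4}{6}\cdot 0.811 \approx 0.459 \\
\mathrm{Gain}(S, A > 85) &= 1 - \tfrac{5}{6}\,\mathrm{Entropy}([3+,2-]) - \tfrac{1}{6}\,\mathrm{Entropy}([0+,1-]) \\
                         &= 1 - \tfrac{5}{6}\cdot 0.971 - \tfrac{1}{6}\cdot 0 \approx 0.191
\end{aligned}
```

Under these assumed data, the threshold at 54 isolates a pure No subset, which is why its gain is higher.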

