Other Attribute Selection Measures

  • CHAID: a popular decision tree algorithm whose measure is based on the χ² test for independence
  • C-SEP: performs better than information gain and the Gini index in certain cases
  • G-statistic: its distribution is closely approximated by the χ² distribution
  • MDL (Minimal Description Length) principle (i.e., the simplest solution is preferred):
    • The best tree is the one that requires the fewest bits to both (1) encode the tree and (2) encode the exceptions to the tree
  • Multivariate splits (partitions based on combinations of multiple variables)
    • CART: finds multivariate splits based on a linear combination of attributes
  • Which attribute selection measure is the best?
    • Most give good results; none is significantly superior to the others
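CHAID's χ² measure and the G-statistic are both computed from a contingency table of attribute values (rows) against class labels (columns). Each compares the observed counts O with the counts E expected if attribute and class were independent: χ² = Σ (O - E)² / E and G = 2 Σ O ln(O / E). The minimal Python sketch below computes both statistics. The counts in `table` are hypothetical, and the function names are illustrative rather than taken from any CHAID implementation.

```python
import math

def expected_counts(table):
    """Counts expected under independence: E_ij = row_i total * col_j total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def g_statistic(table):
    """G-statistic: 2 * sum of O * ln(O / E); cells with O == 0 contribute 0."""
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )

# Hypothetical counts: rows are attribute values, columns are classes (yes, no).
# Assumes every row and column total is nonzero, so no expected count is 0.
table = [
    [2, 3],
    [4, 0],
    [3, 2],
]
print(f"chi-square = {chi_square(table):.3f}")
print(f"G          = {g_statistic(table):.3f}")
```

Larger values indicate a stronger association between the attribute and the class; both are 0 when observed and expected counts agree exactly. Turning either statistic into a p-value requires the χ² distribution with (r - 1)(c - 1) degrees of freedom (for example via `scipy.stats.chi2.sf`). The rest of CHAID's procedure is not shown.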
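One way to make the MDL criterion concrete is to add up the two bit counts for each candidate tree and keep the smaller total. The sketch below assumes a simplified, hypothetical encoding. Each tree node costs a flat `bits_per_node` (8 here, an arbitrary choice). For a two-class problem, the exceptions are encoded by first sending the number k of misclassified training tuples (log₂(n + 1) bits) and then which k of the n tuples they are (log₂ C(n, k) bits). Practical MDL-based methods typically encode the tree itself in more detail than this.

```python
import math

def exception_bits(n_train, n_errors):
    """Bits to identify the misclassified tuples (two classes assumed):
    log2(n_train + 1) bits for the error count, plus log2(C(n_train, n_errors))
    bits to say which tuples are errors; flipping their labels fixes them."""
    return math.log2(n_train + 1) + math.log2(math.comb(n_train, n_errors))

def description_length(n_nodes, n_errors, n_train, bits_per_node=8.0):
    """Total bits = (1) bits for the tree + (2) bits for its exceptions.
    bits_per_node is a hypothetical flat cost per node."""
    return n_nodes * bits_per_node + exception_bits(n_train, n_errors)

# Two hypothetical candidate trees evaluated on the same 100 training tuples.
small_tree = description_length(n_nodes=3, n_errors=10, n_train=100)
large_tree = description_length(n_nodes=15, n_errors=2, n_train=100)
print(f"small tree: {small_tree:.1f} bits, large tree: {large_tree:.1f} bits")
print("MDL prefers the", "small" if small_tree < large_tree else "large", "tree")
```

With these made-up numbers the smaller tree wins despite its extra errors: its 12 fewer nodes save more bits than its 8 additional exceptions cost. A larger `bits_per_node` tilts the choice further toward simpler trees.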
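A multivariate (oblique) split tests a linear combination such as w₁x₁ + w₂x₂ ≤ t instead of a single attribute. The sketch below does not reproduce CART's search for the weights. It only scores one hand-picked combination (w = (1, 1), t = 1, both assumptions) with the Gini index and compares it with the best single-attribute threshold. The small dataset is made up so that its classes are separated by the diagonal x₁ + x₂ = 1.

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum_i p_i^2 of a list of class labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(scores, labels, t):
    """Weighted Gini index of the binary split: score <= t versus score > t."""
    left = [y for s, y in zip(scores, labels) if s <= t]
    right = [y for s, y in zip(scores, labels) if s > t]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold_gini(scores, labels):
    """Lowest weighted Gini over thresholds taken from the observed scores."""
    return min(split_gini(scores, labels, t) for t in set(scores))

def linear_scores(X, w):
    """Project each tuple onto the weight vector: w . x."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]

# Made-up data: class 1 exactly when x1 + x2 > 1.
X = [(0.1, 0.2), (0.2, 0.6), (0.6, 0.2), (0.4, 0.4),
     (0.9, 0.5), (0.5, 0.9), (0.3, 0.9), (0.9, 0.3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Best univariate (axis-parallel) split over both attributes.
best_univariate = min(
    best_threshold_gini([x[j] for x in X], y) for j in range(len(X[0]))
)
# Hand-picked oblique split: x1 + x2 <= 1 (weights and threshold are assumptions).
oblique = split_gini(linear_scores(X, (1.0, 1.0)), y, 1.0)
print(f"best single-attribute split: Gini = {best_univariate:.3f}")
print(f"x1 + x2 <= 1 split:          Gini = {oblique:.3f}")
```

Because the class boundary is diagonal, no single-attribute threshold yields pure partitions, while the linear combination separates the two classes perfectly (weighted Gini of 0).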

