Strategies to Avoid Overfitting

  • Overfitting can result from erroneous (noisy) training data or from coincidental regularities in the data.
  • Different types of strategies:
    • Stop growing the tree early, before it reaches the point where it perfectly classifies the training data.
    • Allow the tree to overfit the data, then post-prune it (the more successful approach in practice).
  • Key question: How to determine the correct final tree size?
    • Use a separate set of examples to evaluate the utility of post-pruning nodes from the tree (the “Training and Validation Set” approach); two variants applied by Quinlan: “Reduced-Error Pruning” and “Rule Post-Pruning”.
    • Use all available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set.
    • Use an explicit measure of the complexity of encoding the training examples and the decision tree, halting growth of the tree when this encoding size is minimized (the “Minimum Description Length” principle, see [1] Chapter 6).
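
The training-and-validation-set approach can be illustrated with a minimal sketch of reduced-error pruning. Everything here is an assumption made for illustration: the `Node` class, the helper names, the hand-built tree, and the toy weather-style data do not come from the slide, and the greedy loop is only one plausible reading of the procedure. Each pass tentatively turns every internal node into a leaf labelled with its majority training class, keeps the single replacement that gives the best validation accuracy, and stops once any further pruning would reduce that accuracy.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute values, class label)


@dataclass
class Node:
    """Hypothetical tree node; `attribute is None` marks a leaf."""
    majority: str                      # majority training label at this node
    attribute: Optional[str] = None    # attribute tested here, if any
    children: Dict[str, "Node"] = field(default_factory=dict)


def predict(node: Node, x: Dict[str, str]) -> str:
    """Walk down the tree; fall back to the majority label on unseen values."""
    while node.attribute is not None:
        child = node.children.get(x.get(node.attribute))
        if child is None:
            break
        node = child
    return node.majority


def accuracy(tree: Node, data: List[Example]) -> float:
    return sum(predict(tree, x) == y for x, y in data) / len(data)


def internal_nodes(node: Node) -> Iterator[Node]:
    """Yield all non-leaf nodes in pre-order."""
    if node.attribute is not None:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)


def reduced_error_prune(tree: Node, validation: List[Example]) -> Node:
    """Greedily replace subtrees by leaves while validation accuracy does not drop."""
    while True:
        base = accuracy(tree, validation)
        best_node, best_acc = None, base
        for node in list(internal_nodes(tree)):
            saved = (node.attribute, node.children)
            node.attribute, node.children = None, {}   # tentative prune
            acc = accuracy(tree, validation)
            node.attribute, node.children = saved      # undo
            if acc >= base and (best_node is None or acc > best_acc):
                best_node, best_acc = node, acc
        if best_node is None:
            return tree                                # every prune would hurt
        best_node.attribute, best_node.children = None, {}


# Hand-built tree in which the "windy" split under "sunny" fits a noisy example.
tree = Node("yes", "outlook", {
    "sunny": Node("no", "windy", {"true": Node("yes"), "false": Node("no")}),
    "rainy": Node("yes"),
})
validation = [
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "rainy", "windy": "true"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
]

before = accuracy(tree, validation)
reduced_error_prune(tree, validation)
after = accuracy(tree, validation)
print(before, after)
```

In this toy run the subtree under "sunny" is collapsed into a single leaf, while the root split on "outlook" survives because removing it would lower validation accuracy.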
