Strategies to Avoid Overfitting
Overfitting can result from errors (noise) in the training data or from coincidental regularities in a small sample.
Different types of strategies:
Stop growing the tree early, before it reaches the point where it perfectly classifies the training data.
Allow the tree to overfit the data, then post-prune it (the more successful approach in practice).
Key question: How to determine the correct final tree size?
Use a separate set of examples, distinct from the training examples, to evaluate the utility of post-pruning nodes from the tree (“Training and Validation Set” approach); two variants applied by Quinlan: “Reduced-Error Pruning” and “Rule Post-Pruning”.
Use all available data for training, but apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement beyond the training set.
Use an explicit measure of the complexity of encoding the training examples and the decision tree, halting growth of the tree when this encoding size is minimized (“Minimum Description Length” principle, see Chapter 6).
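To make the training and validation set approach concrete, here is a minimal sketch of a bottom-up variant of reduced-error pruning. It is an illustration under stated assumptions, not Quinlan's original implementation: the nested-dict tree layout, the "majority" field (assumed to hold the most common training label at each node), the function names, and the toy weather data are all hypothetical. A subtree is replaced by its majority-label leaf whenever that does not increase the number of errors on the validation examples that reach it.

```python
# A tree is either a leaf (a class label string) or an internal node:
#   {"attr": attribute name, "majority": label, "branches": {value: subtree}}
# The layout and the "majority" field are assumptions made for this sketch.

def predict(tree, example):
    """Follow branches until a leaf is reached; fall back to the node's
    majority label when an attribute value has no branch."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(example[tree["attr"]], tree["majority"])
    return tree

def errors(tree, data):
    """Number of misclassified (example, label) pairs."""
    return sum(predict(tree, x) != y for x, y in data)

def reduced_error_prune(tree, validation):
    """Prune children first, then replace this node by its majority leaf
    if the leaf makes no more validation errors than the subtree."""
    if not isinstance(tree, dict):
        return tree
    pruned_branches = {}
    for value, subtree in tree["branches"].items():
        # Only validation examples routed down this branch judge its subtree.
        subset = [(x, y) for x, y in validation if x[tree["attr"]] == value]
        pruned_branches[value] = reduced_error_prune(subtree, subset)
    candidate = {"attr": tree["attr"], "majority": tree["majority"],
                 "branches": pruned_branches}
    leaf = tree["majority"]
    if errors(leaf, validation) <= errors(candidate, validation):
        return leaf  # ties favour the smaller tree
    return candidate

# Hypothetical overfit tree: the "wind" split under "rain" fits training noise.
tree = {
    "attr": "outlook", "majority": "yes",
    "branches": {
        "sunny": {"attr": "humidity", "majority": "no",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attr": "wind", "majority": "yes",
                 "branches": {"strong": "no", "weak": "yes"}},
    },
}

# Hypothetical validation set, held out from training.
validation = [
    ({"outlook": "sunny", "humidity": "high", "wind": "weak"}, "no"),
    ({"outlook": "sunny", "humidity": "normal", "wind": "weak"}, "yes"),
    ({"outlook": "overcast", "humidity": "high", "wind": "strong"}, "yes"),
    ({"outlook": "rain", "humidity": "high", "wind": "strong"}, "yes"),
    ({"outlook": "rain", "humidity": "normal", "wind": "weak"}, "yes"),
]

pruned = reduced_error_prune(tree, validation)
print(errors(tree, validation), errors(pruned, validation))
```

In this toy case the "wind" subtree is collapsed into a single leaf because it misclassifies a validation example that the majority label gets right, while the "humidity" split is kept because removing it would add an error. Note that the validation examples must not be used for growing the tree, otherwise the estimate is biased toward the unpruned tree.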