Representations of text are usually very high dimensional (one feature for each word)
High-bias algorithms that prevent overfitting in high-dimensional space should generally work best*
For most text categorization tasks, there are many relevant features and many irrelevant ones
Methods that combine evidence from many or all features (e.g. naive Bayes, kNN) tend to work better than ones that try to isolate just a few relevant features*
*Although the results are a bit more mixed than often thought
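The points above can be made concrete with a minimal sketch: a bag-of-words representation (one feature per vocabulary word, hence high dimensionality) and a multinomial naive Bayes classifier with Laplace smoothing, which combines evidence from every word rather than isolating a few features. The toy corpus and labels are invented for illustration; this is a from-scratch sketch, not a production implementation.

```python
import math
from collections import Counter

# Hypothetical labeled training documents (assumed for illustration).
train = [
    ("the team won the match", "sports"),
    ("the election results were announced", "politics"),
    ("the player scored a goal", "sports"),
    ("the senate passed the bill", "politics"),
]

# Bag-of-words: one feature per vocabulary word, so the feature space
# grows with the vocabulary -- the "very high dimensional" point above.
vocab = sorted({w for doc, _ in train for w in doc.split()})

def train_nb(data, alpha=1.0):
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing.

    Combines evidence from every word in the vocabulary, in line with
    the observation that such methods often do well on text."""
    labels = {label for _, label in data}
    prior = {c: sum(1 for _, l in data if l == c) / len(data) for c in labels}
    counts = {c: Counter() for c in labels}
    for doc, label in data:
        counts[label].update(doc.split())
    loglik = {}
    for c in labels:
        total = sum(counts[c].values()) + alpha * len(vocab)
        loglik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return prior, loglik

def classify(doc, prior, loglik):
    """Pick the class maximizing log-prior plus summed word log-likelihoods."""
    scores = {}
    for c in prior:
        scores[c] = math.log(prior[c]) + sum(
            loglik[c][w] for w in doc.split() if w in loglik[c])
    return max(scores, key=scores.get)

prior, loglik = train_nb(train)
print(len(vocab))                                  # vocabulary size = number of features (15 here)
print(classify("the team scored", prior, loglik))  # -> sports
```

Even this four-document corpus already yields 15 features; realistic corpora easily reach vocabularies of 10^5 or more, which is why smoothing (a form of bias) matters: it keeps unseen words from zeroing out a class's probability.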
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License