Summary: Representation of Text Categorization Attributes

  • Representations of text are usually very high dimensional (one feature for each word)

  • High-bias algorithms that prevent overfitting in high-dimensional space should generally work best*

  • For most text categorization tasks, there are many relevant features and many irrelevant ones

  • Methods that combine evidence from many or all features (e.g. naive Bayes, kNN) tend to work better than ones that try to isolate just a few relevant features* (see the sketch after this list)

                                                    *Although the results are a bit more mixed than often thought
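The following is a minimal sketch of these points, assuming scikit-learn and its 20 Newsgroups loader (neither is named on the slide, and the chosen categories are arbitrary): a bag-of-words representation gives one feature per vocabulary word, so the feature space is very high dimensional, and both naive Bayes and kNN classify by combining evidence across all of those features.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

# Two-category subset of 20 Newsgroups (illustrative choice, not from the slide)
categories = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

# One feature per word: the representation is very high dimensional
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)
print("Number of features (vocabulary size):", X_train.shape[1])

# Both classifiers combine evidence from all features
# rather than isolating just a few relevant ones
for clf in (MultinomialNB(), KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X_train, train.target)
    print(type(clf).__name__, "accuracy:", clf.score(X_test, test.target))
```

On a typical run the vocabulary has tens of thousands of entries, far more features than training documents, which is why high-bias learners that resist overfitting in such spaces are the usual recommendation.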

