Summary: Representation of Text Categorization Attributes
Representations of text are usually very high dimensional (one feature for each word)
High-bias algorithms that prevent overfitting in high-dimensional space should generally work best*
For most text categorization tasks, there are many relevant features and many irrelevant ones
Methods that combine evidence from many or all features (e.g. naive Bayes, kNN) tend to work better than ones that try to isolate just a few relevant features*
*Although the results are a bit more mixed than often thought
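As an illustration of the point above, here is a minimal multinomial naive Bayes sketch in plain Python (the toy spam/ham corpus and function names are hypothetical, not from the slide). It pools log-probability evidence from every word in a document rather than selecting a handful of features, which is the behavior the slide attributes to naive Bayes:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()                 # how many docs per class
    word_counts = defaultdict(Counter)       # per-class word frequencies
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Score each class by summing log evidence over ALL words."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)  # class prior
        n = sum(word_counts[label].values())
        for w in text.lower().split():
            # Laplace smoothing: unseen words still contribute evidence
            lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical toy corpus for demonstration
docs = [("cheap pills buy now", "spam"),
        ("meeting agenda attached", "ham"),
        ("buy cheap watches now", "spam"),
        ("project meeting notes", "ham")]
model = train_nb(docs)
print(predict_nb(model, "buy cheap now"))        # classified from pooled word evidence
print(predict_nb(model, "project meeting agenda"))
```

Every word shifts the class scores a little; no single feature decides the outcome, which is why such methods tolerate the many irrelevant features typical of text.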