Recall: Vector Space Representation

  • Each document is a vector, one component for each term (= word).

  • Vectors are usually normalized to unit length.

  • High-dimensional vector space:

    • Terms are axes

    • 10,000+ dimensions, or even 100,000+

    • Docs are vectors in this space

  • How can we do classification in this space?
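
As a rough illustration of the representation recalled above, here is a minimal sketch that turns a few toy documents into unit-length term vectors. It assumes raw term counts as the component weights and whitespace tokenization; the slide does not specify either choice. The names `build_vocabulary`, `to_unit_vector`, and `dot` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Assign each distinct term its own axis (index) in the vector space."""
    terms = sorted({t for doc in docs for t in doc.lower().split()})
    return {term: i for i, term in enumerate(terms)}

def to_unit_vector(doc, vocab):
    """Represent a document as a term-count vector scaled to unit length.

    Assumption: raw counts are used as weights; other weightings are possible.
    """
    counts = Counter(doc.lower().split())
    vec = [0.0] * len(vocab)
    for term, count in counts.items():
        if term in vocab:
            vec[vocab[term]] = float(count)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def dot(u, v):
    """Dot product; for unit-length vectors this equals the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stocks fell on monday",
]
vocab = build_vocabulary(docs)                  # one axis per term
vectors = [to_unit_vector(d, vocab) for d in docs]
```

With only three short documents the space has ten axes; a real collection produces the tens of thousands of dimensions mentioned above, so practical systems store these vectors sparsely. Because the vectors have unit length, `dot(vectors[0], vectors[1])` directly gives the cosine of the angle between the first two documents.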

