Classification Using Vector Spaces

  • As before, the training set is a set of documents, each labeled with its class (e.g., topic)

  • In vector space classification, this set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space

  • Premise 1: Documents in the same class form a contiguous region of space

  • Premise 2: Documents from different classes don’t overlap (much)

  • We define decision surfaces (boundaries) that delineate the classes in the space; a new document is assigned the class of the region it falls into
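
As a rough illustration of these premises, the sketch below classifies points with a nearest-centroid rule. This is only one possible way to draw decision surfaces, and the slide does not commit to a specific method. The function names (`train`, `classify`), the class labels, and the 2-D toy data are all made up for the example; real document vectors would have one dimension per term.

```python
from collections import defaultdict
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labeled_points):
    """Group the training vectors by class label and compute one centroid per class."""
    by_class = defaultdict(list)
    for vector, label in labeled_points:
        by_class[label].append(vector)
    return {label: centroid(vectors) for label, vectors in by_class.items()}


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(centroids, vector):
    """Return the label whose centroid is closest to the given vector."""
    return min(centroids, key=lambda label: distance(centroids[label], vector))


# Hypothetical toy training set: two contiguous, non-overlapping clusters.
training_set = [
    ([1.0, 1.0], "sports"),
    ([1.5, 0.5], "sports"),
    ([0.5, 1.5], "sports"),
    ([5.0, 5.0], "politics"),
    ([5.5, 4.5], "politics"),
    ([4.5, 5.5], "politics"),
]

centroids = train(training_set)
print(classify(centroids, [1.2, 0.9]))  # falls in the "sports" region
print(classify(centroids, [4.8, 5.1]))  # falls in the "politics" region
```

With this rule, the decision surface between two classes is the set of points equidistant from their two centroids, which is a hyperplane. It works well only to the extent that Premises 1 and 2 hold for the data.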

