kNN decision boundaries

Example: k=6 (6NN)

Nearest-Neighbor Learning Algorithm

  • Learning is just storing the representations of the training examples in D.

  • Testing instance x (under 1NN):

    • Compute similarity between x and all examples in D.

    • Assign x the category of the most similar example in D.

  • Does not explicitly compute a generalization or category prototypes.

  • Also called:

    • Case-based learning

    • Memory-based learning

    • Lazy learning

  • Rationale of kNN: contiguity hypothesis

kNN Is Close to Optimal

  • Cover and Hart (1967)

  • Asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [error rate of classifier knowing model that generated data]

  • In particular, asymptotic error rate is 0 if Bayes rate is 0.

  • Assume: query point coincides with a training point.

  • Both query point and training point contribute error → 2 times Bayes rate

k Nearest Neighbor

  • Using only the closest example (1NN) to determine the class is subject to errors due to:

    • A single atypical example.

    • Noise (i.e., an error) in the category label of a single training example.

  • More robust alternative is to find the k most-similar examples and return the majority category of these k examples.

  • Value of k is typically odd to avoid ties; 3 and 5 are most common.

k Nearest Neighbor Classification

  • kNN = k Nearest Neighbor

  • To classify a document d into class c:

  • Define k-neighborhood N as k nearest neighbors of d

  • Count number of documents i in N that belong to c

  • Estimate P(c|d) as i/k

  • Choose as class argmaxc P(c|d) [ = majority class]

Similarity Metrics

  • Nearest neighbor method depends on a similarity (or distance) metric.

  • Simplest for continuous m-dimensional instance space is Euclidean distance.

  • Simplest for m-dimensional binary instance space is Hamming distance (number of feature values that differ).

  • For text, cosine similarity of tf.idf weighted vectors is typically most effective.

Illustration of 3 Nearest Neighbor for Text Vector Space

3 Nearest Neighbor vs. Rocchio

  • Nearest Neighbor tends to handle polymorphic categories better than Rocchio/NB.

Nearest Neighbor with Inverted Index

  • Naively, finding nearest neighbors requires a linear search through |D| documents in collection

  • But determining k nearest neighbors is the same as determining the k best retrievals using the test document as a query to a database of training documents.

  • Use standard vector space inverted index methods to find the k nearest neighbors.

  • Testing Time: O(B|Vt|) where B is the average number of training documents in which a test-document word appears.

    • Typically B << |D|

kNN: Discussion

  • No feature selection necessary

  • Scales well with large number of classes

    • Don’t need to train n classifiers for n classes

  • Classes can influence each other

    • Small changes to one class can have ripple effect

  • Scores can be hard to convert to probabilities

  • No training necessary

    • Actually: perhaps not true. (Data editing, etc.)

  • May be expensive at test time

  • In most cases it’s more accurate than NB or Rocchio

kNN vs. Naive Bayes

  • Bias/Variance tradeoff

    • Variance ≈ Capacity

  • kNN has high variance and low bias.

    • Infinite memory

  • NB has low variance and high bias.

    • Decision surface has to be linear (hyperplane – see later)

  • Consider asking a botanist: Is an object a tree?

    • Too much capacity/variance, low bias

      • Botanist who memorizes

      • Will always say “no” to new object (e.g., different # of leaves)

    • Not enough capacity/variance, high bias

      • Lazy botanist

      • Says “yes” if the object is green

    • You want the middle ground

                                                                                (Example due to C. Burges)

Creator: Tgbyrdmc


Licensed under the Creative Commons
Attribution ShareAlike CC-BY-SA license

This deck was created using SlideWiki.