### Nearest-Neighbor Learning Algorithm

• Learning is just storing the representations of the training examples in D.

• Testing instance x (under 1NN):

  • Compute similarity between x and all examples in D.

  • Assign x the category of the most similar example in D (a minimal sketch follows this list).

• Does not explicitly compute a generalization or category prototypes.

• Also called:

  • Case-based learning

  • Memory-based learning

  • Lazy learning

• Rationale of kNN: the contiguity hypothesis (documents in the same class form a contiguous region, separated from the regions of other classes).
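A minimal sketch of the 1NN procedure above, assuming documents are represented as sparse term-weight dictionaries and using cosine similarity (similarity metrics are discussed below); the function and variable names are illustrative, not from the source.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts mapping term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def train_1nn(examples):
    """'Learning' is just storing the labeled training examples (lazy learning)."""
    return list(examples)  # D: list of (vector, label) pairs

def classify_1nn(D, x):
    """Assign x the category of the single most similar stored example."""
    _, best_label = max(D, key=lambda pair: cosine(pair[0], x))
    return best_label

# Toy usage with made-up weights:
D = train_1nn([({"ball": 2.1, "goal": 1.3}, "sports"),
               ({"election": 1.8, "vote": 2.0}, "politics")])
print(classify_1nn(D, {"goal": 0.9, "match": 1.1}))  # -> sports
```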

### kNN Is Close to Optimal

• Cover and Hart (1967):

  • Asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [the error rate of a classifier that knows the model that generated the data]; the exact bound is given after this list.

  • In particular, the asymptotic error rate is 0 if the Bayes rate is 0.

• Intuition: asymptotically, the nearest training point effectively coincides with the query point.

  • The query point’s label and the training point’s label are each drawn from the same class distribution, so each can contribute the Bayes error once → at most 2 times the Bayes rate.
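For reference, the asymptotic bound proved by Cover and Hart (1967) can be written as follows, with $R^*$ the Bayes rate, $R_{1\mathrm{NN}}$ the asymptotic error rate of 1NN, and $c$ the number of classes:

$$
R^* \;\le\; R_{1\mathrm{NN}} \;\le\; R^*\left(2 - \frac{c}{c-1}\,R^*\right) \;\le\; 2R^*
$$

The middle expression equals $0$ when $R^* = 0$, which is the special case noted above.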

### k Nearest Neighbor

• Using only the closest example (1NN) to determine the class is subject to errors due to:

  • A single atypical example.

  • Noise (i.e., an error) in the category label of a single training example.

• A more robust alternative is to find the k most-similar examples and return the majority category of these k examples (illustrated in the sketch after this list).

• Value of k is typically odd to avoid ties; 3 and 5 are most common.
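A toy illustration, using made-up one-dimensional numeric data and Euclidean distance for simplicity, of how a single noisy label can flip a 1NN decision while a 3NN majority vote absorbs it:

```python
from collections import Counter

def knn_vote(D, x, k):
    """Majority vote over the k training points closest to x (1-D Euclidean distance)."""
    neighbors = sorted(D, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# One mislabeled point (2.0 tagged "B") sits inside the "A" region.
D = [(1.0, "A"), (1.5, "A"), (2.0, "B"), (2.5, "A"), (8.0, "B"), (8.5, "B")]
print(knn_vote(D, 2.1, k=1))  # -> B (the single noisy neighbor decides)
print(knn_vote(D, 2.1, k=3))  # -> A (the majority of 3 neighbors overrides the noise)
```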

### k Nearest Neighbor Classification

• kNN = k Nearest Neighbor

• To classify a document d into class c:

  • Define the k-neighborhood N as the k nearest neighbors of d.

  • Count the number of documents i in N that belong to c.

  • Estimate P(c|d) as i/k.

  • Choose as class argmax_c P(c|d) [= majority class].
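A sketch of this procedure under the same sparse term-weight representation as before; `knn_classify` and its variable names are illustrative assumptions, not from the source.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between sparse vectors (dicts mapping term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(D, d, k):
    """Return (predicted class, estimated P(c|d) for each class in the neighborhood)."""
    N = sorted(D, key=lambda pair: cosine(pair[0], d), reverse=True)[:k]  # k-neighborhood
    counts = Counter(label for _, label in N)     # i = counts[c]
    p = {c: i / k for c, i in counts.items()}     # P(c|d) estimated as i/k
    best = max(p, key=p.get)                      # argmax_c P(c|d) = majority class
    return best, p
```

Ties in the vote are broken arbitrarily here, which is one practical reason for preferring odd k, as noted in the previous section.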

### Similarity Metrics

• Nearest neighbor method depends on a similarity (or distance) metric.

• Simplest for continuous m-dimensional instance space is Euclidean distance.

• Simplest for m-dimensional binary instance space is Hamming distance (number of feature values that differ).

• For text, cosine similarity of tf.idf weighted vectors is typically most effective.
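Minimal reference implementations of the three metrics above, assuming dense equal-length lists for the Euclidean and Hamming cases and sparse term-weight dicts for the tf.idf cosine case:

```python
import math

def euclidean(x, y):
    """Euclidean distance for continuous m-dimensional vectors (equal-length lists)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hamming(x, y):
    """Hamming distance for binary vectors: the number of positions that differ."""
    return sum(a != b for a, b in zip(x, y))

def cosine_tfidf(u, v):
    """Cosine similarity for sparse tf.idf-weighted document vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```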

### 3 Nearest Neighbor vs. Rocchio

• Nearest Neighbor tends to handle polymorphic categories (classes made up of several distinct clusters) better than Rocchio/NB.

### Nearest Neighbor with Inverted Index

• Naively, finding nearest neighbors requires a linear search through the |D| documents in the collection.

• But determining k nearest neighbors is the same as determining the k best retrievals using the test document as a query to a database of training documents.

• Use standard vector space inverted index methods to find the k nearest neighbors.

• Testing Time: O(B|Vt|), where |Vt| is the number of distinct terms in the test document and B is the average number of training documents in which a test-document word appears (see the sketch below).

  • Typically B << |D|
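A sketch of kNN retrieval over an inverted index, again assuming sparse term-weight document vectors; only the postings for the test document's terms are touched, which is where the O(B|Vt|) figure comes from. The helper names are illustrative.

```python
import heapq
from collections import defaultdict

def build_index(D):
    """Inverted index: term -> list of (doc_id, weight), plus per-document norms."""
    index, norms = defaultdict(list), {}
    for doc_id, (vec, _label) in enumerate(D):
        for term, w in vec.items():
            index[term].append((doc_id, w))
        norms[doc_id] = sum(w * w for w in vec.values()) ** 0.5
    return index, norms

def k_nearest_labels(index, norms, D, query, k):
    """Score only documents sharing at least one term with the query
    (roughly B postings per query term, |Vt| query terms), then keep the top k."""
    scores = defaultdict(float)
    for term, qw in query.items():              # |Vt| distinct query terms
        for doc_id, w in index.get(term, []):   # ~B postings per term
            scores[doc_id] += qw * w
    qnorm = sum(w * w for w in query.values()) ** 0.5
    top = heapq.nlargest(k, scores.items(),
                         key=lambda item: item[1] / (norms[item[0]] * qnorm))
    return [D[doc_id][1] for doc_id, _ in top]  # labels of the k nearest neighbors
```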

### kNN: Discussion

• No feature selection necessary

• Scales well with large number of classes

  • Don’t need to train n classifiers for n classes

• Classes can influence each other

  • Small changes to one class can have ripple effect

• Scores can be hard to convert to probabilities

• No training necessary

  • Actually: perhaps not true. (Data editing, etc.)

• May be expensive at test time

• In most cases it’s more accurate than NB or Rocchio

### kNN vs. Naive Bayes

• Variance ≈ Capacity

• kNN has high variance and low bias.

  • Infinite memory

• NB has low variance and high bias.

  • Decision surface has to be linear (hyperplane – see later)

• Consider asking a botanist: Is an object a tree?

  • Too much capacity/variance, low bias

    • Botanist who memorizes

    • Will always say “no” to new object (e.g., different # of leaves)

  • Not enough capacity/variance, high bias

    • Lazy botanist

    • Says “yes” if the object is green

  • You want the middle ground

(Example due to C. Burges)
