k-Neighbours

The $k$-neighbours method is an instance-based learning algorithm. It stores the training set and, when a new data point is presented, finds the $k$ training samples closest to it (typically in Euclidean distance) and returns

  • for regression, the average of the target values of these $k$ samples;
  • for classification, the majority class among these $k$ samples (using some procedure to break ties).
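A minimal from-scratch sketch of both prediction rules, assuming Euclidean distance (computed with `math.dist`). The text leaves the tie-breaking procedure open, so the rule used here is an assumption: among the tied classes, pick the one with a member nearest to the query. The function names `k_nearest`, `knn_regress` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def k_nearest(train_X, x, k):
    """Indices of the k training points closest to x, nearest first.

    Sorting (distance, index) pairs makes the order deterministic when
    several points are equally far away.
    """
    dists = sorted((math.dist(p, x), i) for i, p in enumerate(train_X))
    return [i for _, i in dists[:k]]


def knn_regress(train_X, train_y, x, k):
    """Regression: average of the target values of the k nearest samples."""
    idx = k_nearest(train_X, x, k)
    return sum(train_y[i] for i in idx) / k


def knn_classify(train_X, train_y, x, k):
    """Classification: majority class among the k nearest samples.

    Assumed tie-break: among the tied classes, return the one whose
    member appears first in nearest-first order.
    """
    idx = k_nearest(train_X, x, k)
    votes = Counter(train_y[i] for i in idx)
    top = max(votes.values())
    tied = {c for c, v in votes.items() if v == top}
    for i in idx:  # idx is ordered by increasing distance
        if train_y[i] in tied:
            return train_y[i]


# Made-up toy data: two groups of 2-D points
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "a", "b", "b"]
targets = [1.0, 2.0, 3.0, 10.0, 12.0]

print(knn_classify(X, labels, (0.5, 0.5), 3))                 # a
print(knn_regress(X, targets, (5.5, 5), 2))                   # 11.0
print(knn_classify([(0, 0), (3, 0)], ["a", "b"], (1, 0), 2))  # a (1-1 tie, nearest wins)
```

Distance-weighted voting is a common alternative tie-breaker; the sketch keeps the plain majority vote described in the text.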

Regularisation

The parameter $k$ can be used to control overfitting.

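  • With $k=1$, each prediction depends on a single training point, so the algorithm follows noise in the training data and is likely to overfit.
  • Large values of $k$ average over many, possibly distant, points and can lead to underfitting; with $k$ equal to the size of the training set, every prediction is the same.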
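To make the trade-off concrete, the sketch below compares training accuracy with leave-one-out accuracy (each point predicted from all the others) on a tiny made-up 1-D dataset containing one deliberately mislabelled point. The helper names are hypothetical, distances are Euclidean, and vote ties go to the class of the nearest tied neighbour, which is only one possible choice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance).

    Candidates are sorted by (distance, position); Counter.most_common keeps
    first-encountered order among equal counts, so vote ties go to the class
    of the nearest tied neighbour.
    """
    order = sorted(range(len(train_X)), key=lambda i: (math.dist(train_X[i], x), i))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def training_accuracy(X, y, k):
    """Accuracy when the training points themselves are predicted."""
    hits = sum(knn_predict(X, y, X[i], k) == y[i] for i in range(len(X)))
    return hits / len(X)


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        hits += knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
    return hits / len(X)


# Made-up 1-D data; the point at 2 is deliberately mislabelled "b" (noise)
X = [(v,) for v in (0, 1, 2, 3, 4, 10, 11, 12, 13, 14)]
labels = ["a", "a", "b", "a", "a", "b", "b", "b", "b", "b"]

print(training_accuracy(X, labels, 1))   # 1.0: every point is its own nearest neighbour
for k in (1, 3, 9):
    print(k, loo_accuracy(X, labels, k))  # 1 0.8 / 3 0.9 / 9 0.6
```

On this toy data $k=1$ fits the training set perfectly but scores only 0.8 under leave-one-out, because the noisy point also misleads its neighbour at 3. With $k=3$ the noise is outvoted (0.9). With $k=9$, which uses every other point, the prediction is always the majority class (0.6).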

Example

We can illustrate the effect of $k$ on the iris dataset:

[Figures: iris dataset with $k=1$, $k=3$, $k=10$ and $k=20$]
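Decision-region plots of this kind are usually produced by classifying every point of a regular grid and colouring it by the predicted class. The sketch below prints such a map as text for a made-up 2-D dataset (not the iris data) that has one outlier; `decision_map` and its parameters are hypothetical names, and in practice a plotting library would replace the text output.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest points (Euclidean distance).

    Sorting by (distance, position) plus Counter's first-encountered ordering
    means vote ties go to the class of the nearest tied neighbour.
    """
    order = sorted(range(len(train_X)), key=lambda i: (math.dist(train_X[i], x), i))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def decision_map(train_X, train_y, k, x_min, x_max, y_min, y_max, step=1.0):
    """Classify every grid point; returns one string per row, top row first.

    Labels must be single characters so each grid cell is one character.
    """
    nx = int(round((x_max - x_min) / step)) + 1
    ny = int(round((y_max - y_min) / step)) + 1
    rows = []
    for r in range(ny):
        y = y_max - r * step  # top row first, as in a plot
        rows.append("".join(
            knn_predict(train_X, train_y, (x_min + c * step, y), k) for c in range(nx)))
    return rows


# Made-up data: two clusters plus one "x" outlier at (1, 2)
X = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 4), (5, 4), (4, 5), (5, 5), (1, 2)]
labels = ["o", "o", "o", "o", "x", "x", "x", "x", "x"]

for k in (1, 5):
    print(f"k={k}")
    print("\n".join(decision_map(X, labels, k, 0, 5, 0, 5)))
```

With $k=1$ the outlier at $(1, 2)$ claims its own cell of the grid; with $k=5$ its four nearby `o` neighbours outvote it, so the boundary becomes smoother.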

Digits example

We can also use the 8x8 digit images, after applying PCA to reduce them to 2 dimensions:

[Figures: digits data projected to 2 dimensions with PCA, with $k=1$, $k=3$, $k=5$ and $k=20$]
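The digits pipeline can be sketched as: fit PCA on the training features, project the training and query points with the same mean and principal directions, then run $k$-neighbours in the 2-D projection. This pure-Python version uses power iteration with deflation as a stand-in for a library eigensolver or SVD; the 3-D toy data is made up and much smaller than the 64-pixel digit images, and all function names are hypothetical.

```python
import math
from collections import Counter


def pca_fit(X, n_components=2, iters=200):
    """Return (mean, components): top principal directions of X.

    Power iteration on the covariance matrix, deflating after each
    component. Fine for small data; a numerical library is better in practice.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Covariance matrix C = Xc^T Xc / n
    C = [[sum(r[a] * r[b] for r in Xc) / n for b in range(d)] for a in range(d)]
    components = []
    for _ in range(n_components):
        # Arbitrary deterministic start vector, normalised to unit length
        v = [1.0 + 0.1 * j for j in range(d)]
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # no variance left; keep the current unit vector
                break
            v = [t / norm for t in w]
        # Rayleigh quotient v^T C v estimates the eigenvalue (explained variance)
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next direction
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, components


def pca_transform(X, mean, components):
    """Project each row of X (after centring) onto the principal directions."""
    return [[sum((row[j] - mean[j]) * v[j] for j in range(len(mean)))
             for v in components] for row in X]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: (math.dist(train_X[i], x), i))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


# Made-up 3-D data: most variance along the first axis, some along the second
X = [(x, y, 0.0) for x in (-3, 0, 3) for y in (-1, 0, 1)]
labels = ["L" if p[0] < 0 else "M" if p[0] == 0 else "R" for p in X]

mean, comps = pca_fit(X)
Z = pca_transform(X, mean, comps)  # training points in 2-D
query = pca_transform([(2.5, 0.2, 0.0)], mean, comps)[0]
print(knn_predict(Z, labels, query, k=3))  # R
```

Fitting the projection on the training data only and reusing the same `mean` and `comps` for queries keeps training and query points in the same coordinate system.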