Tags: python, machine-learning, scikit-learn, knn

What would be the predicted class in KNN for an even value of K in the case of a tie?


In KNN (k-nearest-neighbour) classifiers, if an even value of K is chosen, what would the prediction be under the majority voting rule or under a Euclidean-distance rule? For example, suppose there are 3 classes, say

  1. Iris-setosa
  2. Iris-versicolor
  3. Iris-virginica

Now say we have n_neighbors = 6. There is a fair chance of a tie under the majority voting rule, for example two neighbours from each of the three classes. In most visualisations this region is shown in white, indicating that no decision could be reached. But what would the actual prediction be in the event of a tie? The situation is fairly conceptual and hard to emulate.
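To make the tie concrete, here is a small, purely illustrative sketch (the toy points are hypothetical, chosen only so that all six neighbours are equidistant) that forces a 2-2-2 vote with n_neighbors = 6 and lets you observe scikit-learn's behaviour:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Six training points, two per class, all at Euclidean distance 1
# from the origin, so a query at the origin gets a 2-2-2 vote.
X = np.array([[1.0, 0.0], [0.0, 1.0],     # class 0
              [-1.0, 0.0], [0.0, -1.0],   # class 1
              [0.6, 0.8], [0.8, 0.6]])    # class 2
y = np.array([0, 0, 1, 1, 2, 2])

clf = KNeighborsClassifier(n_neighbors=6).fit(X, y)

print(clf.predict([[0.0, 0.0]]))        # the tie is broken deterministically
print(clf.predict_proba([[0.0, 0.0]]))  # roughly [0.333, 0.333, 0.333]
```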

Also, does an odd value of n_neighbors solve or at least reduce this issue? Do you think that, instead of simple majority voting, Euclidean/Manhattan distance weighting would handle ties better? The sklearn docs do not mention this at all.


Solution

  • After some digging, I have some good answers. First, some users (like @anasvaf) say that for binary classification you should simply use an odd number of neighbours. This is untrue as a general fix. When majority voting produces a tie, the outcome depends entirely on the library's implementation. In scikit-learn, for example, the prediction is the mode of the neighbour labels, and on a tie the mode resolves to the class that sorts first (the smallest label), so the tie is broken deterministically but arbitrarily. But there is a better solution.

    Instead of plain KNN, we can use weighted KNN. In weighted KNN, if there is a tie, we can compare the total distance from the query point to the k neighbours labelled 1 against the total distance to those labelled 0. If the total distance for class 1 is larger, the prediction is class 0, and vice versa (a sketch of this follows below).

    There are other good techniques for handling ties in KNN too, but to be honest, KNN is not a great learning algorithm, especially because of its space and time complexity on large datasets.
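    As a rough illustration of that tie-break rule, here is a minimal sketch. knn_predict_with_distance_tiebreak is a hypothetical helper, not part of scikit-learn, and it assumes NumPy arrays for the training data:

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_predict_with_distance_tiebreak(X_train, y_train, x_query, k=6):
        # Find the k nearest training points and their distances.
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        dist, idx = nn.kneighbors([x_query])
        labels = y_train[idx[0]]

        # Ordinary majority vote first.
        classes, votes = np.unique(labels, return_counts=True)
        tied = classes[votes == votes.max()]
        if len(tied) == 1:
            return tied[0]

        # Tie: sum the neighbour distances per tied class; the class whose
        # neighbours are closest overall (smallest total distance) wins.
        totals = {c: dist[0][labels == c].sum() for c in tied}
        return min(totals, key=totals.get)
    ```

    Note that scikit-learn's built-in KNeighborsClassifier(weights='distance') achieves a similar effect by weighting every vote by inverse distance, which makes exact ties far less likely in the first place.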