algorithm, matlab, knn

Number of neighbours in the KNN algorithm


I applied the KNN algorithm in MATLAB to classify handwritten digits. Each digit is initially an 8×8 matrix, flattened into a 1×64 vector. I compare the first digit against all the rest of the data set (which is quite large), then the second digit against the rest of the set, and so on. My question: isn't 1 neighbor always the best choice? Since I am using Euclidean distance and pick the closest digit, why should I also consider 2 or 3 more neighbors when I already have the closest one?

Thanks


Solution

  • You have to take noise into consideration. Some of your training examples may have been labeled wrongly, or one of them may happen to lie oddly close to examples of a different class even though it is really only a "glitch". In these cases, classifying according to that single off-track example could lead to a mistake, whereas a vote among several neighbors can outvote it.

    From personal experience, the best results are usually achieved for k = 3, 5, or 7, but it is instance dependent.

    If you want the best performance, you should use cross-validation to choose the optimal k for your specific instance.

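    Also, it is common to use only odd values of k for KNN, to avoid "draws" (tied votes). Note that an odd k rules out ties only when there are two classes; with ten digit classes a tie is still possible.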
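
To make the noise argument and the cross-validation advice concrete, here is a minimal sketch in plain Python rather than MATLAB, so that it runs without any toolbox. Everything in it is an assumption made for illustration: the toy 1-D data set, the deliberately mislabeled point at 2.4, and the helper names `knn_predict` and `loo_error`. It uses leave-one-out cross-validation (each point is classified using all the others, much like the procedure described in the question) to compare a few odd values of k.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out error rate: classify each point using all the others."""
    mistakes = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) != label:
            mistakes += 1
    return mistakes / len(data)


# Hypothetical toy data: class "a" lives near 0-4, class "b" near 10-14.
data = [((float(v),), "a") for v in range(0, 5)]
data += [((float(v),), "b") for v in range(10, 15)]
# One noisy example: it sits inside the "a" cluster but carries the label "b".
data.append(((2.4,), "b"))

# Try a few odd values of k and keep the one with the lowest error.
candidates = [1, 3, 5, 7]
errors = {k: loo_error(data, k) for k in candidates}
best_k = min(candidates, key=lambda k: errors[k])

for k in candidates:
    print(f"k={k}: leave-one-out error = {errors[k]:.3f}")
print("chosen k:", best_k)
```

On this toy set, k = 1 misclassifies three of the eleven points, because the two genuine "a" points next to the noisy one copy its wrong label. With k = 3 the only remaining error is the noisy point itself. On real digit data the best k may well differ, which is why it is chosen by validation rather than fixed in advance.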