I applied the KNN algorithm in MATLAB to classify handwritten digits. The digits are initially 8×8 images, stretched into 1×64 vectors. So each time I compare the first digit against all the rest of the (quite large) data set, then the second one against the rest of the set, and so on. Now my question is: isn't 1 neighbor always the best choice? Since I am using Euclidean distance and I pick the sample that is closest, why should I also take 2 or 3 more neighbors into account when I already have the closest digit?
Thanks
You have to take noise into consideration. Some of your labeled examples may be labeled wrongly, or one of them may happen to lie oddly close to examples of a different class - it is really just a "glitch". In these cases, classifying according to that single off-track example leads to a mistake, whereas a majority vote over several neighbors smooths it out.
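Here is a minimal sketch of that failure mode, in Python/NumPy rather than MATLAB, with made-up 1-D toy data: a single mislabeled point sitting inside the wrong class's region fools 1-NN, while a 3-neighbor majority vote is robust to it.

```python
import numpy as np

# Toy 1-D data: class 0 clusters near 0, class 1 clusters near 10,
# but the point at 2.0 is labeled class 1 by mistake (the "glitch").
X = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 9.0, 9.5, 10.0])
y = np.array([0,   0,   0,   0,   1,   1,   1,   1])

query = 2.2  # clearly inside the class-0 region

order = np.argsort(np.abs(X - query))  # neighbors sorted by distance

# k = 1: the single nearest neighbor is the mislabeled point -> class 1
k1_pred = y[order[0]]

# k = 3: majority vote over the 3 nearest neighbors -> class 0
k3_pred = np.bincount(y[order[:3]]).argmax()

print(k1_pred, k3_pred)  # 1 0
```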
From personal experience, the best results are usually achieved with k = 3, 5, or 7, but it is instance-dependent.
If you want to achieve the best performance, you should use cross-validation to choose the optimal k for your specific instance.
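As a sketch of that selection step, here is the cross-validated search in Python with scikit-learn (which ships the same 8×8 digits data); in MATLAB the analogous tools are `fitcknn` and `crossval`:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# 8x8 digit images flattened to 1x64 vectors, as in the question
X, y = load_digits(return_X_y=True)

# Score each candidate (odd) k with 5-fold cross-validation
search = GridSearchCV(
    KNeighborsClassifier(metric="euclidean"),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)  # the k with the best cross-validated accuracy
```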
Also, it is common to use an odd k for KNN, to avoid "draws" in the majority vote (with two classes an odd k guarantees a winner; with more classes ties can still occur and need a tie-breaking rule).
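A tiny made-up example of such a draw: with k = 2 and two classes, the neighbors can split 1-1 and the vote has no majority winner.

```python
import numpy as np

X = np.array([1.0, 3.0])
y = np.array([0, 1])
query = 2.0  # equidistant from one neighbor of each class

# Votes of the 2 nearest neighbors: one for each class -> a 1-1 draw
votes = y[np.argsort(np.abs(X - query))[:2]]
counts = np.bincount(votes)
print(counts)  # [1 1] -- no majority; an odd k would have avoided this
```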