python, scikit-learn, classification, knn

How to select K in sklearn's KNeighborsClassifier based on the highest accuracy


I am using KNN in a classification project.

I am trying to find the K with the highest accuracy, but it just gives me the highest K. I want an automated process instead of using the elbow method.

k=6
acc_array=np.zeros(k)
for n in range(1,k):
    classifier=KNeighborsClassifier(n_neighbors=k).fit(x_train,y_train)
    y_pred=classifier.predict(x_test)
    acc=metrics.accuracy_score(y_test, y_pred)
    acc_array[k-1]=acc
max_acc=np.amax(acc_array)
acc_list=list(acc_array)
k=acc_list.index(max_acc)
print("The best accuracy was with", max_acc, "with k=",k) 

I tried it for different values and it is just the same.


Solution

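  • Your code has two bugs.

    First, inside the for loop you always pass n_neighbors=k, and k is defined outside the loop, so every iteration fits the same model. The loop variable n is never used.

    Second, you write acc_array[k-1]=acc, and again k is constant, so every accuracy value is stored in the SAME position. All other entries stay at zero, so the maximum is always found at that last index regardless of the data.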
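To see both bugs in isolation, here is a minimal sketch that keeps the loop structure from the question but replaces the classifier with a made-up score (the formula for acc is purely a hypothetical stand-in, not a real accuracy). Whatever the scores are, only the last slot of the array is ever written:

```python
import numpy as np

k = 6
acc_array = np.zeros(k)
for n in range(1, k):
    # Hypothetical stand-in for the accuracy of a model; any positive value works.
    acc = 0.5 + 0.01 * n
    # Same indexing as in the question: k never changes, so this always writes slot k-1.
    acc_array[k - 1] = acc

# Index of the largest entry; all other entries are still zero.
best_index = int(np.argmax(acc_array))
print(acc_array)
print("index of the maximum:", best_index)
```

Because every other entry is still zero, the position of the maximum depends only on k, not on the scores.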

    Here is a correct version using the Iris dataset:

    import numpy as np
    from sklearn import datasets
    from sklearn import metrics
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Load the Iris data set
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

    max_k = 10
    acc_array = np.zeros(max_k)
    for k in range(1, max_k + 1):  # k takes the values 1 to 10
        classifier = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)  # k changes on each iteration
        y_pred = classifier.predict(x_test)
        acc = metrics.accuracy_score(y_test, y_pred)
        acc_array[k - 1] = acc  # store the accuracy for k at index k-1

    max_acc = np.amax(acc_array)
    best_k = int(np.argmax(acc_array)) + 1  # argmax gives the first maximum; +1 converts the index back to k
    print("The best accuracy was with", max_acc, "with k=", best_k)
    
    

    In this case, the accuracy is the same for every k tried, so the first maximum (k=1) is the one reported.

    acc_array
    array([0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98])
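When every k ties on a single train/test split, as above, the split gives no basis for choosing between them. One common alternative (swapped in here as an illustration, not part of the answer above) is to pick k by cross-validation on the training data with scikit-learn's GridSearchCV and keep the test set for a final check. The sketch below assumes the same Iris split; the 5-fold setting and the 1 to 10 range are arbitrary choices:

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data and split as in the answer above.
iris = datasets.load_iris()
X, y = iris.data, iris.target
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Candidate values of k (assumed range, chosen to match the answer).
param_grid = {"n_neighbors": list(range(1, 11))}

# 5-fold cross-validation on the training set only, scored by accuracy.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(x_train, y_train)

best_k = search.best_params_["n_neighbors"]
print("Best k from cross-validation:", best_k)
print("Mean cross-validated accuracy:", search.best_score_)
# The refitted best model is evaluated once on the held-out test set.
print("Test accuracy:", search.score(x_test, y_test))
```

This averages the accuracy over several splits, so the chosen k depends less on one particular partition of the data.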