Tags: python, pandas, scikit-learn, knn

Fit function in sklearn KNN model does not work: 'n_neighbors does not take <class 'float'> value, enter integer value'


I am working on a project where I want to use the KNN model from the sklearn library. I simplified the original problem to the following one: X1, X2 and X3 are the predictors used to assign each row to a category (the Y variable), which is either 1 or 2. I followed an online tutorial and everything went fine until I called the fit function. Here is the code:

#Importing necessary libraries
import pandas as pd
import numpy as np
#Imports for KNN models
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
#Imports for testing the model
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

#Import the data file
data = pd.read_csv("/content/drive/MyDrive/Python/Colab Notebooks/Onlyinttest.csv")

#Split data
X = data.loc[:,['X1','X2','X3']]

Y = data.loc[:,'Y']

X_train, X_test, Y_train, Y_test = train_test_split(X,Y, random_state=0, test_size=0.2)

#Determine k by using sqrt
import math
k = math.sqrt(len(Y_test))
print(k)
#Make k uneven
k = k-1

#KNN Model
classifer = KNeighborsClassifier(n_neighbors=k, p=2,metric='euclidean')
classifer.fit(X_train,Y_train)

The error: "n_neighbors does not take <class 'float'> value, enter integer value"

All the values in my original data set were floats, but in every online example I have read the algorithm also works with float data, so I do not understand this error.

To double-check, I created the CSV used in the code above (Onlyinttest.csv), which contains only int values, but the same error still occurs.

Can someone help me out here?


Solution

  • In your example, k is a float, not an integer, because math.sqrt() always returns a float. The error refers to the n_neighbors argument, not to your data: the n_neighbors value in KNeighborsClassifier(n_neighbors=k, p=2,metric='euclidean') has to be an integer, while the features themselves can be floats.

    You could convert k into an integer in this example using the math.ceil() function, which returns the smallest integer that is greater than or equal to the float value. Alternatively, you could use the math.floor() function, which returns the largest integer that is less than or equal to the input float.

    For example:

    #Determine k by using sqrt
    import math
    k = math.sqrt(len(Y_test))
    print(k)
    #Subtract 1, then round up so that k is an integer
    #(note: this alone does not guarantee that k is odd)
    k = k-1
    k = math.ceil(k)
    print(k)  # should now be an integer
    print(type(k))  # <class 'int'>
    
    #KNN Model
    classifer = KNeighborsClassifier(n_neighbors=k, p=2, metric='euclidean')
    classifer.fit(X_train, Y_train)
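
If the goal of the "make k uneven" step is an odd number of neighbors (often chosen to avoid ties between two classes), rounding alone will not guarantee it. The following is a minimal sketch of one way to do this, not the only one. The helper name odd_k is hypothetical and not part of sklearn, and rounding to the nearest integer before stepping down to an odd value is an assumption about what is wanted here.

```python
import math


def odd_k(n_samples):
    """Return an odd integer close to sqrt(n_samples), never smaller than 1.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # math.sqrt returns a float, so round it to the nearest integer first
    k = int(round(math.sqrt(n_samples)))
    # If the rounded value is even, step down by one to make it odd
    if k % 2 == 0:
        k -= 1
    # Guard against very small inputs, where k could drop below 1
    return max(k, 1)


# Example with a hypothetical test-set size of 20 rows
k = odd_k(20)
print(k, type(k))
```

The result could then be passed on as n_neighbors=odd_k(len(Y_test)) in place of the float k from the question.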