I have a problem with my k-NN Python script. I changed the metric used by the algorithm to the Manhattan distance, so this is what I wrote:
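from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

def manhattan_dist(self, data1, data2):
    return sum(abs(data1 - data2))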
X = df.iloc[:, :-1].values
y = df.iloc[:, 36].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
knn = KNeighborsClassifier(n_neighbors=5, metric=manhattan_dist)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(classification_report(y_test, y_pred))
The problem is that when I run this script, I get this error:
TypeError: manhattan_dist() missing 1 required positional argument: 'data2'
The error is raised by this line:
knn.fit(X_train, y_train)
Everything works fine with the Euclidean distance. If you need any information about my dataset, please ask. The full code is quite long.
I'm not very skilled with Python yet, and it's the first time I've used the k-NN algorithm. Do you have any suggestions?
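You don't need self in the function definition: self only belongs in methods defined inside a class. scikit-learn calls a callable metric with two arguments (the two points being compared), so with your signature the first point is bound to self, the second to data1, and nothing is left for data2, hence the TypeError. The following code shows an example of using a custom distance metric: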
from sklearn.neighbors import KNeighborsClassifier
def manhattan_dist(data1, data2):
    return sum(abs(data1 - data2))

X = [[0, 1, 2],
     [3, 4, 5],
     [8, 9, 1],
     [11, 7, 9]]
y = [0, 1, 1, 0]
knn = KNeighborsClassifier(n_neighbors=3, metric=manhattan_dist)
knn.fit(X, y)
knn.predict(X) # array([1, 1, 1, 1])
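To make the cause of the error concrete, here is a minimal sketch, assuming a recent scikit-learn and NumPy. The Metrics class is hypothetical: it only exists to keep the original signature with self. The sketch reproduces the TypeError by calling that method with two points, shows that a bound method (where Python fills in self for you) or a plain two-argument function both work, and checks that scikit-learn's built-in metric="manhattan" gives the same predictions on the toy data above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


# Plain module-level function with exactly two parameters: scikit-learn
# calls a callable metric with two 1-D arrays (the two points compared).
def manhattan_dist(data1, data2):
    return np.sum(np.abs(data1 - data2))


# Hypothetical class that keeps the original signature with `self`.
class Metrics:
    def manhattan_dist(self, data1, data2):
        return np.sum(np.abs(data1 - data2))


X = np.array([[0, 1, 2],
              [3, 4, 5],
              [8, 9, 1],
              [11, 7, 9]])
y = np.array([0, 1, 1, 0])

# Calling the method through the class with two points reproduces the error:
# X[0] is bound to `self`, X[1] to `data1`, and `data2` is missing.
error_message = ""
try:
    Metrics.manhattan_dist(X[0], X[1])
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# A bound method works, because Python supplies `self` automatically.
bound = KNeighborsClassifier(n_neighbors=3, metric=Metrics().manhattan_dist)
bound.fit(X, y)

# The plain two-argument function from the answer.
custom = KNeighborsClassifier(n_neighbors=3, metric=manhattan_dist).fit(X, y)

# Built-in string metric, evaluated in compiled code rather than in Python.
builtin = KNeighborsClassifier(n_neighbors=3, metric="manhattan").fit(X, y)

print(custom.predict(X))   # [1 1 1 1]
print(builtin.predict(X))  # [1 1 1 1]
```

On a real dataset the built-in string is usually the better choice: a Python callable is invoked once per pair of points, which is much slower. metric="minkowski" with p=1 is another way to request the same Manhattan distance.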