
knn algorithm - TypeError: manhattan_dist() missing 1 required positional argument


I have a problem with my kNN Python script. I changed the metric used by the algorithm to the Manhattan distance. This is what I wrote:

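from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

def manhattan_dist(self, data1, data2):
    return sum(abs(data1 - data2))

# df is my dataset, loaded earlier in the (long) script
X = df.iloc[:, :-1].values
y = df.iloc[:, 36].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

knn = KNeighborsClassifier(n_neighbors=5, metric=manhattan_dist)

knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)

print(classification_report(y_test, y_pred))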

The problem is that when I run this script, I get this error:

TypeError: manhattan_dist() missing 1 required positional argument: 'data2'

The error is raised at this line:

knn.fit(X_train, y_train)

Everything works fine with the Euclidean distance. If you need more information about my dataset, please ask; the full code is quite long.

I'm not very skilled with Python yet, and this is the first time I've used the kNN algorithm. Do you have any suggestions?


Solution

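  • You don't need self in the function definition. manhattan_dist is a plain function, not a method of a class, so self is just an ordinary first parameter. KNeighborsClassifier calls the metric with two arrays, which get bound to self and data1, leaving data2 without a value; that is exactly what the TypeError says. Remove self, as in the following example of a custom distance metric: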

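    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    
    # A plain module-level function: scikit-learn calls it as metric(x, y)
    # with two 1-D arrays, so it takes exactly two arguments (no self).
    def manhattan_dist(data1, data2):
        return np.sum(np.abs(data1 - data2))
    
    X = [[0, 1, 2],
         [3, 4, 5],
         [8, 9, 1],
         [11, 7, 9]]
    y = [0, 1, 1, 0]
    
    knn = KNeighborsClassifier(n_neighbors=3, metric=manhattan_dist)
    knn.fit(X, y)
    
    knn.predict(X)  # array([1, 1, 1, 1])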
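The sketch below, assuming a recent NumPy and scikit-learn, first reproduces the error in plain Python by calling a two-array metric the same way scikit-learn does, and then checks that the fixed function agrees with scikit-learn's built-in metric="manhattan". The name manhattan_with_self is hypothetical, used only to show the failing signature.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def manhattan_with_self(self, data1, data2):
    # Hypothetical broken version: at module level, `self` is just an
    # ordinary first positional parameter, not an instance reference.
    return np.sum(np.abs(data1 - data2))


def manhattan_dist(data1, data2):
    # Manhattan (L1) distance: sum of absolute coordinate differences.
    return np.sum(np.abs(data1 - data2))


a = np.array([0.0, 1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])

# Mimic how scikit-learn calls a callable metric: metric(x, y).
# Here `a` binds to `self`, `b` binds to `data1`, and `data2` is missing.
error_message = None
try:
    manhattan_with_self(a, b)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

X = np.array([[0, 1, 2], [3, 4, 5], [8, 9, 1], [11, 7, 9]], dtype=float)
y = np.array([0, 1, 1, 0])

# Same data as above, once with the custom callable, once with the built-in name.
custom = KNeighborsClassifier(n_neighbors=3, metric=manhattan_dist).fit(X, y)
builtin = KNeighborsClassifier(n_neighbors=3, metric="manhattan").fit(X, y)

print(manhattan_dist(a, b))  # 9.0
print(custom.predict(X))     # [1 1 1 1]
print(builtin.predict(X))    # [1 1 1 1]
```

If you don't actually need a custom function, passing metric="manhattan" (or metric="minkowski" with p=1) should give the same neighbours and is usually much faster, because a Python callable has to be invoked once for every distance computation.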