machine-learning scikit-learn knn

How to run KNN with cosine_similarity?


I'm trying to use cosine_similarity as the metric for a KNN classifier, with no success.

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10, metric=cosine_similarity).fit(x, y)

Shape of x (150 samples with 4 features):

(150, 4)

Shape of y:

(150,)

I'm getting this error:

ValueError: Expected 2D array, got 1D array instead

I have tried to reshape x with reshape(-1,1) and reshape(1,-1), with no success.

How can I run a KNN classifier on this dataset (x has 4 features) with cosine_similarity?


Solution

  • The problem is that the cosine metric is only supported by the brute-force variant of the nearest neighbor algorithm; the tree-based variants (KD tree and ball tree) do not support it. You have two options to make this work:

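    Option 1: Explicitly select the brute-force algorithm with algorithm='brute'. This lets fit run, but cosine_similarity returns a similarity (larger means closer) while metric is treated as a distance (smaller means closer), so prefer Option 2 for actual predictions: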

    from sklearn.datasets import make_classification
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.neighbors import KNeighborsClassifier
    
    
    X, y = make_classification(n_samples=150, n_features=4, random_state=42)
    
    knn = KNeighborsClassifier(n_neighbors=10, algorithm='brute',  metric=cosine_similarity)
    knn.fit(X, y)
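
If a callable metric is really needed, it should accept two 1D sample vectors and return a scalar distance, which cosine_similarity does not do. A minimal sketch, assuming SciPy is installed: scipy.spatial.distance.cosine computes 1 minus the cosine similarity of two 1D vectors. The names cosine_distance, knn_callable and predictions are only illustrative, and predicting on the first five training rows is just a quick smoke test.

```python
from scipy.spatial.distance import cosine as cosine_distance
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=150, n_features=4, random_state=42)

# A callable metric is called on pairs of 1D sample vectors and must return
# a scalar distance; SciPy's cosine() returns 1 - cosine similarity.
knn_callable = KNeighborsClassifier(
    n_neighbors=10, algorithm='brute', metric=cosine_distance
)
knn_callable.fit(X, y)

# Predict labels for a few samples to confirm neighbor queries work.
predictions = knn_callable.predict(X[:5])
print(predictions)
```

A Python callable is evaluated once per pair of samples, so this is likely to be much slower than the built-in string metric on larger datasets.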
    

    Option 2: Specify metric='cosine' (cosine distance, i.e. 1 minus the cosine similarity), which automatically selects the brute-force algorithm:

    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier
    
    
    X, y = make_classification(n_samples=150, n_features=4, random_state=42)
    
    knn = KNeighborsClassifier(n_neighbors=10,  metric='cosine')
    knn.fit(X, y)
    

    To read more about the different nearest neighbor algorithms, refer to the nearest neighbors section of the scikit-learn user guide.
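
To see what metric='cosine' computes in practice, the distances returned by kneighbors can be compared with 1 minus the pairwise cosine similarity. This is a minimal sketch on the same synthetic data; the variable names and the choice of five query rows are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=150, n_features=4, random_state=42)

knn = KNeighborsClassifier(n_neighbors=10, metric='cosine')
knn.fit(X, y)

# Distances to, and indices of, the 10 nearest training samples per query.
distances, indices = knn.kneighbors(X[:5])

# Cosine distance is 1 - cosine similarity; look up the same neighbor pairs.
similarities = cosine_similarity(X[:5], X)
expected = 1.0 - np.take_along_axis(similarities, indices, axis=1)

# Compare with a tolerance to allow for floating-point rounding.
matches = np.allclose(distances, expected, atol=1e-8)
print(matches)
```

Here cosine_similarity is used the way it is designed to be used, on whole 2D arrays, rather than as the metric argument.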