python data-science knn recommendation-engine

KNN model returns same distances with any k


I am trying to create a basic item-based recommender system with KNN. But with the following code, the model always returns the same distances no matter which k I use. Why does it return the same results?

import pandas as pd
import scipy.sparse
from sklearn.neighbors import NearestNeighbors

df_ratings = pd.read_csv('ml-1m/ratings.dat', names=["user_id", "movie_id", "rating", "timestamp"],
            header=None, sep='::', engine='python')
# movie x user matrix: 1 if the user rated the movie, 0 otherwise
matrix_df = df_ratings.pivot(index='movie_id', columns='user_id', values='rating').fillna(0).astype(bool).astype(int)

um_matrix = scipy.sparse.csr_matrix(matrix_df.values)

# knn model
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=17, n_jobs=-1)
model_knn.fit(um_matrix)

# `movie` is presumably a row position in matrix_df, defined elsewhere
distances, indices = model_knn.kneighbors(um_matrix[int(movie)], n_neighbors=100)

Solution

  • Your model returns the same distances for any K because K does not change the distances between your data points.

    K-Nearest-Neighbours simply finds the nearest neighbours of a point in your feature space. K specifies how many of them you are looking for, not how far away they are. The distances themselves are fixed by the data and the distance metric; a larger K just returns more entries of the same sorted list.
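A minimal sketch of this with scikit-learn's `NearestNeighbors`, using hypothetical random toy data rather than the question's ratings: asking for 3 and then 10 neighbours of the same point gives the same sorted distances, and the 3-NN result is simply the first three columns of the 10-NN result.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: 20 random points in 3-D (not the question's ratings)
rng = np.random.default_rng(0)
data = rng.random((20, 3))

model = NearestNeighbors(algorithm="brute", metric="euclidean").fit(data)
query = data[:1]  # kneighbors() expects a 2-D array, so keep shape (1, 3)

# Same query point, two different values of k
dist_3, idx_3 = model.kneighbors(query, n_neighbors=3)
dist_10, idx_10 = model.kneighbors(query, n_neighbors=10)

# Distances come back sorted ascending, so the k=3 result is just
# the first 3 columns of the k=10 result
print(np.allclose(dist_3, dist_10[:, :3]))  # True
print((idx_3 == idx_10[:, :3]).all())       # True
```

Changing k never recomputes or rescales anything; it only decides where the sorted list is cut off.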

    A simple example would be

    X = [[0,0],[0,5],[5,0],[5,5],[4,4]]
    

    [Figure: scatter plot of the five points in X]

    The distances between all points look like this (each row holds the distances from one point to every point, sorted in ascending order, as kneighbors returns them):

       [0,0]:  [0.        , 5.        , 5.        , 5.65685425, 7.07106781]
       [0,5]:  [0.        , 4.12310563, 5.        , 5.        , 7.07106781]
       [5,0]:  [0.        , 4.12310563, 5.        , 5.        , 7.07106781]
       [5,5]:  [0.        , 1.41421356, 5.        , 5.        , 7.07106781]
       [4,4]:  [0.        , 1.41421356, 4.12310563, 4.12310563, 5.65685425]
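These rows can be reproduced with scikit-learn, assuming the Euclidean metric and K equal to the number of points (5), so that every point, including the query point itself, is returned. Because each row is sorted by distance, `indices` is needed to see which point each distance belongs to.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# The five example points
X = np.array([[0, 0], [0, 5], [5, 0], [5, 5], [4, 4]])

# K = 5 = number of points, so every point is returned for every query
nn = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(X)
distances, indices = nn.kneighbors(X)

# Row i: distances from X[i] to all points, sorted ascending
print(distances)
# Row i: which point each of those distances belongs to
print(indices)
```

Since the query points are passed explicitly, each point finds itself at distance 0; calling `kneighbors()` without a query instead leaves each point out of its own neighbour list.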
    

    The first row shows the distances from point [0,0] to every point (including itself), in ascending order:

    • to itself it is 0
    • to [0,5] the distance is 5
    • to [5,0] the distance is 5
    • to [4,4] it is (using the Euclidean distance here) the square root of 4*4+4*4, so 5.65...
    • to [5,5] the Euclidean distance is the square root of 5*5+5*5, so 7.07106781
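The two non-trivial entries of the first row, worked out with the Euclidean distance formula for points in the plane:

```latex
\begin{aligned}
d(p, q) &= \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2} \\
d([0,0],[4,4]) &= \sqrt{4^2 + 4^2} = \sqrt{32} \approx 5.65685 \\
d([0,0],[5,5]) &= \sqrt{5^2 + 5^2} = \sqrt{50} \approx 7.07107
\end{aligned}
```

Neither value depends on K; K only decides how many of these sorted values you get back.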

    So no matter how many points you are looking for (K), the distances are always the same; K only decides how many of them are returned.
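If the k being varied in the question is the constructor's `n_neighbors=17`, it has no effect at all there: in scikit-learn that value is only the default for `kneighbors()`, and the explicit `n_neighbors=100` in the call overrides it. The sketch below shows this with the question's cosine/brute-force settings on a small, randomly generated binary movie x user matrix; the matrix and the row position `movie = 0` are hypothetical stand-ins for the MovieLens data.

```python
import numpy as np
import scipy.sparse
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in for the movie x user matrix: 300 "movies" x 40 "users",
# 1 = the user rated the movie (random data, not MovieLens)
rng = np.random.default_rng(42)
dense = (rng.random((300, 40)) < 0.2).astype(int)
um_matrix = scipy.sparse.csr_matrix(dense)
movie = 0  # hypothetical row position to query


def fit_model(n_neighbors_ctor):
    """Fit a model with the question's settings and a given constructor k."""
    return NearestNeighbors(metric="cosine", algorithm="brute",
                            n_neighbors=n_neighbors_ctor).fit(um_matrix)


model_17 = fit_model(17)
model_5 = fit_model(5)
query = um_matrix[movie]  # 1 x n_users sparse row

# The question's call: explicit n_neighbors=100 overrides the constructor value
d_17, _ = model_17.kneighbors(query, n_neighbors=100)
d_5, _ = model_5.kneighbors(query, n_neighbors=100)
print(d_17.shape, d_5.shape)   # (1, 100) (1, 100)
print(np.allclose(d_17, d_5))  # True: the constructor k made no difference

# Only when n_neighbors is omitted does the constructor value apply
d_default, _ = model_17.kneighbors(query)
print(d_default.shape)         # (1, 17)
```

To actually get a different number of neighbours, change the `n_neighbors` argument of `kneighbors()` (or drop it and rely on the constructor value); either way, the distances returned are still a prefix of the same sorted list.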