python data-science knn recommendation-engine

KNN model returns same distances with any k

I am trying to build a basic item-based recommender system with KNN. With the following code, the model always returns the same distances no matter which k I use. Why does it return the same results?

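import pandas as pd
import scipy.sparse
from sklearn.neighbors import NearestNeighbors

# MovieLens 1M ratings: one row per (user, movie) rating
df_ratings = pd.read_csv('ml-1m/ratings.dat', names=["user_id", "movie_id", "rating", "timestamp"],
            header=None, sep='::', engine='python')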
matrix_df = df_ratings.pivot(index='movie_id', columns='user_id', values='rating').fillna(0).astype(bool).astype(int)

um_matrix = scipy.sparse.csr_matrix(matrix_df.values)

# knn model
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=17, n_jobs=-1)
model_knn.fit(um_matrix)

distances, indices = model_knn.kneighbors(um_matrix[int(movie)], n_neighbors=100)

Solution

Your model returns the same distances for any K because your K does not change the distances between your datapoints.

K-Nearest-Neighbours simply finds the nearest neighbours of a point in your feature space. K specifies how many of them you are looking for, not how far away they are from each other.

A simple example would be

X = [[0,0],[0,5],[5,0],[5,5],[4,4]]

As a scatter plot, these are the four corners of a square with side 5, plus one point, [4,4], just inside the [5,5] corner.

Each row below lists the distances from one point to all five points (itself included), sorted from nearest to farthest:

   [0,0]:  [0.        , 5.        , 5.        , 5.65685425, 7.07106781]
   [0,5]:  [0.        , 4.12310563, 5.        , 5.        , 7.07106781]
   [5,0]:  [0.        , 4.12310563, 5.        , 5.        , 7.07106781]
   [5,5]:  [0.        , 1.41421356, 5.        , 5.        , 7.07106781]
   [4,4]:  [0.        , 1.41421356, 4.12310563, 4.12310563, 5.65685425]

The first row shows the distances from point [0,0] to every point, nearest first:

to itself it is 0
to [0,5] the distance is 5
to [5,0] the distance is 5
to [4,4] it is (in my case, using Euclidean distance) the square root of 4*4+4*4, so 5.65...
to [5,5] the Euclidean distance is 7.07106781
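
The rows above can be reproduced with a few lines of plain Python. This is only a sketch that assumes Euclidean distance (the question itself uses cosine distance), and the helper name sorted_distances is made up for this illustration:

```python
import math

# The five example points from the answer.
X = [[0, 0], [0, 5], [5, 0], [5, 5], [4, 4]]

def sorted_distances(points, query):
    """Euclidean distances from `query` to every point, nearest first."""
    return sorted(math.dist(query, p) for p in points)

# One row per point: its distances to all five points, itself included.
for point in X:
    row = [round(d, 8) for d in sorted_distances(X, point)]
    print(point, row)
```

Each printed row depends only on where the points are, so nothing in it can change when K changes.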

So no matter how many points you are looking for (K), the distances are always the same. A smaller K only returns fewer entries from the start of each row.
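
To see that K only cuts the sorted list shorter, here is a minimal brute-force sketch. It is a stand-in written for this answer, not scikit-learn's implementation, and the function name k_nearest is hypothetical:

```python
import math

X = [[0, 0], [0, 5], [5, 0], [5, 5], [4, 4]]

def k_nearest(points, query, k):
    """Return (distances, indices) of the k points closest to `query`."""
    # Rank every point by its distance to the query, then keep the first k.
    ranked = sorted((math.dist(query, p), i) for i, p in enumerate(points))
    top = ranked[:k]
    return [d for d, _ in top], [i for _, i in top]

query = [0, 0]
d2, i2 = k_nearest(X, query, k=2)
d4, i4 = k_nearest(X, query, k=4)

print(d2, i2)
print(d4, i4)

# The k=2 result is just the beginning of the k=4 result.
print(d4[:2] == d2 and i4[:2] == i2)
```

One more thing worth checking in the question's code: in scikit-learn, an n_neighbors value passed to kneighbors() takes precedence over the one given to the NearestNeighbors constructor. With n_neighbors=100 in the kneighbors() call, changing the 17 in the constructor should therefore have no visible effect on the output.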