I am trying to create a basic item-based recommender system with kNN. But with the following code, the model always returns the same distances for different values of k. Why does it return the same results?
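import pandas as pd
import scipy.sparse
from sklearn.neighbors import NearestNeighbors

# load the MovieLens 1M ratings
df_ratings = pd.read_csv('ml-1m/ratings.dat', names=["user_id", "movie_id", "rating", "timestamp"],
                         header=None, sep='::', engine='python')
# movie x user matrix: 1 if the user rated the movie, 0 otherwise
matrix_df = df_ratings.pivot(index='movie_id', columns='user_id', values='rating').fillna(0).astype(bool).astype(int)
um_matrix = scipy.sparse.csr_matrix(matrix_df.values)
# knn model
model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=17, n_jobs=-1)
model_knn.fit(um_matrix)
# `movie` is not defined in this snippet; it is used as a row position in matrix_df.
# n_neighbors passed here overrides the value given to the constructor.
distances, indices = model_knn.kneighbors(um_matrix[int(movie)], n_neighbors=100)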
Your model returns the same distances for any K because K does not change the distances between your data points.
K-Nearest-Neighbours simply finds the nearest neighbours of a point in your feature space; K specifies how many of them you are looking for, not how far away they are.
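The idea can be sketched in a few lines of plain Python. This is only an illustration of the principle, not scikit-learn's implementation; the helper `k_nearest` and the sample points are made up for this sketch, and `math.dist` needs Python 3.8 or newer.

```python
import math

def k_nearest(query, points, k):
    """Return the k smallest distances from query to points, ascending."""
    # The distances depend only on the data, never on k.
    distances = sorted(math.dist(query, p) for p in points)
    # k only decides how many entries of the sorted list are returned.
    return distances[:k]

# Hypothetical sample data, chosen so the distances are round numbers.
points = [(0, 0), (3, 4), (6, 8), (1, 0)]
query = (0, 0)

print(k_nearest(query, points, 2))  # [0.0, 1.0]
print(k_nearest(query, points, 4))  # [0.0, 1.0, 5.0, 10.0]
```

The result for the smaller k is just the beginning of the result for the larger k.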
A simple example would be
X = [[0,0],[0,5],[5,0],[5,5],[4,4]]
As a scatter plot, these are the four corners of a square plus one point close to the [5,5] corner.
For each point, the distances to all points (itself included), sorted in ascending order, are:
[0,0]: [0. , 5. , 5. , 5.65685425, 7.07106781],
[0,5]: [0. , 4.12310563, 5. , 5. , 7.07106781],
[5,0]: [0. , 4.12310563, 5. , 5. , 7.07106781],
[5,5]: [0. , 1.41421356, 5. , 5. , 7.07106781],
[4,4]: [0. , 1.41421356, 4.12310563, 4.12310563, 5.65685425]
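The first row shows the distances from point [0,0] to every point (itself included), sorted from nearest to farthest.
So no matter how many points you are looking for (K), the distances are always the same; K only determines how many entries of each row you get back.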
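You can check this with scikit-learn on the toy points above. The following is a small sketch, assuming Euclidean distance (the question uses cosine, but the argument is the same) and querying the fitted points themselves, so each point appears as its own nearest neighbour at distance 0.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# The five toy points from the example above.
X = np.array([[0, 0], [0, 5], [5, 0], [5, 5], [4, 4]])

nn = NearestNeighbors(metric="euclidean", algorithm="brute").fit(X)

# Query the same fitted model with two different values of k.
dist_k2, _ = nn.kneighbors(X, n_neighbors=2)
dist_k5, _ = nn.kneighbors(X, n_neighbors=5)

# Sorted distances from each point to all five points.
print(np.round(dist_k5, 4))

# The k=2 distances should equal the first two columns of the k=5 distances.
print(np.allclose(dist_k2, dist_k5[:, :2]))
```

If a smaller K appears to give different neighbours, compare only the first K columns; the extra columns for a larger K are simply the next-nearest points.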