Tags: python, dataframe, machine-learning, prediction, knn

When using k nearest neighbors, is there a way to retrieve the "neighbors" that are used?


I'd like to find a way to determine which neighbors are actually used in my knn algorithm, so I can dive deeper into the rows of data that are similar to my features.

Here is an example of a dataset which I split into a training set and a test set for the prediction model:

    Player           PER    VORP   WS
    Fabricio Oberto  11.9   1.0    4.1
    Eddie Johnson    16.5   1.7    4.8
    Tim Legler       15.9   2.0    6.8
    Ersan Ilyasova   14.3   0.7    3.8
    Kevin Love       25.4   3.5    10.0
    Tim Hardaway     20.6   5.1    11.7
    Frank Brickowsk  8.6    -0.2   1.6

    etc....

And here is an example of my knn algorithm code:

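from sklearn.neighbors import KNeighborsRegressor

features = ['PER', 'VORP']
# Predict WS from the 5 nearest training rows in (PER, VORP) space
knn = KNeighborsRegressor(n_neighbors=5, algorithm='brute')
knn.fit(train[features], train['WS'])
predictions = knn.predict(test[features])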

I'm aware that, for each row of the test set, the algorithm predicts the target from the 5 closest training rows, where closeness is measured on the features I've specified.

I'd like to find out WHICH 5 neighbors were actually used to determine each prediction. In this case: which players were actually used in determining the target?

Is there a way to get a list of the 5 neighbors (aka players) which were used in the analysis for each row?


Solution

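  • knn.kneighbors(test[features]) returns two arrays: the distances to the nearest neighbours of each test row, and the indices of those neighbours. The indices are positions in the training data passed to fit, so train.iloc[...] recovers the matching rows (here, the players).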
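
A minimal sketch of how this might look, using the sample rows from the question as the training set. The single test row ("Query Player") is invented purely for illustration, and n_neighbors is set to 3 rather than 5 only because the sample is so small. The lookup uses .iloc because kneighbors reports positions in the fitted data, not DataFrame index labels, which matters after a shuffled train/test split.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Training rows copied from the question's sample table.
train = pd.DataFrame({
    'Player': ['Fabricio Oberto', 'Eddie Johnson', 'Tim Legler', 'Ersan Ilyasova',
               'Kevin Love', 'Tim Hardaway', 'Frank Brickowsk'],
    'PER':  [11.9, 16.5, 15.9, 14.3, 25.4, 20.6, 8.6],
    'VORP': [1.0, 1.7, 2.0, 0.7, 3.5, 5.1, -0.2],
    'WS':   [4.1, 4.8, 6.8, 3.8, 10.0, 11.7, 1.6],
})
# Hypothetical test row, invented for this illustration.
test = pd.DataFrame({'Player': ['Query Player'], 'PER': [16.0], 'VORP': [1.8]})

features = ['PER', 'VORP']
knn = KNeighborsRegressor(n_neighbors=3, algorithm='brute')
knn.fit(train[features], train['WS'])

# kneighbors returns (distances, indices), each shaped (n_test_rows, n_neighbors).
distances, indices = knn.kneighbors(test[features])

# The indices are positions in the training data, so look them up with .iloc.
neighbor_names = [train['Player'].iloc[row].tolist() for row in indices]
predictions = knn.predict(test[features])

for player, names, pred in zip(test['Player'], neighbor_names, predictions):
    print(player, '->', names, 'predicted WS:', round(pred, 2))
```

Each entry of neighbor_names lines up with the same row of test and of predictions, so the neighbours can be attached to the test DataFrame as an extra column if that is more convenient.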