Tags: python, pandas, scikit-learn, knn, nearest-neighbor

Generate 'K' Nearest Neighbours to a datapoint


I need to generate the K nearest neighbours of a given datapoint. I read up on the sklearn.neighbors module, but it seems to generate neighbours between two sets of data. What I want is a list of, say, the 100 datapoints closest to the datapoint passed in.

Any KNN algorithm has to find these K datapoints under the hood anyway. Is there any way to get these K points returned as output?
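Conceptually, finding the K nearest points means ranking every datapoint by its distance to the query and keeping the first K. A minimal brute-force sketch, assuming Euclidean distance and NumPy; the helper name k_nearest_brute_force is hypothetical and this is not how scikit-learn implements its search:

```python
import numpy as np

def k_nearest_brute_force(data, query, k):
    """Hypothetical helper: return (distances, indices) of the k rows of
    `data` closest to `query`, using brute-force Euclidean distance."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every row of the dataset
    dists = np.linalg.norm(data - query, axis=1)
    # Indices of the k smallest distances, ordered nearest first
    idx = np.argsort(dists)[:k]
    return dists[idx], idx

samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
d, i = k_nearest_brute_force(samples, [1., 1., 1.], k=2)
print(d, i)  # distances [0.5, 1.5], indices [2, 1]
```

This costs one distance computation per datapoint for each query, which is fine for small data; tree-based indexes avoid scanning everything.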

Here is my sample notebook.


Solution

  • from sklearn.neighbors import NearestNeighbors

    This gives you the indices of the k nearest neighbors of a query point within your dataset. Call kneighbors: it returns a tuple whose first element holds the distances and whose second element holds the indices of the neighbors, ordered from nearest to farthest. From the documentation:

    >>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
    >>> from sklearn.neighbors import NearestNeighbors
    >>> neigh = NearestNeighbors(n_neighbors=1)
    >>> neigh.fit(samples) 
    NearestNeighbors(algorithm='auto', leaf_size=30, ...)
    >>> print(neigh.kneighbors([[1., 1., 1.]])) 
    (array([[0.5]]), array([[2]]))
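Applied to the question (the 100 neighbours of a single datapoint), a sketch might look like the following. The DataFrame df, its column names and the random data are hypothetical stand-ins for the notebook, which is not shown; only NearestNeighbors, fit and kneighbors come from scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical dataset: 1,000 random points with three features
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 3)), columns=["x", "y", "z"])

k = 100
nn = NearestNeighbors(n_neighbors=k)
nn.fit(df.to_numpy())  # fit on a plain array so the list query below matches

# kneighbors expects a 2-D input, so wrap the single query point in a list
query = [[0.5, 0.5, 0.5]]
distances, indices = nn.kneighbors(query)

# Both arrays have shape (1, k): one row per query point.
# Use the indices to pull the actual rows, nearest first.
nearest = df.iloc[indices[0]].assign(distance=distances[0])
print(nearest.head())
```

Because kneighbors also accepts n_neighbors at query time (nn.kneighbors(query, n_neighbors=k)), one fitted model can serve different values of K. If the query point is itself a row of the fitted data, it comes back as its own nearest neighbour at distance 0; ask for k + 1 neighbours and drop the first if you want to exclude it.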