I have a list of stations with x and y coordinates, and I am trying to find at least the 4 closest points for each station. I had a look at this link but was not able to figure out how to do it.
For example, my data looks like:
station Y X
601 28.47 83.43
604 28.45 83.42
605 28.16 83.36
606 28.29 83.39
607 28.38 83.36
608 28.49 83.53
609 28.21 83.34
610 29.03 83.53
612 29.11 83.58
613 28.11 83.45
614 28.13 83.42
615 282.4 83.06
616 28.36 83.13
619 28.24 83.44
620 28.02 83.39
621 28.23 83.24
622 28.09 83.34
623 29.06 84
624 28.58 83.47
625 28.54 83.41
626 28.28 83.36
627 28.23 83.29
628 28.3 83.18
629 28.34 83.23
630 28.08 83.37
633 29.11 83.59
Any help will be highly appreciated.
For large data, it pays to be clever about data structures. As you already tagged yourself, there are specialized data structures for this kind of lookup. SciPy supports some; sklearn is even more complete (and, in my personal opinion, better and more actively developed for these tasks)!
The code example uses SciPy's API to avoid (Python) loops. The disadvantage is that each element finds itself as its own nearest neighbor at distance 0, and that match has to be discarded.
import numpy as np
from scipy.spatial import KDTree
# Data: station ids and their (Y, X) coordinates
data_i = np.array([601, 604, 605, 606])
data = np.array([[28.47, 83.43], [28.45, 83.42], [28.16, 83.36], [82.29, 83.39]])
print(data_i)
print(data)
# KDTree: build the tree once, then query all points in one call
N_NEIGHBORS = 2
kdtree = KDTree(data)
# Ask for one extra neighbor: each point matches itself at distance 0
dist, idx = kdtree.query(data, k=N_NEIGHBORS + 1)
# Results are sorted by distance, so column 0 is the point itself; drop it
print(data_i[idx[:, 1:]])
Output:
[601 604 605 606]
[[ 28.47 83.43]
 [ 28.45 83.42]
 [ 28.16 83.36]
 [ 82.29 83.39]]
[[604 605]
 [601 605]
 [604 601]
 [601 604]]
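To get closer to the 4 neighbors asked for in the question, the same idea can be wrapped in a small helper. This is only a sketch: the name nearest_stations is made up for illustration, only the first eight stations from the question are used, and distances are plain Euclidean distances on the raw coordinate values. It also assumes no two stations share identical coordinates, because otherwise the point itself is not guaranteed to land in column 0.

```python
import numpy as np
from scipy.spatial import KDTree

# First eight stations from the question: ids and (Y, X) coordinates
stations = np.array([601, 604, 605, 606, 607, 608, 609, 610])
coords = np.array([
    [28.47, 83.43],
    [28.45, 83.42],
    [28.16, 83.36],
    [28.29, 83.39],
    [28.38, 83.36],
    [28.49, 83.53],
    [28.21, 83.34],
    [29.03, 83.53],
])


def nearest_stations(ids, points, k):
    """Hypothetical helper: ids of the k nearest other points, per point."""
    tree = KDTree(points)
    # Query k + 1 neighbors because each point is its own nearest neighbor
    _, idx = tree.query(points, k=k + 1)
    # Assumes distinct coordinates, so column 0 is the query point itself
    return ids[idx[:, 1:]]


neighbors = nearest_stations(stations, coords, k=4)
for station, row in zip(stations, neighbors):
    print(station, row)
```

Each printed row lists a station followed by its 4 nearest stations, closest first. For the full table, extend stations and coords with the remaining rows; k must stay below the number of stations.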