I have a dataframe like:
projectName latitude longitude
a 56.864229 60.609576
b 55.810413 37.701168
c 55.924912 37.966033
d 56.804987 60.590667
e 55.806000 37.569863
For each point, I want a list of the other points that lie within a given radius. For example, for 30 km the result should look like this:
projectName latitude longitude 30km
a 56.864229 60.609576 [d]
b 55.810413 37.701168 [c, e]
c 55.924912 37.966033 [b, e]
d 56.804987 60.590667 [a]
e 55.806000 37.569863 [b, c]
What is the fastest way to do this?
You can compute the pairwise haversine_distances with scikit-learn
and keep only the pairs that fall within the threshold:
import numpy as np
from sklearn.metrics.pairwise import haversine_distances
DIST = 10 # distance in km
tmp = np.radians(df.set_index('projectName')[['latitude', 'longitude']])
# compute pairwise distances (in radians), convert to km with the Earth radius (6371 km)
keep = haversine_distances(tmp)*6371 <= DIST
# remove self (e.g. a/a)
np.fill_diagonal(keep, False)
# combine the strings
df[f'{DIST}km'] = (keep @ (tmp.index+',')).str[:-1]
Output (for 10 and 30 km):
  projectName   latitude  longitude 10km 30km
0           a  56.864229  60.609576    d    d
1           b  55.810413  37.701168    e  c,e
2           c  55.924912  37.966033       b,e
3           d  56.804987  60.590667    a    a
4           e  55.806000  37.569863    b  b,c
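To make explicit what haversine_distances(tmp)*6371 evaluates for each pair, here is a minimal dependency-free sketch of the same formula. The names haversine_km, neighbours_within and EARTH_RADIUS_KM are made up for illustration, and the points are copied from the question; it is meant as a cross-check, not as a replacement for the vectorized version.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371  # mean Earth radius, same constant as in the answer


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, lam1, phi2, lam2 = map(radians, (lat1, lon1, lat2, lon2))
    h = (sin((phi2 - phi1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin((lam2 - lam1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# points from the question: name -> (latitude, longitude)
points = {
    'a': (56.864229, 60.609576),
    'b': (55.810413, 37.701168),
    'c': (55.924912, 37.966033),
    'd': (56.804987, 60.590667),
    'e': (55.806000, 37.569863),
}


def neighbours_within(points, dist_km):
    """For each point, list the other points at most dist_km away."""
    return {
        name: [other for other, q in points.items()
               if other != name and haversine_km(*p, *q) <= dist_km]
        for name, p in points.items()
    }


near_10 = neighbours_within(points, 10)
near_30 = neighbours_within(points, 30)
```

This loops over every pair in pure Python, so it is only practical for small inputs.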
If you want lists instead of comma-separated strings:
df[f'{DIST}km'] = [tmp.index[x].tolist() for x in keep]
Output:
projectName latitude longitude 10km 30km
0 a 56.864229 60.609576 [d] [d]
1 b 55.810413 37.701168 [e] [c, e]
2 c 55.924912 37.966033 [] [b, e]
3 d 56.804987 60.590667 [a] [a]
4 e 55.806000 37.569863 [b] [b, c]
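The pairwise approach builds a full n x n distance matrix, which may become the bottleneck for large frames. One possible alternative, not part of the answer above, is scikit-learn's BallTree with the haversine metric and query_radius. The sketch below assumes the sample frame from the question; coords, tree, hits and EARTH_RADIUS_KM are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# sample data from the question
df = pd.DataFrame({
    'projectName': list('abcde'),
    'latitude': [56.864229, 55.810413, 55.924912, 56.804987, 55.806000],
    'longitude': [60.609576, 37.701168, 37.966033, 60.590667, 37.569863],
})

DIST = 30  # radius in km
EARTH_RADIUS_KM = 6371

# the haversine metric expects (latitude, longitude) in radians
coords = np.radians(df[['latitude', 'longitude']].to_numpy())
tree = BallTree(coords, metric='haversine')

# the radius is an angle on the unit sphere, so convert km to radians
hits = tree.query_radius(coords, r=DIST / EARTH_RADIUS_KM)

names = df['projectName'].to_numpy()
# each result contains the query point itself; drop it and sort the names
df[f'{DIST}km'] = [sorted(names[idx[idx != i]].tolist())
                   for i, idx in enumerate(hits)]
```

Whether this is actually faster depends on the number of points and the radius, so it is worth timing both versions on your own data.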