Tags: python, pandas, geo

List with the names of points in a given radius for all rows of the dataframe


I have a dataframe like:

projectName latitude    longitude
a          56.864229    60.609576
b          55.810413    37.701168
c          55.924912    37.966033
d          56.804987    60.590667
e          55.806000    37.569863

For each point, I want a list of the other points that lie within a given radius. For example, with a radius of 30 km the result should look like this:

projectName latitude    longitude   30km
a          56.864229    60.609576  [d]
b          55.810413    37.701168  [c, e]
c          55.924912    37.966033  [b, e]
d          56.804987    60.590667  [a]
e          55.806000    37.569863  [b, c]

What is the fastest way to compute this?


Solution

  • You can compute the pairwise haversine_distances and filter the values:

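    import numpy as np
    from sklearn.metrics.pairwise import haversine_distances
    
    DIST = 10  # radius in km
    
    # haversine_distances expects [latitude, longitude] pairs in radians
    tmp = np.radians(df.set_index('projectName')[['latitude', 'longitude']])
    
    # pairwise distances in radians, scaled by the Earth's radius (6371 km)
    keep = haversine_distances(tmp)*6371 <= DIST
    
    # remove self-matches (e.g. a/a)
    np.fill_diagonal(keep, False)
    
    # join the names of the matching points into a comma-separated string
    df[f'{DIST}km'] = (keep @ (tmp.index+',')).str[:-1]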
    

    Output (for 10 and 30 km):

      projectName   latitude  longitude 10km 30km
    0           a  56.864229  60.609576    d    d
    1           b  55.810413  37.701168    e  c,e
    2           c  55.924912  37.966033       b,e
    3           d  56.804987  60.590667    a    a
    4           e  55.806000  37.569863    b  b,c
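
For readers who want to see what the distance step computes, here is a minimal NumPy-only sketch of the haversine formula applied to the five points from the question. The helper name `haversine_km`, the constant `EARTH_RADIUS_KM`, and the `neighbours` dictionary are illustrative names chosen for this sketch, not part of scikit-learn or of the answer above, and the sketch leaves `df` untouched.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, same constant as in the answer


def haversine_km(lat_deg, lon_deg):
    """Pairwise great-circle distances (km) for coordinates given in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting builds the full n x n matrices of coordinate differences
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


# Sample data copied from the question
names = ["a", "b", "c", "d", "e"]
lat = [56.864229, 55.810413, 55.924912, 56.804987, 55.806000]
lon = [60.609576, 37.701168, 37.966033, 60.590667, 37.569863]

dist = haversine_km(lat, lon)
within = dist <= 30              # boolean mask for a 30 km radius
np.fill_diagonal(within, False)  # a point is not its own neighbour

name_arr = np.array(names)
neighbours = {n: name_arr[row].tolist() for n, row in zip(names, within)}
```

Each row of `within` plays the same role as a row of `keep` in the answer: it marks which other points fall inside the radius.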
    

    If you want a list:

    df[f'{DIST}km'] = [tmp.index[x].tolist() for x in keep]
    

    Output:

      projectName   latitude  longitude 10km    30km
    0           a  56.864229  60.609576  [d]     [d]
    1           b  55.810413  37.701168  [e]  [c, e]
    2           c  55.924912  37.966033   []  [b, e]
    3           d  56.804987  60.590667  [a]     [a]
    4           e  55.806000  37.569863  [b]  [b, c]
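
The pairwise approach builds a full n x n distance matrix, which can become memory-hungry for large dataframes. As a possible alternative (not part of the answer above), the following sketch uses scikit-learn's `BallTree` with the haversine metric and `query_radius`, which avoids materialising the whole matrix. The names `RADIUS_KM` and `EARTH_RADIUS_KM` are assumptions made for this example, and whether it is actually faster depends on the data size, so it is worth timing on your own data.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
RADIUS_KM = 30            # search radius in km

# Sample data copied from the question
df = pd.DataFrame({
    "projectName": ["a", "b", "c", "d", "e"],
    "latitude": [56.864229, 55.810413, 55.924912, 56.804987, 55.806000],
    "longitude": [60.609576, 37.701168, 37.966033, 60.590667, 37.569863],
})

# The haversine metric expects [latitude, longitude] in radians
coords = np.radians(df[["latitude", "longitude"]].to_numpy())
tree = BallTree(coords, metric="haversine")

# The radius must be in the metric's units, so convert km to radians
hits_per_row = tree.query_radius(coords, r=RADIUS_KM / EARTH_RADIUS_KM)

names = df["projectName"].to_numpy()
# Drop each point's own index; sort because the result order is not guaranteed
df[f"{RADIUS_KM}km"] = [
    sorted(names[i] for i in hits if i != row)
    for row, hits in enumerate(hits_per_row)
]
```

On the sample data this should yield the same 30 km lists as the pairwise version.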