
Distance Matrix - How to find the closest person in a dataframe based on coordinates?


I have a pandas dataframe with three columns: Name, Latitude and Longitude. For every person in the dataframe I want to 1) find the person closest to them and 2) calculate the linear distance to that person. My code looks like the example below:

    import numpy as np
    import pandas as pd
    from scipy.spatial.distance import cdist
    from haversine import haversine
    df = pd.read_csv('..data/file_name.csv')
    df.set_index('Name', inplace=True)
    dm = cdist(df, df, metric=haversine)
    closest = dm.argmin(axis=1)
    distances = dm.min(axis=1)
    df['closest person'] = df.index[closest]
    df['distance'] = distances

I know the problem is that argmin and min simply match every person to themselves, because each person's distance to themselves is 0, and that is not what I want. I'm trying to modify the code to find the closest distinct individual: for example, the closest person to John Doe is Bob Smith and the distance is xx. I've tried indexing and looked for a way to sort the matrix, but neither has worked. Is there a good way of doing this?

Edit: example input data


Solution

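  • The zeros on the diagonal of the distance matrix (each person's distance to themselves) are what argmin picks up. You can overwrite those 0 values before taking the minimum: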

    # your code
    import numpy as np
    import pandas as pd
    from scipy.spatial.distance import cdist
    from haversine import haversine
    df = pd.read_csv('..data/file_name.csv')
    df.set_index('Name', inplace=True)
    dm = cdist(df, df, metric=haversine)
    
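    # my code: overwrite each zero (self-distance) with that row's maximum so it
    # can no longer be the row minimum (assumes the only zeros are on the diagonal)
    dm[dm == 0] = np.max(dm, axis=1)
    
    # your code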
    closest = dm.argmin(axis=1)
    distances = dm.min(axis=1)
    df['closest person'] = df.index[closest]
    df['distance'] = distances
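
A possible variation, offered as a sketch rather than a definitive fix: the `dm == 0` mask above only lines up with the row maxima when each row contains exactly one zero. If two people share the same coordinates there are extra zeros, and the assignment should fail with a shape mismatch. Masking the diagonal explicitly with `np.fill_diagonal` avoids that. The example below is self-contained, so it swaps the `haversine` package for a hand-written NumPy haversine formula. The helper names `haversine_matrix` and `add_closest`, the Earth radius constant, and the sample coordinates are all made up for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, in kilometres


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances (km) between all points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def add_closest(df):
    """Return a copy of df with 'closest person' and 'distance' columns."""
    dm = haversine_matrix(df["Latitude"], df["Longitude"])
    # Exclude each person's distance to themselves from the search.
    np.fill_diagonal(dm, np.inf)
    closest = dm.argmin(axis=1)
    out = df.copy()
    out["closest person"] = df.index[closest]
    out["distance"] = dm[np.arange(len(df)), closest]
    return out


# Hypothetical sample data; Bob and Cy deliberately share a location.
people = pd.DataFrame(
    {"Latitude": [48.8566, 45.7597, 45.7597, 51.5074],
     "Longitude": [2.3522, 4.8422, 4.8422, -0.1278]},
    index=pd.Index(["Ann", "Bob", "Cy", "Dee"], name="Name"),
)
result = add_closest(people)
print(result)
```

Because only the diagonal is masked, two people at the same location are still reported as each other's closest person, with a distance of 0. Note that this builds the full n × n matrix, just like the `cdist` version, so memory grows quadratically with the number of people.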