Geopy calculate geodesic distance from two dataframes

I am trying to calculate geodesic distances with Geopy between points stored in two different DataFrames.

I want to feed a function a point from df1 (tuple of lat, lon coordinates), and have it calculate a new column in df2 of distances from that point. I then want it to return the lowest value.

df1 and df2 both contain a column called lat_lon, which holds a tuple of coordinates. So far this is what I have:

from geopy.distance import geodesic

def get_distance(point, df2): 
    df2['dist'] = df2.apply(geodesic(point, df2['lat_lon']).miles)
    closest = df2.loc[df2['dist'].idxmin()]
    return closest

I then want to apply this to df1 so that a new column is created with the closest value.

df1['closest_location'] = df1['lat_lon'].apply(lambda x: get_distance(x, df2))

I am getting this error when running the last line:

ValueError: When creating a Point from sequence, it must not have more than 3 items.

I think I am lost in the lambdas here.


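  • You're passing the entire df2['lat_lon'] column to geodesic, but each of its arguments must be a single point, such as one (lat, lon) tuple. geopy tries to read the whole column as one point, which is why it complains about a sequence with more than 3 items. To solve it, you could use a lambda inside the function as well: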

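    def get_distance(point, df2):
        # Call geodesic once per row, so it always receives a single tuple
        dists = df2['lat_lon'].apply(lambda x: geodesic(point, x).miles)
        # idxmin() gives the index label of the smallest distance
        closest = df2.loc[dists.idxmin()]
        return closest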