python, pandas, geopy

Pandas: use the value from the previous row in a geopy distance


I am trying to compute the geodesic distance between each row's coordinates and the coordinates of the previous row. Is there a way to do this without adding extra columns to the DataFrame?

Sample code:

import pandas
import geopy.distance

d = {
    'id_col': ['A', 'B', 'C', 'D'],
    'lat': [40.8397, 40.7664, 40.6845, 40.6078],
    'lon': [-104.9661, -104.999, -105.01, -105.003],
}
df = pandas.DataFrame(data=d)

First approach, with lambda and apply:

df['geo_dist'] = df.apply(lambda x: geopy.distance.geodesic((x['lat'], x['lon']), (x['lat'].shift(), x['lon'].shift())), axis=1)

This raises the error: AttributeError: ("'float' object has no attribute 'shift'", u'occurred at index 0')

My second approach, calling a function on the whole DataFrame:

def geodist(x):
    return geopy.distance.geodesic((x['lat'], x['lon']), (x['lat'].shift(), x['lon'].shift()))

df['geo_dist'] = geodist(df)

In this case I get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any help is greatly appreciated.


Solution

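  • The first approach fails because, with axis=1, apply passes the lambda one row at a time, so x['lat'] is a single float rather than the whole column, and a float has no shift method. The second approach fails for the opposite reason: geodesic expects scalar coordinates, not entire Series. To make the first approach work, use x.name - 1 to get the index of the previous row and look that row up in df. This assumes the default integer index (0, 1, 2, ...), so that the row label x.name matches the position used by iloc. The first row has no predecessor and gets 0; the other rows hold geopy distance objects: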

    df['geo_dist'] = df.apply(lambda x: geopy.distance.geodesic((x['lat'], x['lon']), (df.iloc[x.name - 1].lat, df.iloc[x.name - 1].lon)) if x.name > 0 else 0, axis=1)
    

    Hope this helps