I am trying to compute the geodesic distance between each row's coordinates and the coordinates of the previous row. Is there a way to do this without adding extra columns to the DataFrame?
Sample code:
import pandas
import geopy.distance
d = {'id_col':['A','B','C','D'],
'lat':[ 40.8397,40.7664,40.6845,40.6078],
'lon':[-104.9661,-104.999,-105.01,-105.003]
}
df = pandas.DataFrame(data=d)
First approach, with lambda and apply:
df['geo_dist']=df.apply(lambda x: geopy.distance.geodesic((x['lat'],x['lon']),(x['lat'].shift(),x['lon']).shift()),axis=1)
I would get the error: AttributeError: ("'float' object has no attribute 'shift'", u'occurred at index 0')
And my second approach via calling a function on the dataframe:
def geodist(x):
    return geopy.distance.geodesic((x['lat'],x['lon']),(x['lat'].shift(),x['lon']).shift())
df['geo_dist']=geodist(df)
In this case I would get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any help is greatly appreciated.
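The first approach can't work as written: with axis=1, apply calls the lambda once per row, so x is a single row rather than the whole set of observations, and x['lat'] is a plain float with no shift method. That is what the AttributeError is telling you. To make it work, take the position of the previous row with x.name - 1 and look it up in df. This relies on the default integer index (0, 1, 2, ...), where x.name is also the row's position. The first row has no previous row, so it gets 0: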
df['geo_dist']=df.apply(lambda x: geopy.distance.geodesic((x['lat'],x['lon']),(df.iloc[x.name - 1].lat,df.iloc[x.name - 1].lon)) if x.name > 0 else 0,axis=1)
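If you would rather not depend on the index at all, another option is to pair each point with its predecessor using zip. The following is only a minimal sketch under a few assumptions: dist_to_previous and haversine_km are hypothetical helper names, not part of pandas or geopy, and haversine_km is a spherical-Earth stand-in for geopy.distance.geodesic so that the example runs without geopy installed (its results will differ slightly from the ellipsoidal geodesic).

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Hypothetical stand-in for geopy.distance.geodesic; assumes a spherical
    Earth with a radius of 6371 km.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def dist_to_previous(lats, lons, dist_fn=haversine_km, first=0.0):
    """Distance from each point to the one before it; `first` fills row 0."""
    points = list(zip(lats, lons))
    if not points:
        return []
    # Pair each point with its predecessor: (p0, p1), (p1, p2), ...
    return [first] + [dist_fn(prev, cur) for prev, cur in zip(points, points[1:])]

# Same coordinates as in the question
lats = [40.8397, 40.7664, 40.6845, 40.6078]
lons = [-104.9661, -104.999, -105.01, -105.003]
distances = dist_to_previous(lats, lons)
```

With the DataFrame from the question, the equivalent call should be df['geo_dist'] = dist_to_previous(df['lat'], df['lon'], lambda p, q: geopy.distance.geodesic(p, q).km). Taking .km turns geopy's Distance object into a plain float, so the column holds numbers only.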
Hope this helps