I am trying to map a new column to a pandas dataframe using a custom function that takes in two input tuples. The function is:
import math

def distance(origin, destination):
    # Haversine (great-circle) distance between two (lat, lon) tuples
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 3958.8  # miles
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = radius * c
    return d
The dataframe has a column of lat and long in tuple form and I am attempting to measure the distance between the current and previous row's coordinates.
I have tried a for loop:
df3.loc[0, 'dist'] = 0
for i in range(1, len(df3)):
    df3.loc[i, 'dist'] = distance(df3.loc[i-1, 'lat_long'], df3.loc[i, 'lat_long'])
but I get an error "ValueError: not enough values to unpack (expected 2, got 1)"
Any ideas on how to do this better?
1. synthesised data to illustrate
2. reset_index() to get the row number as a column named index
3. range() that runs from the previous row to the current row; the previous row is floored to 0 for the first row
4. the calculation is done in tuplecalc(), since you noted your long, lat are tuples
5. drop the index column at the end

import pandas as pd

df = pd.DataFrame({"geo":[(1,2),(3,4),(5,6)]}).reset_index()
def distance(prev, curr):
    # stand-in for the real distance function: just sums the four values
    return prev[0] + prev[1] + curr[0] + curr[1]

def tuplecalc(tuples):
    # tuples holds [previous, current]; the first row has only one entry
    return distance(tuples[0], tuples[1] if len(tuples)==2 else (0,0))

df["distance"] = df.apply(lambda r: tuplecalc(df.loc[range(max(r["index"]-1,0),r["index"]+1),"geo"].values), axis=1)
df.drop(["index"], axis=1)
Alternatively, with the previous row's values as additional columns:
df = pd.DataFrame({"long":[1,3,5], "lat":[2,4,6]}).reset_index()
def rowrange(index, col):
    return 0 if index==0 else df.loc[range(max(index-1,0),index), col].values[0]
df["prev_long"] = df.apply(lambda r: rowrange(r["index"], "long"), axis=1)
df["prev_lat"] = df.apply(lambda r: rowrange(r["index"], "lat"), axis=1)
df
output of the first (geo tuple) version

      geo  distance
0  (1, 2)         3
1  (3, 4)        10
2  (5, 6)        18
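
For comparison, here is a minimal sketch of the same previous-row idea using Series.shift() together with the haversine function from the question. It is one possible approach rather than the method used above. The coordinates are made-up sample data, the column names lat_long and dist follow the question, and the isinstance check is my own assumption about how to treat the first row (distance 0, as in the question's loop).

```python
import math
import pandas as pd

def distance(origin, destination):
    """Haversine distance in miles between two (lat, lon) tuples."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 3958.8  # miles
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c

# Made-up sample data: one (lat, lon) tuple per row
df3 = pd.DataFrame({
    "lat_long": [(40.7128, -74.0060), (39.9526, -75.1652), (38.9072, -77.0369)]
})

# shift() moves every value down one row, so prev[i] is row i-1's tuple
# and the first entry is NaN because there is no previous row
prev = df3["lat_long"].shift()

# Pair each previous tuple with the current one; the first row gets 0.0
df3["dist"] = [
    distance(p, c) if isinstance(p, tuple) else 0.0
    for p, c in zip(prev, df3["lat_long"])
]
```

With these sample points the second and third rows come out at roughly 80 and 123 miles. Because each tuple is unpacked inside distance(), a cell that is not a 2-tuple (for example a string or a 1-element tuple) would raise the same "not enough values to unpack" style of error, so it may be worth checking the contents of lat_long first.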