I would like to create a new column in the data frame which consists of the distances between the location of the current transaction and the location of the last transaction.
I have the lat and long for each location and have used the haversine formula to compute the distance between two coordinates.
def haversine(lat1, lon1, lat2, lon2):
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # Radius of earth in kilometers. (Use 3956 for miles)
    return km
However, I am trying to adapt it so that it computes the difference from the last row (which was the previous location):
for i in range(0,df.shape[0]-1):
    df['Dist_last_trans'] = \
        haversine(df['merch_lat'].iloc[i-1], df['merch_long'].iloc[i-1],
                  df['merch_lat'].iloc[i], df['merch_long'].iloc[i])
but then the output is the same for every row, which is clearly wrong.
Any help would be greatly appreciated.
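I have reproduced your case with a toy dataframe. The problem is that the assignment does not specify a row. Assigning a scalar to df['Dist_last_trans'] sets the whole column, so every loop iteration overwrites all rows and only the value from the final iteration survives. The example below shows the same effect on a column named Diff_last_trans.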
>>> import pandas as pd
>>> data = [['Alex',10],['Bob',12],['Clarke',13]]
>>> df = pd.DataFrame(data,columns=['Name','Diff_last_trans'])
>>> df['Diff_last_trans']
0 10
1 12
2 13
Name: Diff_last_trans, dtype: int64
>>> df['Diff_last_trans'] = 3
>>> df['Diff_last_trans']
0 3
1 3
2 3
Name: Diff_last_trans, dtype: int64
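Instead, specify both the row label and the column in a single .loc call. The chained form df.loc[1]['Diff_last_trans'] = 2 can end up assigning to a temporary copy and leave df unchanged.
>>> df.loc[1, 'Diff_last_trans'] = 2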
>>> df['Diff_last_trans']
0 3
1 2
2 3
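In your case this would be used inside the loop as
df.loc[i, 'Dist_last_trans'] = \
    haversine(df['merch_lat'].iloc[i-1], df['merch_long'].iloc[i-1],
              df['merch_lat'].iloc[i], df['merch_long'].iloc[i])
This assumes the default integer index, so that the label i matches the position i. Note also that at i = 0, iloc[i-1] refers to the last row, and that range(0, df.shape[0]-1) never reaches the final row, so the loop should run over range(1, df.shape[0]) if the first transaction has no previous location.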
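As an alternative to looping, the haversine function from the question is built from NumPy operations, so it should also accept whole columns. A minimal sketch, assuming the rows are already sorted in transaction order: shift() lines each row up with the previous row's coordinates. The sample coordinates here are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or array-likes."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Hypothetical sample data (not from the question).
df = pd.DataFrame({
    "merch_lat": [40.7128, 40.7128, 34.0522],
    "merch_long": [-74.0060, -74.0060, -118.2437],
})

# shift() moves each value down one row, so row i is paired with row i-1.
# The first row has no predecessor and therefore gets NaN.
df["Dist_last_trans"] = haversine(
    df["merch_lat"].shift(), df["merch_long"].shift(),
    df["merch_lat"], df["merch_long"],
)
```

The first row comes out as NaN because there is no earlier transaction to compare against; whether to keep it as NaN or fill it with 0 depends on how the column is used afterwards.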