I have a bunch of dataframes that all look like this
DATE A B
2021-01-01 1 2
2021-01-05 1 2
2021-01-06 1 2
2021-01-10 1 2
2021-01-20 1 2
I would like to calculate the diffence in time between all rows and the last row. Meaning I'd like to create another column that contains the time difference between that row and the last row in the dataframe. So it should look like this:
DATE A B timediff
2021-01-01 1 2 19 days
2021-01-05 1 2 14 days
2021-01-06 1 2 13 days
2021-01-10 1 2 9 days
2021-01-20 1 2 0 days
Is there a way to do this? Date is already a datetime variable.
Thanks
You can subtract the last row and take absolute:
df['timediff_days'] =df['DATE'].sub(df['DATE'].iloc[-1]).dt.days.abs()
Or:
df['timediff'] = pd.Timedelta(0,unit='d')-df['DATE'].sub(df['DATE'].iloc[-1])
DATE A B timediff
0 2021-01-01 1 2 19 days
1 2021-01-05 1 2 15 days
2 2021-01-06 1 2 14 days
3 2021-01-10 1 2 10 days
4 2021-01-20 1 2 0 days