Search code examples
pythonpandasloopsdatetimetimedelta

Calculate time difference between last row and all other rows in a dataframe in Python


I have a bunch of dataframes that all look like this

   DATE        A     B    
2021-01-01     1     2   
2021-01-05     1     2
2021-01-06     1     2
2021-01-10     1     2
2021-01-20     1     2

I would like to calculate the diffence in time between all rows and the last row. Meaning I'd like to create another column that contains the time difference between that row and the last row in the dataframe. So it should look like this:

   DATE        A     B     timediff  
2021-01-01     1     2     19 days
2021-01-05     1     2     14 days
2021-01-06     1     2     13 days
2021-01-10     1     2     9 days
2021-01-20     1     2     0 days

Is there a way to do this? Date is already a datetime variable.

Thanks


Solution

  • You can subtract the last row and take absolute:

    df['timediff_days'] =df['DATE'].sub(df['DATE'].iloc[-1]).dt.days.abs()
    

    Or:

    df['timediff'] = pd.Timedelta(0,unit='d')-df['DATE'].sub(df['DATE'].iloc[-1])
    
            DATE  A  B timediff
    0 2021-01-01  1  2  19 days
    1 2021-01-05  1  2  15 days
    2 2021-01-06  1  2  14 days
    3 2021-01-10  1  2  10 days
    4 2021-01-20  1  2   0 days