python pandas dataframe subtraction

Subtract rows from a dataframe two by two


I have a dataframe with two columns: the date and the event.

    created_at           event
0   2020-11-16 13:41:34  meeting-created
1   2020-11-16 13:49:52  meeting-ended
2   2020-11-16 14:01:36  meeting-created
3   2020-11-16 15:16:24  meeting-ended

I want to calculate the total duration of the meetings, so I need to subtract the first two dates from each other and then the last two. Note that the dataframe may contain more rows than shown here.


Solution

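  • I believe that, if the events always come in created/ended pairs, you can filter each event type into its own Series and subtract them, converting the second Series to a NumPy array so that the subtraction is done by position rather than by index label: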

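    import pandas as pd
    
    # Parse the timestamps so they can be subtracted.
    df['created_at'] = pd.to_datetime(df['created_at'])
    
    # Split the end and start timestamps into separate Series.
    s1 = df.loc[df['event'].eq('meeting-ended'), 'created_at']
    s2 = df.loc[df['event'].eq('meeting-created'), 'created_at']
    
    # to_numpy() drops s2's index, so the subtraction is positional
    # and each result lands on the matching 'meeting-ended' row.
    df['new'] = s1.sub(s2.to_numpy())
    print(df)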
               created_at            event             new
    0 2020-11-16 13:41:34  meeting-created             NaT
    1 2020-11-16 13:49:52    meeting-ended 0 days 00:08:18
    2 2020-11-16 14:01:36  meeting-created             NaT
    3 2020-11-16 15:16:24    meeting-ended 0 days 01:14:48
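
The question asks for the total duration, which the per-meeting values above do not give directly. A minimal, self-contained sketch of one way to get it is shown below. It rebuilds the sample data from the question and assumes the rows are already in chronological order with strictly alternating created/ended events. The names `created`, `ended`, `durations` and `total`, and the length check, are illustrative additions and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "created_at": [
        "2020-11-16 13:41:34",
        "2020-11-16 13:49:52",
        "2020-11-16 14:01:36",
        "2020-11-16 15:16:24",
    ],
    "event": [
        "meeting-created",
        "meeting-ended",
        "meeting-created",
        "meeting-ended",
    ],
})
df["created_at"] = pd.to_datetime(df["created_at"])

# Separate the start and end timestamps (same filtering as the answer).
ended = df.loc[df["event"].eq("meeting-ended"), "created_at"]
created = df.loc[df["event"].eq("meeting-created"), "created_at"]

# Illustrative guard: positional subtraction only makes sense
# when every "created" row has a matching "ended" row.
if len(ended) != len(created):
    raise ValueError("unpaired meeting-created / meeting-ended events")

# Subtract by position; the result keeps the index of the "ended" rows.
durations = ended.sub(created.to_numpy())

# Summing the timedeltas gives the total meeting time.
total = durations.sum()

print(durations)
print(total)
```

With the four rows from the question, `total` is a `Timedelta` of 0 days 01:23:06 (8 min 18 s plus 1 h 14 min 48 s). If the events could be interleaved or unordered, this positional pairing would no longer be valid and the rows would need sorting or grouping first.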