Search code examples
python-3.xpandasdataframegroup-bydata-munging

pandas get time delta between two conditions per group


I have a dataframe:

> df = batch Code.  time 
>       a     100.  2019-08-01 00:59:12.000
>       a     120.  2019-08-01 00:59:32.000
>       a     130.  2019-08-01 00:59:42.000
>       a     120.  2019-08-01 00:59:52.000
>       b     100.  2019-08-01 00:44:11.000
>       b     140.  2019-08-02 00:14:11.000
>       b     150.  2019-08-03 00:47:11.000
>       c     150.  2019-09-01 00:44:11.000
>       d     100.  2019-08-01 00:10:00.000
>       d     100.  2019-08-01 00:10:05.000
>       d     130.  2019-08-01 00:10:10.000
>       d     130.  2019-08-01 00:10:20.000

I want to get the number of seconds, per group, between the time of the first '100' code to the last '130' code. If for a group there is no code 100 with code 130 after (one of them is missing) - put nan. So the output should be:

df2 = batch duration
        a      30
        b.     nan
        c.     nan
        d.     20

What is the best way to do it?


Solution

  • Use:

    #convert values to datetimes
    df['time'] = pd.to_datetime(df['time'])
       
    #get first 100 Code per batch 
    s1=df[df['Code.'].eq(100)].drop_duplicates('batch').set_index('batch')['time']
    #get last 130 Code per batch 
    s2=df[df['Code.'].eq(130)].drop_duplicates('batch', keep='last').set_index('batch')['time']
    
    #subtract and convert to timedeltas
    df = (s2.sub(s1)
            .dt.total_seconds()
            .reindex(df['batch'].unique())
            .reset_index(name='duration'))
    print (df)
      batch  duration
    0     a      30.0
    1     b       NaN
    2     c       NaN
    3     d      20.0