Search code examples
pythonpandasgroupingtimedelta

Python: Grouping of time-delta values


I'm working on a dataset which details emergency service shifts worked. If the start time of the next shift is within 1 hours of the previous shift then we want to combine these shifts together to find the total length. This condition applies if there are multiple instances of close shifts, essentially 'chaining' them together. These combined shifts would share a key. For example in the table below, rows 3+4 would have one key and rows 5,6,7 have one key.

My attempt at grouping these shifts together resulted in the ['Key'] column, which falsely groups rows 3-7 together, even though there is a greater than 1 hour time delta difference between rows 3+4.

The desired output is the ['Desired_Key'] column. Would appreciate any help in ideas/solution to break this down!

Start Time End Time Time till next shift (hrs) continuing Key Desired_Key
1 22/11/2021 20:30 23/11/2021 2:00 2.4 4705 4705
2 23/11/2021 4:23 23/11/2021 9:00 1680.0 4706 4706
3 1/02/2022 9:03 1/02/2022 12:30 0.0 Y 4707 4707
4 1/02/2022 12:30 1/02/2022 14:30 150.6 Y 4707 4707
5 7/02/2022 21:07 7/02/2022 23:55 0.4 Y 4707 4708
6 8/02/2022 0:18 8/02/2022 5:30 0.5 Y 4707 4708
7 8/02/2022 6:00 8/02/2022 8:00 0.0 Y 4707 4708
8 7/10/2021 0:55 7/10/2021 2:55 174.9 4708 4709
9 14/10/2021 9:46 14/10/2021 13:59 18.2 4709 4710
10 15/10/2021 8:10 15/10/2021 13:59 0.7 Y 4710 4711
11 15/10/2021 14:43 15/10/2021 16:43 71.7 Y 4710 4711
12 18/10/2021 16:25 18/10/2021 18:25 24.6 4711 4712

Solution

  • IIUC, shift 8 should also be included in group [5,6,7]. If so, the following should work:

    import numpy as np
    import pandas as pd
    
    df.assign(
        Desired_Key = df['Time till next shift (hrs)'].gt(1).cumsum().shift(1, fill_value=0).values + 4505,
        continuing = np.where(df['Time till next shift (hrs)'].le(1), 'Y', '')
    )
    

    Update:

    df.assign(
        Desired_Key=df.groupby('ID', as_index=False)['Time till next shift (hrs)']
        .apply(lambda x: x.gt(1).shift(1, fill_value=1))
        .cumsum()
        .sub(1)
        .values
        + 4505,
        continuing=np.where(
            df['ID'].eq(df['ID'].shift(-1)) & df['Time till next shift (hrs)'].le(1),
            'Y',
            '',
        ),
    )