I'm working on a dataset which details emergency service shifts worked. If the start time of the next shift is within 1 hours of the previous shift then we want to combine these shifts together to find the total length. This condition applies if there are multiple instances of close shifts, essentially 'chaining' them together. These combined shifts would share a key. For example in the table below, rows 3+4 would have one key and rows 5,6,7 have one key.
My attempt at grouping these shifts together resulted in the ['Key'] column, which falsely groups rows 3-7 together, even though there is a greater than 1 hour time delta difference between rows 3+4.
The desired output is the ['Desired_Key'] column. Would appreciate any help in ideas/solution to break this down!
Start Time | End Time | Time till next shift (hrs) | continuing | Key | Desired_Key | |
---|---|---|---|---|---|---|
1 | 22/11/2021 20:30 | 23/11/2021 2:00 | 2.4 | 4705 | 4705 | |
2 | 23/11/2021 4:23 | 23/11/2021 9:00 | 1680.0 | 4706 | 4706 | |
3 | 1/02/2022 9:03 | 1/02/2022 12:30 | 0.0 | Y | 4707 | 4707 |
4 | 1/02/2022 12:30 | 1/02/2022 14:30 | 150.6 | Y | 4707 | 4707 |
5 | 7/02/2022 21:07 | 7/02/2022 23:55 | 0.4 | Y | 4707 | 4708 |
6 | 8/02/2022 0:18 | 8/02/2022 5:30 | 0.5 | Y | 4707 | 4708 |
7 | 8/02/2022 6:00 | 8/02/2022 8:00 | 0.0 | Y | 4707 | 4708 |
8 | 7/10/2021 0:55 | 7/10/2021 2:55 | 174.9 | 4708 | 4709 | |
9 | 14/10/2021 9:46 | 14/10/2021 13:59 | 18.2 | 4709 | 4710 | |
10 | 15/10/2021 8:10 | 15/10/2021 13:59 | 0.7 | Y | 4710 | 4711 |
11 | 15/10/2021 14:43 | 15/10/2021 16:43 | 71.7 | Y | 4710 | 4711 |
12 | 18/10/2021 16:25 | 18/10/2021 18:25 | 24.6 | 4711 | 4712 |
IIUC, shift 8 should also be included in group [5,6,7]. If so, the following should work:
import numpy as np
import pandas as pd
df.assign(
Desired_Key = df['Time till next shift (hrs)'].gt(1).cumsum().shift(1, fill_value=0).values + 4505,
continuing = np.where(df['Time till next shift (hrs)'].le(1), 'Y', '')
)
Update:
df.assign(
Desired_Key=df.groupby('ID', as_index=False)['Time till next shift (hrs)']
.apply(lambda x: x.gt(1).shift(1, fill_value=1))
.cumsum()
.sub(1)
.values
+ 4505,
continuing=np.where(
df['ID'].eq(df['ID'].shift(-1)) & df['Time till next shift (hrs)'].le(1),
'Y',
'',
),
)