I need to create a dataframe that contains all possible start times for a scheduler for some machines. My initial dataframe (msDF) contains three simple columns:
msDF is copied from a master dataframe, but for illustration, it may look like this:
msDF = pd.DataFrame({ 'MachID': [1,2,3,4,5],
'Start': ["02/04/2021 9:00","06/04/2021 12:30","09/04/2021 10:00", \
"12/04/2021 11:00","15/04/2021 08:00"],
'slots': [2, 3, 4, 3, 1]})
MachID | Start | slots | |
---|---|---|---|
0 | 1 | 02/04/2021 9:00 | 2 |
1 | 2 | 06/04/2021 12:30 | 3 |
2 | 3 | 09/04/2021 10:00 | 4 |
3 | 1 | 12/04/2021 11:00 | 3 |
4 | 1 | 15/04/2021 08:00 | 1 |
I need to explode this dataframe so that each row is duplicated "slots" times with a slotIndex. The desired output is:
MachID | Start | slots | SlotIndex | |
---|---|---|---|---|
0 | 1 | 02/04/2021 9:00 | 2 | 0 |
0 | 1 | 02/04/2021 9:00 | 2 | 1 |
1 | 2 | 06/04/2021 12:30 | 3 | 0 |
1 | 2 | 06/04/2021 12:30 | 3 | 1 |
1 | 2 | 06/04/2021 12:30 | 3 | 2 |
My approach is problematic. I am creating variable length lists into the SlotIndex and exploding them, but this creates warnings.
To do this, I use:
msDF['SlotIndex'] = None
for x in msDF.index:
msDF.SlotIndex.loc[x] = list(range(msDF.loc[x,'slots']))
It works but with warnings : SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
I later explode msDF to get the result I want:
msDF = msDF.explode('SlotIndex')
How can this be improved?
Use repeat.
df.loc[df.index.repeat(df.slots)]
The index will be repeated. So you can use that to set the slot id.
df['slot_id'] = 1
df['slot_id'] = df.groupby(df.index)['slot_id'].transform('cumsum')